[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974528#comment-16974528
 ] 

ASF subversion and git services commented on LUCENE-9018:
-

Commit e5f2b2380b6e93d48df5f1733113c6b6c0bc090c in lucene-solr's branch 
refs/heads/branch_8x from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e5f2b23 ]

LUCENE-9018: ConcatenateGraphFilter now has a configurable separator.

(cherry picked from commit e466d622c8161038d4e0730e2925474a0a05d596)


> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" 
> symbol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-11-13 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973872#comment-16973872
 ] 

Lucene/Solr QA commented on LUCENE-9018:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m 24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
39s{color} | {color:green} common in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-9018 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985615/LUCENE-9018.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP 
Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 068b6ba |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/227/testReport/ |
| modules | C: lucene/analysis/common U: lucene/analysis/common |
| Console output | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/227/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" 
> symbol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-11-12 Thread Stanislav Mikulchik (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972335#comment-16972335
 ] 

Stanislav Mikulchik commented on LUCENE-9018:
-

> There is a backwards-compatibility concern we overlooked
Added getLuceneMatchVersion() check to ConcatenateGraphFilterFactory
> getCharacter on an empty string can mean null
Done
> RE CompletionAnalyzer
Reverted changes

[^LUCENE-9018.patch]

> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" 
> symbol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-11-10 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971132#comment-16971132
 ] 

David Smiley commented on LUCENE-9018:
--

Thanks.

There is a backwards-compatibility concern we overlooked.  You can call 
getLuceneMatchVersion() in the factory to switch between old behavior and new.  
The old should retain "preserveSep" with non-configurable separator.  The new 
behavior I think should have a similar default such that if you don't specify 
anything about the separator, you get the same separator char we get today.  We 
don't want/need to change the semantics.   getCharacter on an empty string can 
mean null.

RE CompletionAnalyzer, I'm skeptical you should change it's public API since I 
suspect this level of configurability is not useful for that feature?

> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-9018.patch, LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" 
> symbol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-11-06 Thread Stanislav Mikulchik (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968549#comment-16968549
 ] 

Stanislav Mikulchik commented on LUCENE-9018:
-

Unfortunately there is no getCharacter method that can tolerate optional 
parameter so I created my own.

Renamed separator to tokenSeparator.

Replaced preserveSep with Character tokenSeparator.

In the first try I didn't notice the usage of ConcatenateGraphFilter 
constructor in the other classes, so I fixed that too.

[^LUCENE-9018.patch]

> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-9018.patch, LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" 
> symbol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-10-29 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961985#comment-16961985
 ] 

David Smiley commented on LUCENE-9018:
--

Thanks for contributing!
* Factory: see method {{getChar}} instead of simply {{get}}
* I think we should use the same factory parameter name for this that 
ShingleFilterFactory and FixedShingleFilterFactory use – "tokenSeparator".  
Unfortunately I see inconsistency -- FingerprintFilterFactory uses "separator" 
but that filter is more niche so I prefer to standardize on the choice made by 
the more common filter.
* I think the semantics of both "preserveSep" (a boolean) and "separator" (the 
char) as you have defined it, is confusing. You've made the separator 
preservation an OR between those two.  I think it's clearer to keep preserveSep 
as the toggle that decides if we need to preserve a separator at all, and use 
"separator" to be the setting that determines _what_ the separator char should 
be (only honored when preserveSep==true).  The latter should simply default to 
SEP_LABEL. The end effect will be a couple fewer lines of code and a slightly 
simpler conditional, and moreover something I find easier to understand.  The 
documentation on preserveSep would need a slight adjustment to point to 
separator setting since the separator won't always be SEP_LABEL anymore.
* I think ConcatenateGraphFilter could have just one Character separator that 
may be null.  This would replace preserveSep.

> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" 
> symbol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org