[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974528#comment-16974528 ] ASF subversion and git services commented on LUCENE-9018: - Commit e5f2b2380b6e93d48df5f1733113c6b6c0bc090c in lucene-solr's branch refs/heads/branch_8x from David Smiley [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e5f2b23 ] LUCENE-9018: ConcatenateGraphFilter now has a configurable separator. (cherry picked from commit e466d622c8161038d4e0730e2925474a0a05d596) > Separator for ConcatenateGraphFilterFactory > --- > > Key: LUCENE-9018 > URL: https://issues.apache.org/jira/browse/LUCENE-9018 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Stanislav Mikulchik >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch > > > I would like to have an option to choose a separator to use for token > concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" > symbol. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973872#comment-16973872 ] Lucene/Solr QA commented on LUCENE-9018: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 39s{color} | {color:green} common in the patch passed. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 7s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | LUCENE-9018 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985615/LUCENE-9018.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / 068b6ba | | ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 | | Default Java | LTS | | Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/227/testReport/ | | modules | C: lucene/analysis/common U: lucene/analysis/common | | Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/227/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > Separator for ConcatenateGraphFilterFactory > --- > > Key: LUCENE-9018 > URL: https://issues.apache.org/jira/browse/LUCENE-9018 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Stanislav Mikulchik >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch > > > I would like to have an option to choose a separator to use for token > concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" > symbol. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972335#comment-16972335 ] Stanislav Mikulchik commented on LUCENE-9018: - > There is a backwards-compatibility concern we overlooked Added getLuceneMatchVersion() check to ConcatenateGraphFilterFactory > getCharacter on an empty string can mean null Done > RE CompletionAnalyzer Reverted changes [^LUCENE-9018.patch] > Separator for ConcatenateGraphFilterFactory > --- > > Key: LUCENE-9018 > URL: https://issues.apache.org/jira/browse/LUCENE-9018 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Stanislav Mikulchik >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch > > > I would like to have an option to choose a separator to use for token > concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" > symbol. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971132#comment-16971132 ] David Smiley commented on LUCENE-9018: -- Thanks. There is a backwards-compatibility concern we overlooked. You can call getLuceneMatchVersion() in the factory to switch between old behavior and new. The old should retain "preserveSep" with non-configurable separator. The new behavior I think should have a similar default such that if you don't specify anything about the separator, you get the same separator char we get today. We don't want/need to change the semantics. getCharacter on an empty string can mean null. RE CompletionAnalyzer, I'm skeptical you should change it's public API since I suspect this level of configurability is not useful for that feature? > Separator for ConcatenateGraphFilterFactory > --- > > Key: LUCENE-9018 > URL: https://issues.apache.org/jira/browse/LUCENE-9018 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Stanislav Mikulchik >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-9018.patch, LUCENE-9018.patch > > > I would like to have an option to choose a separator to use for token > concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" > symbol. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968549#comment-16968549 ] Stanislav Mikulchik commented on LUCENE-9018: - Unfortunately there is no getCharacter method that can tolerate optional parameter so I created my own. Renamed separator to tokenSeparator. Replaced preserveSep with Character tokenSeparator. In the first try I didn't notice the usage of ConcatenateGraphFilter constructor in the other classes, so I fixed that too. [^LUCENE-9018.patch] > Separator for ConcatenateGraphFilterFactory > --- > > Key: LUCENE-9018 > URL: https://issues.apache.org/jira/browse/LUCENE-9018 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Stanislav Mikulchik >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-9018.patch, LUCENE-9018.patch > > > I would like to have an option to choose a separator to use for token > concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" > symbol. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961985#comment-16961985 ] David Smiley commented on LUCENE-9018: -- Thanks for contributing! * Factory: see method {{getChar}} instead of simply {{get}} * I think we should use the same factory parameter name for this that ShingleFilterFactory and FixedShingleFilterFactory use – "tokenSeparator". Unfortunately I see inconsistency -- FingerprintFilterFactory uses "separator" but that filter is more niche so I prefer to standardize on the choice made by the more common filter. * I think the semantics of both "preserveSep" (a boolean) and "separator" (the char) as you have defined it, is confusing. You've made the separator preservation an OR between those two. I think it's clearer to keep preserveSep as the toggle that decides if we need to preserve a separator at all, and use "separator" to be the setting that determines _what_ the separator char should be (only honored when preserveSep==true). The latter should simply default to SEP_LABEL. The end effect will be a couple fewer lines of code and a slightly simpler conditional, and moreover something I find easier to understand. The documentation on preserveSep would need a slight adjustment to point to separator setting since the separator won't always be SEP_LABEL anymore. * I think ConcatenateGraphFilter could have just one Character separator that may be null. This would replace preserveSep. > Separator for ConcatenateGraphFilterFactory > --- > > Key: LUCENE-9018 > URL: https://issues.apache.org/jira/browse/LUCENE-9018 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Stanislav Mikulchik >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-9018.patch > > > I would like to have an option to choose a separator to use for token > concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" > symbol. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org