[jira] [Commented] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510056#comment-16510056 ]

Ingomar Wesp commented on SOLR-5152:

Given that LUCENE-7960 has been closed, I think this issue can be marked as fixed as well.

> EdgeNGramFilterFactory deletes token
>
> Key: SOLR-5152
> URL: https://issues.apache.org/jira/browse/SOLR-5152
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 4.4
> Reporter: Christoph Lingg
> Priority: Major
> Attachments: SOLR-5152-v5.0.0.patch, SOLR-5152.patch
>
> I am using EdgeNGramFilterFactory in my schema.xml
> {code:xml}
> positionIncrementGap="100">
> maxGramSize="10" side="front" />
> {code}
> Some tokens in my index consist of only one character, let's say {{R}}.
> minGramSize is set to 2, which is bigger than the length of the token. I
> expected the NGramFilter to leave {{R}} unchanged, but in fact it deletes
> the token.
> For my use case this interpretation is undesirable, and probably for most use
> cases too!?

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
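The behaviour described in the issue can be reproduced without Solr. The following is a minimal plain-Java sketch, not Lucene's EdgeNGramTokenFilter: it generates front-edge n-grams for a single token, so a one-character token with minGram = 2 yields nothing at all. The preserveOriginal flag is an assumption modelled on the option discussed in this thread and in LUCENE-7960 (keep the original token when it is shorter than minGram or longer than maxGram). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of front-edge n-gram generation.
// Hypothetical helper for illustration; NOT Lucene's EdgeNGramTokenFilter.
final class EdgeNGramSketch {

    // Returns the prefixes of `token` whose lengths lie in [minGram, maxGram].
    // A token shorter than minGram produces no prefixes, which is the
    // "deleted token" effect reported in SOLR-5152.
    // If preserveOriginal is true (an assumption modelled on LUCENE-7960),
    // the original token is appended when it is shorter than minGram or
    // longer than maxGram, i.e. whenever it would not appear verbatim.
    // Lengths are counted in Java chars (UTF-16 units) for simplicity.
    static List<String> edgeNGrams(String token, int minGram, int maxGram,
                                   boolean preserveOriginal) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        boolean outOfRange = token.length() < minGram || token.length() > maxGram;
        if (preserveOriginal && outOfRange) {
            grams.add(token);
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("R", 2, 10, false));    // short token dropped
        System.out.println(edgeNGrams("R", 2, 10, true));     // short token kept
        System.out.println(edgeNGrams("Solr", 2, 10, false)); // normal case
    }
}
```

Running main prints `[]`, `[R]` and `[So, Sol, Solr]`. A token whose length already falls within [minGram, maxGram] is emitted verbatim as its own longest gram, so the sketch does not add it a second time.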
[jira] [Commented] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424380#comment-16424380 ]

Shawn Heisey commented on SOLR-5152:

Linking as a duplicate of LUCENE-7960. That issue is in the correct project. It solves the problem in a slightly different way that has more functionality.
[jira] [Commented] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424379#comment-16424379 ]

Shawn Heisey commented on SOLR-5152:

See LUCENE-7960. Similar idea, but treats short and long tokens separately.
[jira] [Commented] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423815#comment-16423815 ]

Thomas Wöckinger commented on SOLR-5152:

So what can be done to get this into the main line?
[jira] [Commented] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909451#comment-14909451 ]

Furkan KAMACI commented on SOLR-5152:

Thanks [~urusha] :) [~thetaphi] I think this patch should be merged into the source code.
[jira] [Commented] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818583#comment-13818583 ]

Furkan KAMACI commented on SOLR-5152:

I've added preserveOriginal capability to EdgeNGramFilterFactory and attached a patch.
[jira] [Commented] (SOLR-5152) EdgeNGramFilterFactory deletes token
[ https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742051#comment-13742051 ]

Christoph Lingg commented on SOLR-5152:

How about a property like the one in _WhitespaceTokenizerFactory_: preserveOriginal="1"?