Re: Numeric value ignored by EdgeNGramFilterFactory
Hi,

You can use the "Analysis" page in the Solr Admin UI to enter your value and see what the tokenizer and each of the filters do to it.

Regards,
Edwin

On Thu, 4 Jul 2019 at 17:28, Yasufumi Mizoguchi wrote:

> Hi,
>
> EdgeNGramFilterFactory seems to drop tokens shorter than minGramSize param.
> Check the example of minGramSize="4" maxGramSize="6" case in below page.
>
> https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html#edge-n-gram-filter
>
> So, you should set minGramSize=2 or 1 if you want to keep 72 or the other
> short tokens, I think.
>
> Thanks,
> Yasufumi
Re: Numeric value ignored by EdgeNGramFilterFactory
The admin/analysis page is very valuable for this kind of question. Your EdgeNGram filter has a minGramSize of 3, so it’s throwing out 72.

Best,
Erick

> On Jul 4, 2019, at 1:27 AM, Shamik Bandopadhyay wrote:
>
> <fieldType … autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="0" />
>     <filter … ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
>     <filter class="solr.PorterStemFilterFactory" />
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
>   <analyzer type="query">
>     …
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="0" />
>     <filter … ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms/synonyms.txt"
>             ignoreCase="true" expand="true" />
>     …
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
>     …
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
> </fieldType>
Re: Numeric value ignored by EdgeNGramFilterFactory
Hi,

EdgeNGramFilterFactory seems to drop tokens shorter than the minGramSize parameter.
See the minGramSize="4" maxGramSize="6" example on the page below.

https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html#edge-n-gram-filter

So I think you should set minGramSize to 2 or 1 if you want to keep 72 and other short tokens.

Thanks,
Yasufumi

On Thu, 4 Jul 2019 at 17:20, Shamik Bandopadhyay wrote:

> Hi,
>
> I'm using EdgeNGramFilterFactory to support partial search. Here's my
> field definition.
>
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> generateNumberParts="1" catenateWords="0" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> words="stopwords.txt" />
> maxGramSize="30"/>
> generateNumberParts="1" catenateWords="0" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> words="stopwords.txt" />
> synonyms="synonyms/synonyms.txt" ignoreCase="true" expand="true"/>
>
> I run into an issue when I'm trying a numeric terms in search. For e.g. if
> I search for "72 hours", EdgeNGramFilterFactory ignores 72 and only stores
> hou and hour in index. Since I'm using AND operator, the query fails to
> match 72 hours. I can enable EdgeNGramFilterFactory in the query chain, but
> I thought that would be an un-necessary overhead. Is there a reason why 72
> is ignored and what'll be the best way to address this scenario?
>
> Any pointers will be appreciated.
>
> Thanks,
> Shamik
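The suggestion in this reply could look roughly like the fragment below. It is only a sketch, assuming the index-time filter quoted elsewhere in the thread (minGramSize="3" maxGramSize="30"); the only change is the lower minGramSize, and the rest of the analyzer chain is left out.

```xml
<!-- Index-time edge n-gram filter with minGramSize lowered from 3 to 2,
     so that two-character tokens such as "72" are no longer dropped.
     Use minGramSize="1" instead if single-character tokens must be kept too. -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="30"/>
```

Because this changes index-time analysis, existing documents would need to be reindexed before 72 becomes searchable. A smaller minGramSize also produces more grams per token (for example ho in addition to hou and hour), so the index would be expected to grow somewhat.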
Fwd: Numeric value ignored by EdgeNGramFilterFactory
Hi,

I'm using EdgeNGramFilterFactory to support partial search. Here's my field definition.

I run into an issue when searching on numeric terms. For example, if I search for "72 hours", EdgeNGramFilterFactory ignores 72 and stores only hou and hour in the index. Since I'm using the AND operator, the query fails to match 72 hours. I could enable EdgeNGramFilterFactory in the query chain, but I thought that would be unnecessary overhead. Is there a reason why 72 is ignored, and what would be the best way to address this scenario?

Any pointers will be appreciated.

Thanks,
Shamik
Numeric value ignored by EdgeNGramFilterFactory
Hi,

I'm using EdgeNGramFilterFactory to support partial search. Here's my field definition.

<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />

I run into an issue when searching on numeric terms. For example, if I search for *72 hours*, EdgeNGramFilterFactory ignores 72 and stores only *hou* and *hour* in the index. Since I'm using the AND operator, the query fails to match *72 hours*. I could enable EdgeNGramFilterFactory in the query chain, but I thought that would be unnecessary overhead. Is there a reason why 72 is ignored, and what would be the best way to address this scenario?

Any pointers will be appreciated.

Thanks,
Shamik