Re: Numeric value ignored by EdgeNGramFilterFactory

2019-07-04 Thread Zheng Lin Edwin Yeo
Hi,

You can use the "Analysis" page in the Solr Admin UI to enter your value
and see what the tokenizer and each of the filters do to it.
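
If you prefer to script the same check, here is a minimal sketch that builds a
request URL for Solr's field analysis handler (/analysis/field), which returns
similar per-stage output to the Analysis page. The host, the collection name
"mycollection" and the field type name "text_partial" are assumptions for
illustration only; they do not come from this thread.

```python
from urllib.parse import urlencode


def build_analysis_url(base_url, collection, field_type, value):
    """Build a URL for Solr's field analysis handler (/analysis/field)."""
    params = {
        "analysis.fieldtype": field_type,  # analyze with this field type's chain
        "analysis.fieldvalue": value,      # text to run through the index analyzer
        "wt": "json",
    }
    return f"{base_url}/{collection}/analysis/field?{urlencode(params)}"


# Hypothetical host, collection, and field type names; substitute your own.
url = build_analysis_url("http://localhost:8983/solr", "mycollection",
                         "text_partial", "72 hours")
print(url)
```

Fetching that URL (for example with curl) should list the tokens after each
stage of the chain, so you can see where a token disappears.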

Regards,
Edwin

On Thu, 4 Jul 2019 at 17:28, Yasufumi Mizoguchi 
wrote:

> Hi,
>
> EdgeNGramFilterFactory seems to drop tokens shorter than the minGramSize
> parameter. See the minGramSize="4" maxGramSize="6" example on the page below.
>
> https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html#edge-n-gram-filter
>
> So I think you should set minGramSize to 2 or 1 if you want to keep 72 and
> other short tokens.
>
> Thanks,
> Yasufumi
>
> On Thu, 4 Jul 2019 at 17:20, Shamik Bandopadhyay wrote:
>
> > Hi,
> >
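> >    I'm using EdgeNGramFilterFactory to support partial search. Here's my
> > field definition.
> >
> > <fieldType ... positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >   <analyzer ...>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
> >             generateNumberParts="1" catenateWords="0" catenateNumbers="1"
> >             catenateAll="0" splitOnCaseChange="0"/>
> >     <filter ... ignoreCase="true" words="stopwords.txt"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >     <filter class="solr.PorterStemFilterFactory"/>
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     ...
> >     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
> >             generateNumberParts="1" catenateWords="0" catenateNumbers="1"
> >             catenateAll="0" splitOnCaseChange="0"/>
> >     <filter ... ignoreCase="true" words="stopwords.txt"/>
> >     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms/synonyms.txt"
> >             ignoreCase="true" expand="true"/>
> >     ...
> >     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> >     ...
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> > </fieldType>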
> >
> > I run into an issue when I search for numeric terms. For example, if I
> > search for "72 hours", EdgeNGramFilterFactory ignores 72 and only stores
> > hou and hour in the index. Since I'm using the AND operator, the query
> > fails to match 72 hours. I could enable EdgeNGramFilterFactory in the
> > query chain, but I thought that would be unnecessary overhead. Is there a
> > reason why 72 is ignored, and what would be the best way to address this
> > scenario?
> >
> > Any pointers will be appreciated.
> >
> > Thanks,
> > Shamik
> >
>


Re: Numeric value ignored by EdgeNGramFilterFactory

2019-07-04 Thread Erick Erickson
The admin/analysis page is very valuable for this kind of question.

Your EdgeNGramFilterFactory has minGramSize="3", so it throws out the
two-character token 72.
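
To make that concrete, here is a rough Python model of the gram generation. It
is only a sketch of the behaviour described in this thread, not Lucene's
actual code, and the function name edge_ngrams is made up for illustration.
It assumes the tokens have already been lowercased and stemmed ("hours"
becomes "hour").

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the leading-edge n-grams of token with lengths min_gram..max_gram.

    A token shorter than min_gram produces no grams at all, which is the
    behaviour discussed in this thread.
    """
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


# Settings from the field definition: minGramSize="3" maxGramSize="30".
for token in ("72", "hour"):
    print(token, edge_ngrams(token, 3, 30))
```

With these settings "hour" yields hou and hour, while "72" yields nothing, so
no term for 72 reaches the index.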

Best,
Erick

> On Jul 4, 2019, at 1:27 AM, Shamik Bandopadhyay  wrote:
> 
> <fieldType ... autoGeneratePhraseQueries="true">
>   <analyzer ...>
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="0" />
>     <filter ... ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
>     <filter class="solr.PorterStemFilterFactory" />
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
>   <analyzer type="query">
>     ...
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="0" />
>     <filter ... ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms/synonyms.txt"
>             ignoreCase="true" expand="true" />
>     ...
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
>     ...
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
> </fieldType>



Re: Numeric value ignored by EdgeNGramFilterFactory

2019-07-04 Thread Yasufumi Mizoguchi
Hi,

EdgeNGramFilterFactory seems to drop tokens shorter than the minGramSize
parameter. See the minGramSize="4" maxGramSize="6" example on the page below.
https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html#edge-n-gram-filter

So I think you should set minGramSize to 2 or 1 if you want to keep 72 and
other short tokens.
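
As a rough illustration of why lowering minGramSize should help, the sketch
below models the index-side grams as a set and checks whether every query term
(which is not n-grammed at query time) is present, as the AND operator
requires. The helper names are hypothetical, the tokens are assumed to be
already lowercased and stemmed, and this is a simplified model rather than
Solr's actual matching logic.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Leading-edge n-grams of token; empty if token is shorter than min_gram."""
    longest = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, longest + 1)}


def and_query_matches(query_terms, indexed_tokens, min_gram, max_gram):
    """True if every query term exists among the grams stored for the document."""
    index = set()
    for token in indexed_tokens:
        index |= edge_ngrams(token, min_gram, max_gram)
    return all(term in index for term in query_terms)


# "72 hours" after whitespace tokenizing, lowercasing and stemming (assumed).
tokens = ["72", "hour"]
for min_gram in (3, 2, 1):
    print(min_gram, and_query_matches(tokens, tokens, min_gram, 30))
```

In this model the query only matches once minGramSize is 2 or lower, because
that is when a term for 72 is stored. A smaller minGramSize also stores more
short grams per token, so it is worth checking the effect on index size.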

Thanks,
Yasufumi

On Thu, 4 Jul 2019 at 17:20, Shamik Bandopadhyay wrote:

> Hi,
>
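>    I'm using EdgeNGramFilterFactory to support partial search. Here's my
> field definition.
>
> <fieldType ... positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer ...>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="0"/>
>     <filter ... ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     ...
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="0"/>
>     <filter ... ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms/synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     ...
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     ...
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>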
>
> I run into an issue when I search for numeric terms. For example, if I
> search for "72 hours", EdgeNGramFilterFactory ignores 72 and only stores
> hou and hour in the index. Since I'm using the AND operator, the query
> fails to match 72 hours. I could enable EdgeNGramFilterFactory in the
> query chain, but I thought that would be unnecessary overhead. Is there a
> reason why 72 is ignored, and what would be the best way to address this
> scenario?
>
> Any pointers will be appreciated.
>
> Thanks,
> Shamik
>


Fwd: Numeric value ignored by EdgeNGramFilterFactory

2019-07-04 Thread Shamik Bandopadhyay
Hi,

   I'm using EdgeNGramFilterFactory to support partial search. Here's my
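field definition.

<fieldType ... positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer ...>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter ... ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    ...
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter ... ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms/synonyms.txt"
            ignoreCase="true" expand="true"/>
    ...
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    ...
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
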
I run into an issue when I search for numeric terms. For example, if I
search for "72 hours", EdgeNGramFilterFactory ignores 72 and only stores
hou and hour in the index. Since I'm using the AND operator, the query
fails to match 72 hours. I could enable EdgeNGramFilterFactory in the
query chain, but I thought that would be unnecessary overhead. Is there a
reason why 72 is ignored, and what would be the best way to address this
scenario?

Any pointers will be appreciated.

Thanks,
Shamik

