Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-15 Thread TK Solr

It doesn't tell much:

"debug":{
  "rawquerystring":"email:*@aol.com",
  "querystring":"email:*@aol.com",
  "parsedquery":"(email:*@aol.com)",
  "parsedquery_toString":"email:*@aol.com",
  "explain":{
    "11d6e092-58b5-4c1b-83bc-f3b37e0797fd":{
      "match":true,
      "value":1.0,
      "description":"email:*@aol.com"},


The email field uses ReversedWildcardFilter for both indexing and query.
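For illustration, here is a minimal Python sketch of why reversed tokens make a query like email:*@aol.com cheap. It assumes each address is indexed as a single token (the thread doesn't show the email field's tokenizer) and models the term dictionary as a sorted list. index_terms, prefix_scan, full_scan and the sample addresses are hypothetical. With the reversed copies, the leading wildcard becomes a prefix lookup on '\u0001moc.loa@'. Without them, every term has to be tested.

```python
import bisect

MARKER = "\u0001"  # marker prepended to reversed tokens, as described in this thread


def index_terms(values, with_original=True):
    """Hypothetical model: a sorted list stands in for the field's term dictionary."""
    terms = [MARKER + v[::-1] for v in values]
    if with_original:
        terms += list(values)
    return sorted(terms)


def prefix_scan(sorted_terms, prefix):
    """All terms starting with prefix: one binary search, then walk the contiguous run."""
    i = bisect.bisect_left(sorted_terms, prefix)
    hits = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        hits.append(sorted_terms[i])
        i += 1
    return hits


def full_scan(sorted_terms, suffix):
    """What a leading wildcard needs without reversed tokens: test every term."""
    return [t for t in sorted_terms if not t.startswith(MARKER) and t.endswith(suffix)]


emails = ["jane@aol.com", "bob@example.org", "ann@aol.com"]
terms = index_terms(emails)

# email:*@aol.com rewritten as the prefix "\u0001moc.loa@" on the reversed tokens
reversed_hits = prefix_scan(terms, MARKER + "@aol.com"[::-1])
via_reversed = sorted(t[len(MARKER):][::-1] for t in reversed_hits)
via_scan = sorted(full_scan(terms, "@aol.com"))
print(via_reversed)              # ['ann@aol.com', 'jane@aol.com']
print(via_reversed == via_scan)  # True
```

Both paths find the same documents. The difference is how many terms have to be examined to get there.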

On 4/15/20 12:04 PM, Erick Erickson wrote:

> What do you see if you add &debug=query? That should tell you….
>
> Best,
> Erick



Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-15 Thread Erick Erickson
What do you see if you add &debug=query? That should tell you….

Best,
Erick

> On Apr 15, 2020, at 2:40 PM, TK Solr wrote:
> 
> Thank you.
> 
> Is there any harm if I use it on the query side too? In my case it seems to
> work OK (even with withOriginal="false"), and it is even faster.
> I see that the query parser code looks at the index analyzer and applies
> ReversedWildcardFilter at query time, but I didn't quite understand what
> happens if the query analyzer also uses ReversedWildcardFilter.

Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-15 Thread TK Solr

Thank you.

Is there any harm if I use it on the query side too? In my case it seems to
work OK (even with withOriginal="false"), and it is even faster.
I see that the query parser code looks at the index analyzer and applies
ReversedWildcardFilter at query time, but I didn't quite understand what
happens if the query analyzer also uses ReversedWildcardFilter.
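Here is a minimal Python sketch of one plausible reason why plain (non-wildcard) term queries keep working with the filter in both analyzers. It assumes two things: the query analyzer applies ReversedWildcardFilter to an ordinary term the same way the index analyzer does, and the index side uses withOriginal="true". reverse_filter and the sample addresses are hypothetical. The sketch says nothing about wildcard terms, which the query parser handles separately.

```python
MARKER = "\u0001"  # assumed marker char, as suggested in this thread


def reverse_filter(tokens, with_original):
    """Hypothetical stand-in for the filter: marker + reversed token, plus the original if asked."""
    out = []
    for tok in tokens:
        out.append(MARKER + tok[::-1])
        if with_original:
            out.append(tok)
    return out


# Index side: withOriginal="true" keeps both forms of every token.
indexed = set(reverse_filter(["jane@aol.com", "ann@aol.com"], with_original=True))

# Query side: withOriginal="false" turns the query term into its reversed form only.
query_tokens = reverse_filter(["jane@aol.com"], with_original=False)
print(query_tokens)                             # ['\x01moc.loa@enaj']
print(all(t in indexed for t in query_tokens))  # True: it hits the reversed copy
```

In this model, the withOriginal="false" query token is just the reversed copy that the index already holds, so the match set is unchanged. The model neither explains nor confirms the speedup.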


On 4/15/20 1:51 AM, Colvin Cowie wrote:

> You only need to apply it in the index analyzer:
> https://lucene.apache.org/solr/8_4_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
> If it appears in the index analyzer, the query-side part of it is applied
> automatically at query time.
>
> ReversedWildcardFilter indexes *every* token in reverse, with a special
> character at the start ('\u0001', I believe) to avoid false-positive matches
> when the query term isn't reversed. For example, if the term being indexed is
> "mar", the reversed token is "\u0001ram", so a search for "ram" won't
> accidentally match it. If *withOriginal* is set to true, the original token is
> indexed as well as the reversed token.



Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-15 Thread Colvin Cowie
You only need to apply it in the index analyzer:
https://lucene.apache.org/solr/8_4_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
If it appears in the index analyzer, the query-side part of it is applied
automatically at query time.
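The model below is a rough Python version of that query-side decision. It is written from the Solr Ref Guide's description of the parameters (1-based wildcard positions; minTrailing is the minimum number of characters after the last wildcard), not from Solr's source. should_reverse and rewrite are hypothetical names. The defaults are the index-time values from the text_general_rev definition in this thread, and min_trailing=2 is an assumed value. It shows how a filter configured only at index time can still steer wildcard queries: the parser reads the factory's settings and rewrites the query term itself.

```python
MARKER = "\u0001"  # assumed marker char


def should_reverse(term, max_pos_asterisk=3, max_pos_question=2,
                   max_fraction_asterisk=0.33, min_trailing=2):
    """Hypothetical model: decide whether a wildcard term is run against reversed tokens."""
    first_star = term.find("*")
    first_q = term.find("?")
    if first_star == -1 and first_q == -1:
        return False  # no wildcard: an ordinary term, never reversed
    last_wild = max(term.rfind("*"), term.rfind("?"))
    if len(term) - last_wild - 1 < min_trailing:
        return False  # too few literal chars after the last wildcard
    if first_q != -1 and first_q + 1 <= max_pos_question:
        return True  # '?' early enough (1-based position)
    if first_star != -1 and first_star + 1 <= max_pos_asterisk:
        return True  # '*' early enough (1-based position)
    # '*' within the leading fraction of the term
    return first_star != -1 and first_star < max_fraction_asterisk * len(term)


def rewrite(term):
    """Pattern to run against the index: marker + reversed term when reversal applies."""
    return MARKER + term[::-1] if should_reverse(term) else term


for q in ["*@aol.com", "jane*", "aol.com"]:
    print(repr(q), should_reverse(q), repr(rewrite(q)))
```

A trailing wildcard such as jane* stays unreversed, because an ordinary prefix lookup already handles it well.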

ReversedWildcardFilter indexes *every* token in reverse, with a special
character at the start ('\u0001', I believe) to avoid false-positive matches
when the query term isn't reversed. For example, if the term being indexed is
"mar", the reversed token is "\u0001ram", so a search for "ram" won't
accidentally match it. If *withOriginal* is set to true, the original token is
indexed as well as the reversed token.
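Here is a minimal Python sketch of the "mar" example, assuming the marker is '\u0001' as suggested above. reverse_filter is a hypothetical stand-in for the filter, not Solr's implementation. unmarked_filter is a deliberately broken variant without the marker, included to show the false positive the marker prevents.

```python
MARKER = "\u0001"  # assumed marker char


def reverse_filter(token, with_original):
    """Hypothetical stand-in: tokens indexed for one input token."""
    reversed_token = MARKER + token[::-1]
    return [reversed_token, token] if with_original else [reversed_token]


def unmarked_filter(token, with_original):
    """Same idea without the marker, to show the false positive it prevents."""
    return [token[::-1], token] if with_original else [token[::-1]]


print(reverse_filter("mar", with_original=True))            # ['\x01ram', 'mar']
print("ram" in reverse_filter("mar", with_original=True))   # False
print("ram" in unmarked_filter("mar", with_original=True))  # True: false positive
print(reverse_filter("mar", with_original=False))           # ['\x01ram']
```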


On Thu, 9 Apr 2020 at 02:27, TK Solr wrote:

> I experimented with using ReversedWildcardFilter at index time only and at
> both index and query time.
>
> My results show that using ReversedWildcardFilter at both times runs twice as
> fast, but my dataset is not very large (on the order of 10k docs), so I'm not
> sure I can draw a conclusion.
>
> On 4/8/20 2:49 PM, TK Solr wrote:
> > In the usage example for ReversedWildcardFilter
> > <https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter>
> > in the Solr Ref Guide, and in the only usage I could find in managed-schema
> > (the definition of text_general_rev), the filter is used only for indexing:
> >
> > <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     ...
> >     <filter ... ignoreCase="true"/>
> >     ...
> >     <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2"
> >             maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     ...
> >     <filter ... ignoreCase="true" synonyms="synonyms.txt"/>
> >     <filter ... ignoreCase="true"/>
> >     ...
> >   </analyzer>
> > </fieldType>
> >
> > Is it incorrect to use the same analyzer for queries, like this?
> >
> > <fieldType ... positionIncrementGap="100">
> >   <analyzer>
> >     ...
> >     <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="0"
> >             maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>
> >   </analyzer>
> > </fieldType>
> >
> > In the description of the filter, I see "Tokens without wildcards are not
> > reversed." But the wildcard appears only in the query string. How can
> > ReversedWildcardFilter know whether a wildcard is being used if the filter is
> > used only at index time?
> >
> > TK


Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-08 Thread TK Solr
I experimented with using ReversedWildcardFilter at index time only and at both
index and query time.

My results show that using ReversedWildcardFilter at both times runs twice as
fast, but my dataset is not very large (on the order of 10k docs), so I'm not
sure I can draw a conclusion.

On 4/8/20 2:49 PM, TK Solr wrote:
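> In the usage example for ReversedWildcardFilter
> <https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter>
> in the Solr Ref Guide, and in the only usage I could find in managed-schema
> (the definition of text_general_rev), the filter is used only for indexing:
>
> <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     ...
>     <filter ... ignoreCase="true"/>
>     ...
>     <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2"
>             maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>
>   </analyzer>
>   <analyzer type="query">
>     ...
>     <filter ... ignoreCase="true" synonyms="synonyms.txt"/>
>     <filter ... ignoreCase="true"/>
>     ...
>   </analyzer>
> </fieldType>
>
> Is it incorrect to use the same analyzer for queries, like this?
>
> <fieldType ... positionIncrementGap="100">
>   <analyzer>
>     ...
>     <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="0"
>             maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>
>   </analyzer>
> </fieldType>
>
> In the description of the filter, I see "Tokens without wildcards are not
> reversed." But the wildcard appears only in the query string. How can
> ReversedWildcardFilter know whether a wildcard is being used if the filter is
> used only at index time?
>
> TK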