Re: Is this a bug? Wildcard with PatternReplaceFilterFactory

2020-02-21 Thread Mike Phillips
It looks like the debug result you are showing me is for Rod's (ASCII
apostrophe), not Rod’s (right single quotation mark), but in answer to your
question, this is why I say "Rod’s finds fields Rod's and Rod’s that are now
in the index as rod's":


The analysis page shows Rod’s gets stored in the index as:
rod's rods rod s

Field Value (Index): Rod’s
Analyse Fieldname / FieldType: _text_ (Verbose Output)

Stage  text   raw_bytes               start  end  positionLength  type  termFrequency  position  keyword
WT     Rod’s  [52 6f 64 e2 80 99 73]  0      5    1               word  1              1         -
SF     Rod’s  [52 6f 64 e2 80 99 73]  0      5    1               word  1              1         -
WDGF   Rod’s  [52 6f 64 e2 80 99 73]  0      5    2               word  1              1         false
       Rods   [52 6f 64 73]           0      5    2               word  1              1         false
       Rod    [52 6f 64]              0      3    1               word  1              1         false
       s      [73]                    4      5    1               word  1              2         false
FGF    Rod’s  [52 6f 64 e2 80 99 73]  0      5    2               word  1              1         false
       Rods   [52 6f 64 73]           0      5    2               word  1              1         false
       Rod    [52 6f 64]              0      3    1               word  1              1         false
       s      [73]                    4      5    1               word  1              2         false
PRF    Rod’s  [52 6f 64 e2 80 99 73]  0      5    2               word  1              1         false
       Rods   [52 6f 64 73]           0      5    2               word  1              1         false
       Rod    [52 6f 64]              0      3    1               word  1              1         false
       s      [73]                    4      5    1               word  1              2         false
PRF    Rod's  [52 6f 64 27 73]        0      5    2               word  1              1         false
       Rods   [52 6f 64 73]           0      5    2               word  1              1         false
       Rod    [52 6f 64]              0      3    1               word  1              1         false
       s      [73]                    4      5    1               word  1              2         false
PRF    Rod's  [52 6f 64 27 73]        0      5    2               word  1              1         false
       Rods   [52 6f 64 73]           0      5    2               word  1              1         false
       Rod    [52 6f 64]              0      3    1               word  1              1         false
       s      [73]                    4      5    1               word  1              2         false
PRF    Rod's  [52 6f 64 27 73]        0      5    2               word  1              1         false
       Rods   [52 6f 64 73]           0      5    2               word  1              1         false
       Rod    [52 6f 64]              0      3    1               word  1              1         false
       s      [73]                    4      5    1               word  1              2         false
LCF    rod's  [72 6f 64 27 73]        0      5    2               word  1              1         false
       rods   [72 6f 64 73]           0      5    2               word  1              1         false
       rod    [72 6f 64]              0      3    1               word  1              1         false
       s      [73]                    4      5    1               word  1              2         false
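
The raw_bytes column is the quickest way to tell the two apostrophes apart. As a minimal sketch in Python (not part of the original exchange; the helper name decode_raw_bytes is made up for illustration), the hex dumps shown by the analysis page can be decoded like this:

```python
import unicodedata


def decode_raw_bytes(dump: str) -> str:
    """Turn an analysis-page hex dump such as '[52 6f 64 73]' into text."""
    return bytes.fromhex(dump.strip("[]")).decode("utf-8")


# Token as it enters the second PRF stage, and as it leaves it.
before_prf = decode_raw_bytes("[52 6f 64 e2 80 99 73]")
after_prf = decode_raw_bytes("[52 6f 64 27 73]")

for token in (before_prf, after_prf):
    quote = token[3]  # the character between "Rod" and "s"
    print(token, f"U+{ord(quote):04X}", unicodedata.name(quote))
```

The bytes e2 80 99 are the UTF-8 encoding of U+2019 (right single quotation mark), while 27 is the plain ASCII apostrophe, so the index-time replacement is doing what was intended.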



This is what we were trying to achieve with <filter class="solr.PatternReplaceFilterFactory" pattern="’" replacement="'"/>



The problem is that the wildcard query *Rod’s* gets no hits:

"responseHeader":{
  "status":0, "QTime":2,
  "params":{ "q":"*Rod’s*", "debugQuery":"on", "_":"1582315262594"}},
"response":{"numFound":0,"start":0,"docs":[] },
"debug":{
  "rawquerystring":"*Rod’s*",
  "querystring":"*Rod’s*",
  "parsedquery":"_text_:*rod’s*",
  "parsedquery_toString":"_text_:*rod’s*",
  "explain":{},
  "QParser":"LuceneQParser", ...
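
The parsedquery above shows the wildcard term lowercased but with ’ still in place, so it cannot match the indexed rod's. The toy model below, in Python, reproduces that mismatch. It only illustrates the observed behaviour and is not Solr's implementation; the names full_chain and wildcard_chain and the use of fnmatch for wildcard matching are assumptions made for this sketch.

```python
from fnmatch import fnmatchcase


def full_chain(term: str) -> str:
    """Toy stand-in for the full analysis: replace ’ with ' and lowercase."""
    return term.replace("\u2019", "'").lower()


def wildcard_chain(term: str) -> str:
    """Toy stand-in for what the debug output shows for wildcards: lowercase only."""
    return term.lower()


# The analysis page shows Rod’s ending up in the index as rod's.
indexed_terms = [full_chain("Rod\u2019s")]

plain_query = full_chain("Rod\u2019s")            # rod's
wild_query = wildcard_chain("*Rod\u2019s*")       # *rod’s*, as in parsedquery
normalized_wild = full_chain("*Rod\u2019s*")      # *rod's*

plain_hit = plain_query in indexed_terms
wild_hit = any(fnmatchcase(t, wild_query) for t in indexed_terms)
normalized_hit = any(fnmatchcase(t, normalized_wild) for t in indexed_terms)

print(plain_hit, wild_hit, normalized_hit)
```

In this model the wildcard only matches once the quote in the query string has been normalized as well, which is the behaviour we expected from the filter.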







On 2/21/2020 11:52 AM, Erick Erickson wrote:

> Why do you say “…that are now in the index as rod’s”? You have
> WordDelimiterGraphFilterFactory, which breaks things up. When I put your field
> definition in the schema and use the analysis page, turns “rod’s” into  the
> following 4 tokens:
>
> rod’s
> rods
> rod
> s
>
> And querying on field:”*Rod’s*” works just fine. I’m using 8.x, and when I add
> “&debug=query” to the URL, I see:
> {
> "responseHeader": {
> "status": 0, "QTime": 10, "params": {
> "q": "eoe:\"*Rod's*\"", "debug": "query"
> }
> }, "response": {
> "numFound": 1, "start": 0, "docs": [
> {
> "id": "1", "eoe": "Rod's", "_version_": 1659176849231577088
> }
> ]
> }, "debug": {
> "rawquerystring": "eoe:\"*Rod's*\"", "querystring": "eoe:\"*Rod's*\"",
> "parsedquery": "SynonymQuery(Synonym(eoe:*rod's* eoe:rod))",
> "parsedquery_toString": "Synonym(eoe:*rod's* eoe:rod)", "QParser": "LuceneQParser"
> }
> }
>
> What do you see?
>
> Best,
> Erick
>
> On Feb 21, 2020, at 12:57 PM, Mike Phillips wrote:
>
>> Rod’s  finds fields Rod's and Rod’s that are now in the index as rod's
>>
>> but *Rod’s* finds nothing because the index now only contains rod's





Re: Is this a bug? Wildcard with PatternReplaceFilterFactory

2020-02-21 Thread Erick Erickson
Why do you say “…that are now in the index as rod’s”? You have 
WordDelimiterGraphFilterFactory, which breaks things up. When I put your field 
definition in the schema and use the analysis page, it turns “rod’s” into the 
following 4 tokens:

rod’s
rods
rod
s

And querying on field:”*Rod’s*” works just fine. I’m using 8.x, and when I add 
“&debug=query” to the URL, I see: 
{
"responseHeader": {
"status": 0, "QTime": 10, "params": {
"q": "eoe:\"*Rod's*\"", "debug": "query"
}
}, "response": {
"numFound": 1, "start": 0, "docs": [
{
"id": "1", "eoe": "Rod's", "_version_": 1659176849231577088
}
]
}, "debug": {
"rawquerystring": "eoe:\"*Rod's*\"", "querystring": "eoe:\"*Rod's*\"", 
"parsedquery": "SynonymQuery(Synonym(eoe:*rod's* eoe:rod))", 
"parsedquery_toString": "Synonym(eoe:*rod's* eoe:rod)", "QParser": 
"LuceneQParser"
}
}

What do you see?

Best,
Erick

> On Feb 21, 2020, at 12:57 PM, Mike Phillips wrote:
> 
> Rod’s  finds fields Rod's and Rod’s that are now in the index as rod's
> 
> but *Rod’s* finds nothing because the index now only contains rod's



Is this a bug? Wildcard with PatternReplaceFilterFactory

2020-02-21 Thread Mike Phillips


I am attempting to normalize left and right single and double quotation marks for searches:

Character  Name                          Replacement  Name
‘          Left single quotation mark    '            Single quote
’          Right single quotation mark   '            Single quote
“          Left double quotation mark    "            Double quotes
”          Right double quotation mark   "            Double quotes
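
The intended mapping can be sketched outside Solr in a few lines of Python. This is only an illustration of the normalization in the table, not a replacement for the schema; the names QUOTE_MAP and normalize_quotes are hypothetical.

```python
# Mapping taken from the table above: typographic quote -> ASCII quote.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_quotes(text: str) -> str:
    """Replace typographic quotes with their ASCII equivalents."""
    return text.translate(QUOTE_MAP)


print(normalize_quotes("Rod\u2019s \u201cbig\u201d boat"))
```

Text that already uses ASCII quotes passes through unchanged, so both spellings of Rod's end up identical.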


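<fieldType ... positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ... words="stopwords.txt" />
    <filter class="solr.WordDelimiterGraphFilterFactory" ...
        preserveOriginal="1" catenateWords="1"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="‘"
        replacement="'"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="’"
        replacement="'"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="“"
        replacement="&quot;"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="”"
        replacement="&quot;"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer .../>
    <filter class="solr.WordDelimiterGraphFilterFactory" ...
        preserveOriginal="1" catenateWords="1"/>
    <filter class="solr.StopFilterFactory" ... words="stopwords.txt" />
    <filter ... synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="‘"
        replacement="'"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="’"
        replacement="'"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="“"
        replacement="&quot;"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="”"
        replacement="&quot;"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>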

The wildcard query seems NOT to go through the PatternReplaceFilterFactory.

A search for Rod’s finds fields containing Rod's and Rod’s, both of which are now in the index as rod's,

but *Rod’s* finds nothing, because the index now only contains rod's.