Thanks Shawn. Now I understand why my query without slop worked on the old
system but not on the new one.
On top of this NGram phrase query, I also want to use wildcards, so I
switched to the complexphrase query parser.
{!complexphrase inOrder=true}sequence:"KK?" worked:
"debug":{
"rawquerystring":"{!complexphrase inOrder=true}sequence:\"KK?\"",
"querystring":"{!complexphrase inOrder=true}sequence:\"KK?\"",
"parsedquery":"ComplexPhraseQuery(\"KK?\")",
"parsedquery_toString":"\"KK?\"",
"explain":{
"ab95710d-f191-4c59-9df8-87e8bfe236ea":"\n0.48393333 =
weight(sequence:kks in 0) [SchemaSimilarity], result of:\n 0.48393333 =
score(doc=0,freq=1.0 = termFreq=1.0\n), product of:\n 0.2876821 = idf,
computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n
1.0 = docFreq\n 1.0 = docCount\n 1.682181 = tfNorm, computed as (freq *
(k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = parameter b\n
115.0 = avgFieldLength\n 1.0 = fieldLength\n"},
"QParser":"ComplexPhraseQParser",
However, {!complexphrase inOrder=true}sequence:"KK? KSA"~1 does not match
anything (the explain section is empty):
"debug":{
"rawquerystring":"{!complexphrase inOrder=true}sequence:\"KK? KSA\"~1",
"querystring":"{!complexphrase inOrder=true}sequence:\"KK? KSA\"~1",
"parsedquery":"ComplexPhraseQuery(\"KK? KSA\"~1)",
"parsedquery_toString":"\"KK? KSA\"~1",
"explain":{},
"QParser":"ComplexPhraseQParser",
Chuming
On Nov 16, 2017, at 11:49 AM, Shawn Heisey <[email protected]> wrote:
> On 11/16/2017 8:40 AM, Chuming Chen wrote:
>> I think the position is the issue, but how do I fix it? Is something wrong
>> with my index analyzer, or is my query just not right? I need to do a
>> phrase query; order is important here.
>>
>> I tried "KKS KSA"~1 in the query, and it worked. However, "KKS KSA
>> SAR"~1 didn't work; I had to use "KKS KSA SAR"~2.
>>
>> Is phrase slop essential here? With Solr 3.5, which I used before, no
>> phrase slop was needed.
>
> The ~2 slop for the three-term query is correct -- because the third
> term will have position 3 on the query, and position 1 in the index, so
> it must adjust two places in order to match.
>
> If you look at the index analysis, you'll see that the positions of the
> 3 character ngrams are all the same. I believe this is typical for that
> filter, to maintain the position of the original term for all ngrams.
> It wouldn't surprise me to learn that version 3.5 had very different
> behavior in the ngram filter regarding positions, behavior that was
> considered incorrect and fixed.
>
> I think this might be the applicable issue:
>
> https://issues.apache.org/jira/browse/LUCENE-4955
>
> Thanks,
> Shawn
>
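
The slop arithmetic in the quoted explanation can be sketched in a few lines
of Python. This is only a simplified model of sloppy phrase matching, not
Lucene's actual implementation: it assumes each query term is shifted back by
its offset within the phrase, and that the slop needed is the spread of those
shifted positions. The function name required_slop is made up for
illustration.

```python
def required_slop(indexed_positions):
    """Smallest slop that lets an in-order phrase match (simplified model).

    indexed_positions[i] is the index position of the i-th query term.
    Assumption: query term i sits at offset i within the phrase, so an
    exact phrase would have indexed positions p, p+1, p+2, ...
    """
    # Shift each term back by its offset in the query phrase.
    shifted = [pos - i for i, pos in enumerate(indexed_positions)]
    # An exact phrase collapses to a single value; any spread is slop.
    return max(shifted) - min(shifted)


# All ngrams share one position, as described for the ngram filter above.
print(required_slop([1, 1]))     # "KKS KSA"     -> 1
print(required_slop([1, 1, 1]))  # "KKS KSA SAR" -> 2
# Consecutive positions would need no slop at all.
print(required_slop([1, 2, 3]))  # -> 0
```

Under this model the slop grows by one with each extra ngram in the phrase,
which agrees with ~1 working for two terms and ~2 being needed for three.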