RE: Complexphrase treats wildcards differently than other query parsers
Right. Sorry. Despite appearances to the contrary, I'm not a bot designed to lead you down the garden path of debugging for yourself with the goal of increasing the size of the Solr contributor pool...

I confirmed the failure in 6.x, but all seems to work in 7.x and trunk. I opened SOLR-11450 and attached a unit test based on your correction of mine.

Thank you, again!

-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
Sent: Monday, October 9, 2017 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

Thanks again, Tim,
following your recipe, I was able to write a failing test:

    assertQ(req("q", "{!complexphrase} iso-latin1:cr\u00E6zy*")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

Notice how cr\u00E6zy* is used as a query term which mimics the behaviour I originally reported, namely that CPQP does not analyse it because of the wildcard and thus does not hit the charfilter from the query side.
Re: Complexphrase treats wildcards differently than other query parsers
Thanks again, Tim,
following your recipe, I was able to write a failing test:

    assertQ(req("q", "{!complexphrase} iso-latin1:cr\u00E6zy*")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

Notice how cr\u00E6zy* is used as a query term which mimics the behaviour I originally reported, namely that CPQP does not analyse it because of the wildcard and thus does not hit the charfilter from the query side.

2017-10-06 20:54 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> That could be it. I'm not able to reproduce this with trunk. More next week.
>
> In trunk, if I add this to schema15.xml:
>
>     [...] mapping="mapping-ISOLatin1Accent.txt"/>
>     [...] stored="true"/>
>
> This test passes.
>
> @Test
> public void testCharFilter() {
>     assertU(adoc("iso-latin1", "cr\u00E6zy tr\u00E6n", "id", "1"));
>     assertU(commit());
>     assertU(optimize());
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:craezy")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:traen")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:caezy~1")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:crae*")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:*aezy")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:crae*y")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"craezy traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"caezy~1 traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"craez* traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"*aezy traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"crae*y traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
> }
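The behaviour the failing test targets can be imitated without Solr. What follows is a minimal, stdlib-only sketch and not Solr's or Lucene's code: the class `CharMapSketch` and the methods `mapChar` and `normalizeWildcard` are hypothetical names, and the single mapping entry (æ to "ae") is an assumption inferred from the test, where a document indexed as cr\u00E6zy is found by the query craezy. It shows what multi-term aware analysis of a wildcard term would amount to: the literal characters go through the character mapping, while the wildcard characters * and ? are left alone.

```java
public class CharMapSketch {

    // Assumed mapping entry, inferred from the test in the thread.
    // A real mapping file contains many more entries.
    static String mapChar(char c) {
        return c == '\u00E6' ? "ae" : String.valueOf(c);
    }

    // Apply the character mapping to every literal character of a wildcard
    // term, passing the wildcard characters through untouched.
    static String normalizeWildcard(String term) {
        StringBuilder out = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '*' || c == '?') {
                out.append(c);
            } else {
                out.append(mapChar(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The term from the failing test, after mapping.
        System.out.println(normalizeWildcard("cr\u00E6zy*")); // prints craezy*
        // A term that is already plain ASCII is unchanged.
        System.out.println(normalizeWildcard("crae*y"));      // prints crae*y
    }
}
```

If the parser skips this step for wildcard terms, the query keeps the raw æ while the index holds "ae", which is consistent with the zero hits reported for 6.x.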
RE: Complexphrase treats wildcards differently than other query parsers
That could be it. I'm not able to reproduce this with trunk. More next week.

In trunk, if I add this to schema15.xml:

    [...] mapping="mapping-ISOLatin1Accent.txt"/>
    [...] stored="true"/>

This test passes.

    @Test
    public void testCharFilter() {
        assertU(adoc("iso-latin1", "cr\u00E6zy tr\u00E6n", "id", "1"));
        assertU(commit());
        assertU(optimize());

        assertQ(req("q", "{!complexphrase} iso-latin1:craezy")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );

        assertQ(req("q", "{!complexphrase} iso-latin1:traen")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );

        assertQ(req("q", "{!complexphrase} iso-latin1:caezy~1")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );

        assertQ(req("q", "{!complexphrase} iso-latin1:crae*")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );

        assertQ(req("q", "{!complexphrase} iso-latin1:*aezy")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );

        assertQ(req("q", "{!complexphrase} iso-latin1:crae*y")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );

        assertQ(req("q", "{!complexphrase} iso-latin1:\"craezy traen\"")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );

        assertQ(req("q", "{!complexphrase} iso-latin1:\"caezy~1 traen\"")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );

        assertQ(req("q", "{!complexphrase} iso-latin1:\"craez* traen\"")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );

        assertQ(req("q", "{!complexphrase} iso-latin1:\"*aezy traen\"")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );

        assertQ(req("q", "{!complexphrase} iso-latin1:\"crae*y traen\"")
            , "//result[@numFound='1']"
            , "//doc[./str[@name='id']='1']"
        );
    }

-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
Sent: Friday, October 6, 2017 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

Thanks a lot for your effort, Tim.

Looking at it from the Solr side, I see some use of local classes. The snippet below in particular caught my eye (in solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java).
The instance of ComplexPhraseQueryParser is not the clean one from Lucene, but a modified one. If any of the modifications messes with the analysis logic, well then that might answer it.

What do you make of it?

    lparser = new ComplexPhraseQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer())
    {
        protected Query newWildcardQuery(org.apache.lucene.index.Term t) {
            try {
                org.apache.lucene.search.Query wildcardQuery = reverseAwareParser.getWildcardQuery(t.field(), t.text());
                setRewriteMethod(wildcardQuery);
                return wildcardQuery;
            } catch (SyntaxError e) {
                throw new RuntimeException(e);
            }
        }

        private Query setRewriteMethod(org.apache.lucene.search.Query query) {
            if (query instanceof MultiTermQuery) {
                ((MultiTermQuery) query).setRewriteMethod(
                    org.apache.lucene.search.MultiTermQuery.SCORING_BOOLEAN_REWRITE);
            }
            return query;
        }

        protected Query newRangeQuery(String field, String part1, String part2, boolean startInclusive, boolean endInclusive) {
            boolean reverse = reverseAwareParser.isRangeShouldBeProtectedFromReverse(field, part1);
            return super.newRangeQuery(field,
                reverse ? reverseAwareParser.getLowerBoundForReverse() : part1,
                part2,
                startInclusive || reverse,
                endInclusive);
        }
    }
    ;

Thanks,
Bjarke
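Each assertQ call in the test pairs a request with XPath expressions that the XML response has to satisfy. As a rough illustration of what those two expressions check, here is a stdlib-only sketch using javax.xml.xpath. The class `XPathCheckSketch`, the method `matches`, and the hand-written `RESPONSE` string are all hypothetical; a real Solr response carries more elements than this, and this is not how the Solr test framework is implemented.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathCheckSketch {

    // Hypothetical, minimal stand-in for an XML query response with one hit.
    static final String RESPONSE =
        "<response><result name=\"response\" numFound=\"1\" start=\"0\">"
      + "<doc><str name=\"id\">1</str></doc></result></response>";

    // True when the XPath expression selects at least one node in the XML.
    static boolean matches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The two expressions used throughout the test in the thread.
        System.out.println(matches(RESPONSE, "//result[@numFound='1']"));
        System.out.println(matches(RESPONSE, "//doc[./str[@name='id']='1']"));
        // An expression describing zero hits does not match this response.
        System.out.println(matches(RESPONSE, "//result[@numFound='0']"));
    }
}
```

Read this way, every assertion in the test states the same thing: exactly one document is found, and it is the document with id 1.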
Re: Complexphrase treats wildcards differently than other query parsers
Thanks a lot for your effort, Tim.

Looking at it from the Solr side, I see some use of local classes. The snippet below in particular caught my eye (in solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java).
The instance of ComplexPhraseQueryParser is not the clean one from Lucene, but a modified one. If any of the modifications messes with the analysis logic, well then that might answer it.

What do you make of it?

    lparser = new ComplexPhraseQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer())
    {
        protected Query newWildcardQuery(org.apache.lucene.index.Term t) {
            try {
                org.apache.lucene.search.Query wildcardQuery = reverseAwareParser.getWildcardQuery(t.field(), t.text());
                setRewriteMethod(wildcardQuery);
                return wildcardQuery;
            } catch (SyntaxError e) {
                throw new RuntimeException(e);
            }
        }

        private Query setRewriteMethod(org.apache.lucene.search.Query query) {
            if (query instanceof MultiTermQuery) {
                ((MultiTermQuery) query).setRewriteMethod(
                    org.apache.lucene.search.MultiTermQuery.SCORING_BOOLEAN_REWRITE);
            }
            return query;
        }

        protected Query newRangeQuery(String field, String part1, String part2, boolean startInclusive, boolean endInclusive) {
            boolean reverse = reverseAwareParser.isRangeShouldBeProtectedFromReverse(field, part1);
            return super.newRangeQuery(field,
                reverse ? reverseAwareParser.getLowerBoundForReverse() : part1,
                part2,
                startInclusive || reverse,
                endInclusive);
        }
    }
    ;

Thanks,
Bjarke

2017-10-05 21:15 GMT+02:00 Allison, Timothy B.:

> After some more digging, I'm wrong even at the Lucene level.
>
> When I use the CustomAnalyzer and make my UC vowel mock filter MultitermAware, I get this with Lucene in trunk:
>
>     "the* quick~" name:thE* name:qUIck~2 name:thE name:qUIck
>
> So, there's room for improvement with phrases, but the regular multiterms should be ok.
>
> Still no answer for you...
RE: Complexphrase treats wildcards differently than other query parsers
After some more digging, I'm wrong even at the Lucene level.

When I use the CustomAnalyzer and make my UC vowel mock filter MultitermAware, I get this with Lucene in trunk:

    "the* quick~" name:thE* name:qUIck~2 name:thE name:qUIck

So, there's room for improvement with phrases, but the regular multiterms should be ok.

Still no answer for you...

2017-10-05 14:34 GMT+02:00 Allison, Timothy B.:

> There's every chance that I'm missing something at the Solr level, but it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still not applying analysis to multiterms.
>
> When I call this on 7.0.0:
>
>     QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
>     return qp.parse(qString);
>
> where the analyzer is a mock "uppercase vowel" analyzer[1] and the qString is;
>
>     "the* quick~" the* quick~ the quick
>
> I get this:
>
>     "the* quick~" name:the* name:quick~2 name:thE name:qUIck
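The two parser outputs in this exchange differ only in whether the wildcard and fuzzy terms went through the analyzer. The sketch below is a stdlib-only stand-in for the "uppercase vowel" idea, written for illustration; `UpperVowelSketch` and `upperVowels` are hypothetical names, and the mock analyzer linked as [1] in the thread has not been reproduced here.

```java
public class UpperVowelSketch {

    // Uppercase the vowels a, e, i, o, u and leave every other character,
    // including wildcard characters, unchanged.
    static String upperVowels(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            out.append("aeiou".indexOf(c) >= 0 ? Character.toUpperCase(c) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Plain terms are analyzed in both of the reported outputs.
        System.out.println(upperVowels("the"));   // prints thE
        System.out.println(upperVowels("quick")); // prints qUIck
        // A wildcard term only looks like this if the parser analyzes it.
        System.out.println(upperVowels("the*"));  // prints thE*
    }
}
```

With such a filter, an analyzed multiterm is easy to spot in the parsed query: name:thE* has been through analysis, name:the* has not.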
RE: Complexphrase treats wildcards differently than other query parsers
Prob the usual reasons...no one has submitted a patch yet, or could be a regression after LUCENE-7355.

See also:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201407.mbox/%3c1d06a081892adf4589bd83ee24b9dc3025971...@imcmbx02.mitre.org%3E

I'll take a look.

-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
Sent: Thursday, October 5, 2017 8:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

Thanks Tim,
that might be what I'm experiencing. I'm actually quite certain of it :-)

Do you remember any reason that multi term analysis is not happening in ComplexPhraseQueryParser?

I'm on 6.6.1, so latest on the 6.x branch.
Re: Complexphrase treats wildcards differently than other query parsers
Thanks Tim,
that might be what I'm experiencing. I'm actually quite certain of it :-)

Do you remember any reason that multi term analysis is not happening in ComplexPhraseQueryParser?

I'm on 6.6.1, so latest on the 6.x branch.

2017-10-05 14:34 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> There's every chance that I'm missing something at the Solr level, but it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still not applying analysis to multiterms.
>
> When I call this on 7.0.0:
>
>     QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
>     return qp.parse(qString);
>
> where the analyzer is a mock "uppercase vowel" analyzer[1] and the qString is;
>
>     "the* quick~" the* quick~ the quick
>
> I get this:
>
>     "the* quick~" name:the* name:quick~2 name:thE name:qUIck
>
> [1] https://github.com/tballison/lucene-addons/blob/master/lucene-5205/src/test/java/org/apache/lucene/queryparser/spans/TestAdvancedAnalyzers.java#L117
RE: Complexphrase treats wildcards differently than other query parsers
There's every chance that I'm missing something at the Solr level, but it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still not applying analysis to multiterms.

When I call this on 7.0.0:

    QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
    return qp.parse(qString);

where the analyzer is a mock "uppercase vowel" analyzer[1] and the qString is;

    "the* quick~" the* quick~ the quick

I get this:

    "the* quick~" name:the* name:quick~2 name:thE name:qUIck

[1] https://github.com/tballison/lucene-addons/blob/master/lucene-5205/src/test/java/org/apache/lucene/queryparser/spans/TestAdvancedAnalyzers.java#L117

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, October 5, 2017 8:02 AM
To: solr-user@lucene.apache.org
Subject: RE: Complexphrase treats wildcards differently than other query parsers

What version of Solr are you using?

I thought this had been fixed fairly recently, but I can't quickly find the JIRA. Let me take a look.

Best,

Tim

This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1] and [2], which handles analysis of multiterms even in phrases.

[1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
[2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1
RE: Complexphrase treats wildcards differently than other query parsers
What version of Solr are you using?

I thought this had been fixed fairly recently, but I can't quickly find the JIRA. Let me take a look.

Best,

Tim

This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1] and [2], which handles analysis of multiterms even in phrases.

[1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
[2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1

-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
Sent: Thursday, October 5, 2017 6:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 11:29 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:

> Hi Bjarke,
> You are right - I jumped into wrong/old conclusion as the simplest answer to your question.

No problem :-)

> I guess looking at the code could give you an answer.

This is what I would like to avoid out of fear that my head would explode ;-)

> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
Re: Complexphrase treats wildcards differently than other query parsers
2017-10-05 11:29 GMT+02:00 Emir Arnautović:

> Hi Bjarke,
> You are right - I jumped into wrong/old conclusion as the simplest answer to your question.

No problem :-)

> I guess looking at the code could give you an answer.

This is what I would like to avoid out of fear that my head would explode ;-)

> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen wrote:
> >
> > Well, according to
> > https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> > multiterm means
> >
> > wildcard
> > range
> > prefix
> >
> > so it is that way i'm using the word. That same article explains how analysis will be performed with wildcards if the analyzers are multi-term aware.
> > Furthermore, both lucene and dismax do the correct analysis, so I don't think you are right in your statement about the majority of QPs skipping analysis for wildcards.
> >
> > So I'm still confused as to why complexphrase does things differently.
> >
> > Thanks,
> > /Bjarke
> >
> > 2017-10-05 10:16 GMT+02:00 Emir Arnautović:
> >
> >> Hi Bjarke,
> >> It is not multiterm that is causing query parser to skip analysis chain but wildcard. The majority of query parsers do not analyse query string if there are wildcards.
> >>
> >> HTH
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen wrote:
> >>>
> >>> Hi list,
> >>>
> >>> I'm trying to search for the term funktionsnedsättning*
> >>> In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
> >>> So I would expect that funktionsnedsättning* would translate to funktionsnedsattning*.
> >>>
> >>> If I use e.g. the lucene query parser, this is indeed what happens:
> >>> ...debugQuery=on=lucene=funktionsneds%C3%A4ttning* gives me
> >>> "rawquerystring":"funktionsnedsättning*", "querystring":"funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsattning*"
> >>> and 15 documents returned.
> >>>
> >>> Trying the same with complexphrase gives me:
> >>> ...debugQuery=on=complexphrase=funktionsneds%C3%A4ttning* gives me
> >>> "rawquerystring":"funktionsnedsättning*", "querystring":"funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsättning*"
> >>> and 0 documents. Notice how ä has not been changed to a.
> >>>
> >>> How can this be? Is complexphrase somehow skipping the analysis chain for multiterms, even though components and in particular MappingCharFilterFactory are Multi-term aware
> >>>
> >>> Are there any configuration gotchas that I'm not aware of?
> >>>
> >>> Thanks for the help,
> >>> Bjarke Buur Mortensen
> >>> Senior Software Engineer, Eluence A/S
Re: Complexphrase treats wildcards differently than other query parsers
Hi Bjarke,
You are right - I jumped to the wrong/old conclusion as the simplest answer to your question. I guess looking at the code could give you an answer.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen wrote:
>
> Well, according to
> https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> multiterm means
>
> wildcard
> range
> prefix
>
> so that is the way I'm using the word. That same article explains how
> analysis will be performed with wildcards if the analyzers are multi-term
> aware.
> Furthermore, both lucene and dismax do the correct analysis, so I don't
> think you are right in your statement about the majority of QPs skipping
> analysis for wildcards.
>
> So I'm still confused as to why complexphrase does things differently.
>
> Thanks,
> /Bjarke
Re: Complexphrase treats wildcards differently than other query parsers
Well, according to
https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
multiterm means

wildcard
range
prefix

so that is the way I'm using the word. That same article explains how analysis will be performed with wildcards if the analyzers are multi-term aware.
Furthermore, both lucene and dismax do the correct analysis, so I don't think you are right in your statement about the majority of QPs skipping analysis for wildcards.

So I'm still confused as to why complexphrase does things differently.

Thanks,
/Bjarke

2017-10-05 10:16 GMT+02:00 Emir Arnautović:

> Hi Bjarke,
> It is not multiterm that is causing the query parser to skip the analysis
> chain but the wildcard. The majority of query parsers do not analyse the
> query string if there are wildcards.
>
> HTH
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
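
One way to check the claim that lucene and dismax analyse wildcard terms while complexphrase does not is to send the same query to each parser with debugQuery=on and compare the "parsedquery" values. The sketch below only builds the request URLs. The host, port and collection name are placeholders, and it assumes that the parser is selected with the standard defType parameter and that the field to search is configured as a default on the server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class ParserComparisonUrls {
    // Hypothetical base URL: host, port and collection name are placeholders.
    static final String BASE = "http://localhost:8983/solr/mycollection/select";

    // Builds a debug request for one query parser.
    // Assumption: the parser is chosen via defType and the query is passed in q.
    static String debugUrl(String defType, String query) {
        return BASE
            + "?debugQuery=on"
            + "&defType=" + URLEncoder.encode(defType, StandardCharsets.UTF_8)
            + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00E4 is the letter a with diaeresis; the trailing * is the wildcard.
        String query = "funktionsneds\u00E4ttning*";
        for (String parser : List.of("lucene", "dismax", "complexphrase")) {
            System.out.println(debugUrl(parser, query));
        }
    }
}
```

Fetching each printed URL and comparing the "parsedquery" entries side by side shows whether the char filter was applied to the wildcard term for that parser.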
Re: Complexphrase treats wildcards differently than other query parsers
Hi Bjarke,
It is not multiterm that is causing the query parser to skip the analysis chain but the wildcard. The majority of query parsers do not analyse the query string if there are wildcards.

HTH
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen wrote:
>
> Hi list,
>
> I'm trying to search for the term funktionsnedsättning*
> In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
> So I would expect that funktionsnedsättning* would translate to
> funktionsnedsattning*.
>
> If I use e.g. the lucene query parser, this is indeed what happens:
> ...debugQuery=on=lucene=funktionsneds%C3%A4ttning* gives me
> "rawquerystring":"funktionsnedsättning*", "querystring":
> "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsattning*"
> and 15 documents returned.
>
> Trying the same with complexphrase gives me:
> ...debugQuery=on=complexphrase=funktionsneds%C3%A4ttning* gives me
> "rawquerystring":"funktionsnedsättning*", "querystring":
> "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsättning*"
> and 0 documents. Notice how ä has not been changed to a.
>
> How can this be? Is complexphrase somehow skipping the analysis chain for
> multiterms, even though components and in particular
> MappingCharFilterFactory are multi-term aware?
>
> Are there any configuration gotchas that I'm not aware of?
>
> Thanks for the help,
> Bjarke Buur Mortensen
> Senior Software Engineer, Eluence A/S
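
The behaviour expected in the original post can be illustrated without Solr: map the literal characters of a wildcard term and leave the wildcard operators alone. The sketch below is a stand-alone illustration only. It is not how Solr or Lucene implement multi-term analysis, and the class name and the small mapping table are made up for the example (a real setup would read the mappings from the file configured for MappingCharFilterFactory).

```java
import java.util.Map;

final class WildcardTermNormalizer {
    // Hypothetical stand-in for a character mapping file.
    private static final Map<Character, String> MAPPING = Map.of(
        '\u00E4', "a",    // a with diaeresis -> a
        '\u00E6', "ae");  // ae ligature -> ae

    // Applies the character mapping to every literal character and copies
    // the wildcard operators * and ? through unchanged.
    static String normalize(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                out.append(c);
                continue;
            }
            String mapped = MAPPING.get(c);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the term the poster expected to see in "parsedquery".
        System.out.println(normalize("funktionsneds\u00E4ttning*"));
    }
}
```

With this mapping, the term from the post becomes funktionsnedsattning*, which is the form the lucene parser produced and the complexphrase parser did not.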