Re: Edismax ignoring queries containing booleans

Edward Ribeiro Fri, 10 Jan 2020 04:19:50 -0800

Cool, glad to help, Paras and Claire. :)

Cheers,
Edward



On Fri, Jan 10, 2020 at 06:31, Paras Lehana <paras.leh...@indiamart.com>
wrote:

> Hi Edward, the way you explained how mm and fq relate to the parser has
> answered all my potential questions. I didn't know fq supports other
> parsers. :)
>
> On Fri, 10 Jan 2020 at 10:46, Edward Ribeiro <edward.ribe...@gmail.com>
> wrote:
>
> > The fq is not affected by the mm parameter because it uses Solr's default
> > query parser (the standard "lucene" parser), which doesn't support mm. But
> > you can change the parser used by fq this way: fq={!edismax}recordID:(10 20)
> > or fq={!edismax mm=1}recordID:(10 20), for example (even though that is not
> > needed here).
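
As a small illustration of the local-params syntax mentioned above, here is a minimal Python sketch that builds such an fq value. The helper name `with_parser` is hypothetical and not part of any Solr client; it only concatenates strings and assumes the parameter values contain no spaces or quotes.

```python
def with_parser(query: str, parser: str, **local_params: str) -> str:
    """Prefix a query with Solr local params, e.g. {!edismax mm=1}.

    Hypothetical helper: values are assumed to need no quoting.
    """
    parts = [f"!{parser}"] + [f"{key}={value}" for key, value in local_params.items()]
    return "{" + " ".join(parts) + "}" + query


# Build the two fq values shown in the message above.
print(with_parser("recordID:(10 20)", "edismax"))
print(with_parser("recordID:(10 20)", "edismax", mm="1"))
```

The resulting string would be sent as the value of the fq request parameter.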
> >
> > Please let me know if any of these suggestions, or any other you come up
> > with, solves the issue, and don't forget to test those approaches so that
> > you can avoid any performance degradation.
> >
> > Best,
> > Edward
> >
> > On Fri, Jan 10, 2020 at 1:41 AM Edward Ribeiro <edward.ribe...@gmail.com>
> > wrote:
> >
> > > Hi Claire,
> > >
> > > > The only visual difference I think is the ~2 which came after the
> > > > initial part of the parsed query:
> > > > Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 20]))~2
> > > > New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 20]))
> > >
> > > The mm (minimum should match) parameter alters the behaviour of the OR
> > > clauses. See here:
> > > https://lucene.apache.org/solr/guide/8_3/the-dismax-query-parser.html#mm-minimum-should-match-parameter
> > >
> > > For example, if there is a query like `text:(toys OR children OR sales)`
> > > but your mm=3, then at least three terms are required to match. The query
> > > is now equivalent to `text:(toys AND children AND sales)`.
> > >
> > > In the "+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO
> > > 20]))~2" query, the "))~2" part means that at least two of the three
> > > optional clauses (18, 19, and 20) must match. But recordID is
> > > single-valued, so a document can match at most one of them. Therefore the
> > > query returns no documents, because no document can ever satisfy the
> > > condition set up by mm (match at least two of 18, 19, and 20). With mm=1
> > > the query would work as intended in this example.
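
To make the effect of ~2 concrete, here is a toy Python sketch. It is not Solr code; the two documents and their ids are made up to mirror the thread (records 19 and 20 exist, 18 does not). It counts how many of the optional clauses each document satisfies and keeps a document only if that count reaches mm.

```python
# Hypothetical index contents: each document holds exactly one recordID.
DOCS = [{"id": "a", "recordID": 19}, {"id": "b", "recordID": 20}]
WANTED = [18, 19, 20]


def hits(docs, wanted, mm):
    """Return ids of documents that satisfy at least `mm` optional clauses."""
    matched = []
    for doc in docs:
        # A single-valued field can equal at most one of the wanted values.
        satisfied = sum(1 for value in wanted if doc["recordID"] == value)
        if satisfied >= mm:
            matched.append(doc["id"])
    return matched


print(hits(DOCS, WANTED, mm=2))  # no document can reach two matches
print(hits(DOCS, WANTED, mm=1))  # both existing records are returned
```

Because each document can satisfy only one clause, any mm above 1 empties the result set for this kind of query.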
> > >
> > > The mm parameter you use is: 0<1 2<-1 5<-2 6<90%, which can roughly be
> > > translated as follows (each condition applies when the number of terms
> > > is greater than the number before the "<", up to the next threshold):
> > >
> > > * 0<1 : If there are one or two terms then at least 1 must match.
> > >
> > > * 2<-1 : Between 3 and 5 (inclusive) terms, match all but one (in your
> > > example there are 3 numbers so it will require at least 2 to match;
> > > that's the reason for the ~2).
> > >
> > > * 5<-2 : If there are 6 terms then match 4 (6 - 2).
> > >
> > > * 6<90% : Above 6 terms, match 90% of the terms (e.g., if there are 10
> > > clauses then at least 9 must match).
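
The arithmetic behind a conditional mm spec can be sketched in Python as below. This is an approximation written from the documented behaviour of the mm parameter, not Solr's own implementation; the function name `min_should_match` and the final clamping to the range 0..count are assumptions made for the illustration.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Approximate how an mm spec resolves for a number of optional clauses."""

    def apply(count: int, value: str) -> int:
        if value.endswith("%"):
            percent = int(value[:-1])
            # Percentages are truncated; a negative one is "how many may miss".
            portion = abs(percent) * count // 100
            result = count - portion if percent < 0 else portion
        else:
            number = int(value)
            result = count + number if number < 0 else number
        return max(0, min(count, result))  # assumed clamp for the sketch

    if "<" not in spec:
        return apply(clause_count, spec.strip())

    result = clause_count  # at or below the lowest threshold, all are required
    for part in spec.split():
        threshold, value = part.split("<", 1)
        if clause_count <= int(threshold):
            break  # later conditions only apply to larger clause counts
        result = apply(clause_count, value)
    return result


for n in (1, 2, 3, 6, 10):
    print(n, min_should_match(n, "0<1 2<-1 5<-2 6<90%"))
```

For three clauses this yields 2, which is where the ~2 in the parsed query comes from.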
> > >
> > > > There shouldn't be a problem using mm with edismax right? Or does the
> > > > problem lie with the structure of my qf/pf and then adding mm?
> > >
> > > Nope. There’s no problem using mm with edismax, nor does the problem lie
> > > in qf/pf. As you dig
> > >
> > > > I can see this is a change to default behaviour, but does it mean I
> > > > should be passing mm in the query now rather than just at config level?
> > >
> > > I see a couple of approaches to solve this issue:
> > >
> > > 1) Removing the mm parameter from solrconfig. But it was probably set up
> > > for a reason, so you should check beforehand. In this case, you could
> > > issue mm=0<1 2<-1 5<-2 6<90% as a query parameter if necessary.
> > >
> > > 2) Adding mm=1 as a query parameter whenever you search for recordID.
> > > Issuing the parameter in the query will override the mm parameter that
> > > was set up in solrconfig for that particular query.
> > >
> > > 3) Doing a match-all query (q=*:*) and moving the recordID query to a
> > > filter query: fq=recordID:(18 OR 19 OR 20). The fq is not affected by the
> > > mm parameter, or so it seems. No need to change mm in solrconfig or to
> > > add mm as a query parameter.
> > >
> > > Personally, I would go with either 2) or 3).
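
Approaches 2) and 3) could be expressed as request parameters roughly like this. It is a minimal Python sketch using only the standard library; the host, port and core name in `SELECT_URL` are placeholders, and the two function names are made up for the example.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/mycore/select"  # hypothetical core URL


def override_mm_params(record_ids):
    """Approach 2: keep the query in q and send mm=1 with the request."""
    ids = " OR ".join(str(i) for i in record_ids)
    return {"q": f"recordID:({ids})", "mm": "1"}


def filter_query_params(record_ids):
    """Approach 3: match everything in q and restrict the results with fq."""
    ids = " OR ".join(str(i) for i in record_ids)
    return {"q": "*:*", "fq": f"recordID:({ids})"}


# urlencode takes care of escaping spaces, colons and parentheses.
print(SELECT_URL + "?" + urlencode(override_mm_params([18, 19, 20])))
print(SELECT_URL + "?" + urlencode(filter_query_params([18, 19, 20])))
```

Either variant leaves the mm default in solrconfig untouched; as noted above, both should be tested for performance before being adopted.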
> > >
> > > Best,
> > > Edward
> > >
> > > On Thu, Jan 9, 2020 at 7:47 AM Claire Pollard <claire.poll...@imagen.io>
> > > wrote:
> > > >
> > > > Also, I've found this earlier bug which highlights the issue with ))~2
> > > >
> > > > https://issues.apache.org/jira/browse/SOLR-8812
> > > >
> > > > mm is set at config, but not explicitly in the query...
> > > >
> > > > I can see this is a change to default behaviour, but does it mean I
> > > > should be passing mm in the query now rather than just at config level?
> > > >
> > > > -----Original Message-----
> > > > From: Claire Pollard <claire.poll...@imagen.io>
> > > > Sent: 09 January 2020 10:23
> > > > To: solr-user@lucene.apache.org
> > > > Subject: RE: Edismax ignoring queries containing booleans
> > > >
> > > > Hey Edward,
> > > >
> > > > Thanks for the tips.
> > > >
> > > > I've cleaned up my solrconfig, removed the duplicate df, tabs and
> > > > newlines, and tried commenting out the bits you've suggested and adding
> > > > them back in bit by bit, and it seems mm was the thing that was breaking
> > > > the query for me.
> > > >
> > > > Without it, the query returns 2 documents as expected.
> > > >
> > > > "debug":{
> > > >     "rawquerystring":"recordID:(18 OR 19 OR 20)",
> > > >     "querystring":"recordID:(18 OR 19 OR 20)",
> > > >     "parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 20])) DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2 | (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 | collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 | (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19 20\"~100)^1.1))",
> > > >     "parsedquery_toString":"+(recordID:[18 TO 18] recordID:[19 TO 19] recordID:[20 TO 20]) ((text:\"19 20\"~100)^0.2 | (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 | collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 | (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19 20\"~100)^1.1)",
> > > >     "explain":{
> > > >       "2CBF8A49-CA2D-4e42-88F2-3790922EF415":"\n1.0 = sum of:\n  1.0 = sum of:\n    1.0 = recordID:[19 TO 19]\n",
> > > >       "F73CFBC7-2CD2-4aab-B8C1-9D19D427EAFB":"\n1.0 = sum of:\n  1.0 = sum of:\n    1.0 = recordID:[20 TO 20]\n"},
> > > >
> > > > The only visual difference I think is the ~2 which came after the
> > > > initial part of the parsed query:
> > > >
> > > > Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 20]))~2
> > > > New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 20]))
> > > >
> > > > There shouldn't be a problem using mm with edismax right? Or does the
> > > > problem lie with the structure of my qf/pf and then adding mm?
> > > >
> > > > Cheers,
> > > > Claire.
> > > >
> > > > -----Original Message-----
> > > > From: Edward Ribeiro <edward.ribe...@gmail.com>
> > > > Sent: 09 January 2020 02:28
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Edismax ignoring queries containing booleans
> > > >
> > > > Hi Claire,
> > > >
> > > > Unfortunately I didn't see anything in the debug explain that could
> > > > potentially be the source of the problem. Like Saurabh, I tested on a
> > > > core and it worked for me.
> > > >
> > > > I suggest that you simplify the solrconfig (commenting out qf, mm,
> > > > spellchecker config and pf, for example) and reload the core. If the
> > > > query works, then reinsert the config entries one by one, reloading the
> > > > core each time and checking whether the query still works.
> > > >
> > > > A few remarks based on the snippet of the solrconfig you posted in a
> > > > previous e-mail:
> > > >
> > > > * Your solrconfig.xml defines df twice (the debug shows
> > > > "df":["text", "text"]);
> > > >
> > > > * There are a few character codes like &#x09;, &#x0D; and &#x0A;. It
> > > > would be nice to remove them.
> > > >
> > > > Please, let us know if you find why. :)
> > > >
> > > > Best,
> > > > Edward
> > > >
> > > >
> > > > On Wed, Jan 8, 2020 at 13:00, Claire Pollard <claire.poll...@imagen.io>
> > > > wrote:
> > > >
> > > > > It would be lovely to be able to use a range to complete my searches,
> > > > > but sadly documents aren't necessarily sequential, so I might want,
> > > > > say, 18, 24 or 30 in future.
> > > > >
> > > > > I've re-run the query with debug on. Is there anything here that
> > > > > looks unusual? Thanks.
> > > > >
> > > > > {
> > > > >   "responseHeader":{
> > > > >     "status":0,
> > > > >     "QTime":75,
> > > > >     "params":{
> > > > >       "mm":"\r\n       0<1 2<-1 5<-2 6<90%\r\n      ",
> > > > >       "spellcheck.collateExtendedResults":"true",
> > > > >       "df":["text",
> > > > >         "text"],
> > > > >       "q.alt":"*:*",
> > > > >       "ps":"100",
> > > > >       "spellcheck.dictionary":["default",
> > > > >         "wordbreak"],
> > > > >       "bf":"",
> > > > >       "echoParams":"all",
> > > > >       "fl":"*,score",
> > > > >       "spellcheck.maxCollations":"5",
> > > > >       "rows":"10",
> > > > >       "spellcheck.alternativeTermCount":"5",
> > > > >       "spellcheck.extendedResults":"true",
> > > > >       "q":"recordID:(18 OR 19 OR 20)",
> > > > >       "defType":"edismax",
> > > > >       "spellcheck.maxResultsForSuggest":"5",
> > > > >       "qf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.4 recordID^10.0 annotations^0.5 collectionTitle^1.9 collectionDescription^0.9 title^2.0 Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0 french2^1.0\r\n\n\t\t\t\t\n\t\t\t",
> > > > >       "spellcheck":"on",
> > > > >       "pf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.2 recordID^10.0 annotations^0.6 collectionTitle^2.0 collectionDescription^1.0 title^2.1 Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1 french2^1.1\r\n\n\t\t\t\t\n\t\t\t",
> > > > >       "spellcheck.count":"10",
> > > > >       "debugQuery":"on",
> > > > >       "_":"1578499092576",
> > > > >       "spellcheck.collate":"true"}},
> > > > >   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
> > > > >   },
> > > > >   "spellcheck":{
> > > > >     "suggestions":[],
> > > > >     "correctlySpelled":false,
> > > > >     "collations":[]},
> > > > >   "debug":{
> > > > >     "rawquerystring":"recordID:(18 OR 19 OR 20)",
> > > > >     "querystring":"recordID:(18 OR 19 OR 20)",
> > > > >     "parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO 20]))~2 DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2 | (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 | collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 | (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19 20\"~100)^1.1))",
> > > > >     "parsedquery_toString":"+((recordID:[18 TO 18] recordID:[19 TO 19] recordID:[20 TO 20])~2) ((text:\"19 20\"~100)^0.2 | (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 | collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 | (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19 20\"~100)^1.1)",
> > > > >     "explain":{},
> > > > >     "QParser":"ExtendedDismaxQParser",
> > > > >     "altquerystring":null,
> > > > >     "boost_queries":null,
> > > > >     "parsed_boost_queries":[],
> > > > >     "boostfuncs":[""],
> > > > >     "timing":{
> > > > >       "time":75.0,
> > > > >       "prepare":{
> > > > >         "time":35.0,
> > > > >         "query":{
> > > > >           "time":35.0},
> > > > >         "facet":{
> > > > >           "time":0.0},
> > > > >         "facet_module":{
> > > > >           "time":0.0},
> > > > >         "mlt":{
> > > > >           "time":0.0},
> > > > >         "highlight":{
> > > > >           "time":0.0},
> > > > >         "stats":{
> > > > >           "time":0.0},
> > > > >         "expand":{
> > > > >           "time":0.0},
> > > > >         "terms":{
> > > > >           "time":0.0},
> > > > >         "spellcheck":{
> > > > >           "time":0.0},
> > > > >         "debug":{
> > > > >           "time":0.0}},
> > > > >       "process":{
> > > > >         "time":38.0,
> > > > >         "query":{
> > > > >           "time":29.0},
> > > > >         "facet":{
> > > > >           "time":0.0},
> > > > >         "facet_module":{
> > > > >           "time":0.0},
> > > > >         "mlt":{
> > > > >           "time":0.0},
> > > > >         "highlight":{
> > > > >           "time":0.0},
> > > > >         "stats":{
> > > > >           "time":0.0},
> > > > >         "expand":{
> > > > >           "time":0.0},
> > > > >         "terms":{
> > > > >           "time":0.0},
> > > > >         "spellcheck":{
> > > > >           "time":6.0},
> > > > >         "debug":{
> > > > >           "time":1.0}}}}}
> > > > >
> > > > > -----Original Message-----
> > > > > From: Edward Ribeiro <edward.ribe...@gmail.com>
> > > > > Sent: 07 January 2020 01:05
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Edismax ignoring queries containing booleans
> > > > >
> > > > > Hi Claire,
> > > > >
> > > > > You can add the parameter `&debug=all` to the URL to bring back
> > > > > debugging info and share it with us (if you are using the Solr admin
> > > > > UI you should check the `debugQuery` checkbox).
> > > > >
> > > > > Also, if you are searching a sequence of values you could perform a
> > > > > range query: recordID:[18 TO 20]
> > > > >
> > > > > Best,
> > > > > Edward
> > > > >
> > > > > On Mon, Jan 6, 2020 at 10:46 AM Claire Pollard <claire.poll...@imagen.io> wrote:
> > > > > >
> > > > > > Ok... It doesn't work for me. I'm fairly new to Solr so any help
> > > > > > would be appreciated!
> > > > > >
> > > > > > My managed-schema field and field type look like this:
> > > > > >
> > > > > > <field name="recordID" type="long" indexed="true" stored="true" required="true" multiValued="false" />
> > > > > > <fieldType name="long" class="solr.LongPointField" sortMissingLast="true" omitNorms="true" />
> > > > > >
> > > > > > And my solrconfig.xml select/query handlers look like this:
> > > > > >
> > > > > >         <requestHandler name="/select"
> class="solr.SearchHandler">
> > > > > >                 <lst name="defaults">
> > > > > >                         <str name="echoParams">all</str>
> > > > > >                         <!-- Query settings -->
> > > > > >                         <str name="defType">edismax</str>
> > > > > >                         <str name="qf">
> > > > > >                                 &#x09;text^0.4 recordID^10.0
> > > > > annotations^0.5 collectionTitle^1.9 collectionDescription^0.9
> > > > > title^2.0
> > > > > Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0
> > > > > french2^1.0&#x0D;&#x0A;
> > > > > >                         </str>
> > > > > >                         <str name="df">text</str>
> > > > > >                         <str name="q.alt">*:*</str>
> > > > > >                         <str name="rows">10</str>
> > > > > >                         <str name="fl">*,score</str>
> > > > > >                         <str name="pf">
> > > > > >                                 &#x09;text^0.2 recordID^10.0
> > > > > annotations^0.6 collectionTitle^2.0 collectionDescription^1.0
> > > > > title^2.1
> > > > > Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1
> > > > > french2^1.1&#x0D;&#x0A;</str>
> > > > > >                         <str name="bf" />
> > > > > >                         <str name="mm">&#x0D;&#x0A;       0&lt;1
> > > 2&lt;-1
> > > > > 5&lt;-2 6&lt;90%&#x0D;&#x0A;      </str>
> > > > > >                         <int name="ps">100</int>
> > > > > >                         <!--SpellChecking -->
> > > > > >                         <str name="df">text</str>
> > > > > >                         <!-- Solr will use suggestions from both
> > the
> > > > > 'default' spellchecker
> > > > > >      and from the 'wordbreak' spellchecker and combine them.
> > > > > >      collations (re-written queries) can include a combination of
> > > > > >      corrections from both spellcheckers -->
> > > > > >                         <str
> > > name="spellcheck.dictionary">default</str>
> > > > > >                         <str
> > > name="spellcheck.dictionary">wordbreak</str>
> > > > > >                         <str name="spellcheck">on</str>
> > > > > >                         <str
> > > name="spellcheck.extendedResults">true</str>
> > > > > >                         <str name="spellcheck.count">10</str>
> > > > > >                         <str
> > > > > name="spellcheck.alternativeTermCount">5</str>
> > > > > >                         <str
> > > > > name="spellcheck.maxResultsForSuggest">5</str>
> > > > > >                         <str name="spellcheck.collate">true</str>
> > > > > >                         <str
> > > > > name="spellcheck.collateExtendedResults">true</str>
> > > > > >                         <str
> > name="spellcheck.maxCollations">5</str>
> > > > > >                 </lst>
> > > > > >                 <arr name="last-components">
> > > > > >                         <str>spellcheck</str>
> > > > > >                 </arr>
> > > > > >                 <!-- In addition to defaults, "appends" params
> can
> > > > > > be
> > > > > specified
> > > > > >          to identify values which should be appended to the list
> of
> > > > > >          multi-val params from the query (or the existing
> > > "defaults").
> > > > > >       -->
> > > > > >         </requestHandler>
> > > > > >
> > > > > >         <requestHandler name="/query" class="solr.SearchHandler">
> > > > > >                 <lst name="defaults">
> > > > > >                         <str name="echoParams">explicit</str>
> > > > > >                         <str name="wt">json</str>
> > > > > >                         <str name="indent">true</str>
> > > > > >                         <str name="df">text</str>
> > > > > >                 </lst>
> > > > > >         </requestHandler>
> > > > > >
> > > > > > Is there anything else that might be useful in helping diagnose
> > > > > > what's going wrong for me?
> > > > > >
> > > > > > Cheers,
> > > > > > Claire.
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Saurabh Sharma <saurabh.infoe...@gmail.com>
> > > > > > Sent: 06 January 2020 11:20
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Subject: Re: Edismax ignoring queries containing booleans
> > > > > >
> > > > > > It should work well. I have just tested the same with 8.3.0.
> > > > > >
> > > > > > Thanks
> > > > > > Saurabh Sharma
> > > > > >
> > > > > > On Mon, Jan 6, 2020, 4:31 PM Claire Pollard <claire.poll...@imagen.io> wrote:
> > > > > >
> > > > > > > I'm using:
> > > > > > >
> > > > > > > recordID:(18 OR 19 OR 20)
> > > > > > >
> > > > > > > Which should return 2 records (as 18 doesn't exist), but it
> > > > > > > returns none.
> > > > > > > recordID is a LongPointField (sorry I said Int in my previous
> > > > > > > message).
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Saurabh Sharma <saurabh.infoe...@gmail.com>
> > > > > > > Sent: 06 January 2020 10:35
> > > > > > > To: solr-user@lucene.apache.org
> > > > > > > Subject: Re: Edismax ignoring queries containing booleans
> > > > > > >
> > > > > > > Please share the query which you are creating.
> > > > > > >
> > > > > > > On Mon, Jan 6, 2020, 3:52 PM Claire Pollard <claire.poll...@imagen.io> wrote:
> > > > > > >
> > > > > > > > In Solr 8.3.0 I've got an edismax query parser in my search
> > > > > > > > handler, and it seems to be ignoring Boolean operators such as
> > > > > > > > AND and OR when searching using an IntPointField.
> > > > > > > >
> > > > > > > > I was hoping to use a query to this field to return a batch of
> > > > > > > > documents with non-sequential IDs, so a range would be
> > > > > > > > inappropriate.
> > > > > > > >
> > > > > > > > We had a previous 4.10.2 instance of Solr which uses the now
> > > > > > > > deprecated Trie fields, and these seem to search without issue
> > > > > > > > using boolean operators.
> > > > > > > >
> > > > > > > > Is there something extra I need to do with my setup for
> > > > > > > > PointFields to use booleans, or should they work by default?
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Claire.
> > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
> >
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
