Did you try it with the 'sow' (split on whitespace) parameter set both ways? I am not sure I fully understand the question, especially with shingling applied at both index and query time rather than at index time only. But it is at least something to try, and it is one of the areas where Solr and ES differ.
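In case it helps, here is a minimal Python sketch for comparing the two settings. It only builds the two request URLs so that the "parsedquery" debug output can be compared side by side. The host, the collection name ("mycollection") and the helper name are assumptions of mine, not something from your setup; the field name and query are taken from your message. As far as I know, sow=true makes the parser split on whitespace before analysis, so the shingle filter would see one word at a time, but please verify that against your version.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name; adjust to the actual setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_debug_url(sow: bool) -> str:
    """Return a select URL for the sample query with sow set explicitly."""
    params = {
        "defType": "edismax",
        "q": "mona lisa smile",
        "df": "shingle_field",   # field name from the original message
        "q.op": "OR",
        "sow": "true" if sow else "false",
        "debugQuery": "true",    # include "parsedquery" in the response
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Print one URL per setting; fetch each and compare "parsedquery".
    for sow in (True, False):
        print(build_debug_url(sow))
```

Fetching both URLs (with curl or a browser) and comparing the "parsedquery" entries should show whether 'sow' changes how the shingles end up in the query.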
Regards,
   Alex.

On Tue, 19 May 2020 at 05:59, Radu Gheorghe <radu.gheor...@sematext.com> wrote:
>
> Hello Solr users,
>
> I’m quite puzzled about how shingles work. The way tokens are analysed looks
> fine to me, but the query seems too restrictive.
>
> Here’s the sample use-case. I have three documents:
>
> mona lisa smile
> mona lisa
> mona
>
> I have a shingle filter set up like this (both index- and query-time):
>
> > <filter class="solr.ShingleFilterFactory" minShingleSize="2"
> > maxShingleSize="4"/>
>
> When I query for “Mona Lisa smile” (no quotes), I expect to get all three
> documents back, in that order. Because the first document matches all the
> terms:
>
> mona
> mona lisa
> mona lisa smile
> lisa
> lisa smile
> smile
>
> And the second one matches only some, and the third document only matches one.
>
> Instead, I only get the first document back. That’s because the query expects
> all the “words” to match:
>
> > "parsedquery":"+DisjunctionMaxQuery((((+shingle_field:mona
> > +usage_query_view_tags:lisa +shingle_field:smile) (+shingle_field:mona
> > +shingle_field:lisa smile) (+shingle_field:mona lisa +shingle_field:smile)
> > shingle_field:mona lisa smile)))",
>
> The query above is generated by the Edismax query parser, when I’m using
> “shingle_field” as “df”.
>
> Is there a way to get “any of the words” to match? I’ve tried all the options
> I can think of:
> - different query parsers
> - q.OP=OR
> - mm=0 (or 1 or 0% or 10% or…)
>
> Nothing seems to change the parsed query from the above.
>
> I’ve compared this to the behaviour of Elasticsearch. There, I get “OR” by
> default, and minimum_should_match works as expected. The only difference I
> see between the two, on the analysis side, is that tokens start at 0 in
> Elasticsearch and at 1 in Solr. I doubt that’s the problem, because I see
> that the default “text_en”, for example, also starts at position 1.
>
> Is it just a bug that mm doesn’t work in the context of shingles? Or is there
> a workaround?
>
> Thanks and best regards,
> Radu