It looks like this:
"debug":{ ... "parsedquery":"name:some", "parsedquery_toString":"name:some",
..
"QParser":"LuceneQParser", "filter_queries":["features:here"], "
On Wed, Jan 4, 2023 at 3:41 PM Noah Torp-Smith <[email protected]> wrote:

> Thanks again for your input, Mikhail. I need to look more into this debugOutput and the way we use `!parent which`.
>
> Can you maybe elaborate on which part of the debug output I should look at in order to say "how is it parsed"? Is that output documented somewhere (other than the solr source code)?
>
> Best regards,
>
> /Noah
>
> --
> Noah Torp-Smith ([email protected])
>
> ________________________________
> Fra: Mikhail Khludnev <[email protected]>
> Sendt: 3. januar 2023 19:29
> Til: [email protected] <[email protected]>
> Emne: Re: Slowness when searching in child documents.
>
> Hold on.
>
> > I remove the first part of the filter (the one with parent which),
> Noah, what's the performance of the child subquery alone?
> q=pid.material_type:(\"lydbog\" \"artikel\")
> What's the QTime, and how is it parsed?
>
> On Tue, Jan 3, 2023 at 5:55 PM Noah Torp-Smith <[email protected]> wrote:
>
> > Thanks for the response. Here is a more hands-on example with measures that maybe illustrates it better:
> >
> > We are on solr 9.0.1.
> >
> > We send this to solr (it's an equals sign, not a colon, after parent which; sorry for the confusion on my part):
> >
> > {
> >   "query": "flotte huse",
> >   "filter": [
> >     "{!parent which='doc_type:work'}(pid.material_type:(\"lydbog\" \"artikel\"))",
> >     "doc_type:work"
> >   ],
> >   "fields": "work.workid",
> >   "offset": 0,
> >   "limit": 10,
> >   "params": {
> >     "defType": "edismax",
> >     "qf": [
> >       "work.creator^100",
> >       "work.creator_fuzzy^0.001",
> >       "work.series^75",
> >       "work.subject_bibdk",
> >       "work.subject_fuzzy^0.001",
> >       "work.title^100",
> >       "work.title_fuzzy^0.001"
> >     ],
> >     "pf": [
> >       "work.creator^200",
> >       "work.fictive_character",
> >       "work.series^175",
> >       "work.title^1000"
> >     ],
> >     "pf2": [
> >       "work.creator^200",
> >       "work.fictive_character",
> >       "work.series^175",
> >       "work.title^1000"
> >     ],
> >     "pf3": [
> >       "work.creator^200",
> >       "work.fictive_character",
> >       "work.series^175",
> >       "work.title^1000"
> >     ],
> >     "mm": "2<80%",
> >     "mm.autoRelax": "true",
> >     "ps": 5,
> >     "ps2": 5,
> >     "ps3": 5
> >   }
> > }
> >
> > This fetches 21 workids and it takes more than 20 seconds. If I remove the first part of the filter (the one with parent which), it fetches 33 workids in less than 200 milliseconds. It does not matter whether I do it with or without the filtering to material types first (as long as I come up with new examples, so the filter cache is not being used).
> >
> > So it does not seem to depend on the number of returned documents.
> >
> > Thanks again for your help, it is much appreciated.
> >
> > --
> > Noah Torp-Smith ([email protected])
> >
> > ________________________________
> > Fra: Mikhail Khludnev <[email protected]>
> > Sendt: 3. januar 2023 14:09
> > Til: [email protected] <[email protected]>
> > Emne: Re: Slowness when searching in child documents.
> >
> > Hello, Noah.
> >
> > A few notes: query time depends on the number of results. When one query is slower than another, we can usually find the explanation in a bigger number of enumerated docs.
> > Examine how the query is parsed in the debugQuery output. There are many tricks and pitfalls in query parsers, e.g. I'm not sure why you put a colon after which, whether you actually send it to Solr that way, and how Solr interprets it.
> > Which version of Solr/Lucene are you running? Some time ago Lucene had no two-phase iteration and was prone to redundant enumerations.
> >
> > > if there is some way to evaluate the search at the work level first, and then do the filtering for those works that have manifestations matching the child requirements afterwards?
> > That's how it's expected to work. You can confirm your hypothesis by intersecting {!parent ..}.. with work_id:123, whether via fq or +. It should turn around in a moment.
> >
> > So, if everything is right, you might just be running indices that are too large and have to break them into many shards.

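A concrete sketch of that check, using the field names from the request above (the workid value 123 is just a placeholder): run the block-join clause as the main query and intersect it with a single parent via fq, then compare the QTime with and without that fq.

    q={!parent which='doc_type:work'}pid.material_type:("lydbog" "artikel")
    fq=work.workid:123
    rows=0

(URL encoding omitted.) If this variant turns around in milliseconds while the unrestricted filter takes seconds, the intersection behaves as expected and the cost comes from enumerating child documents across the whole index.
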
> > On Tue, Jan 3, 2023 at 1:12 PM Noah Torp-Smith <[email protected]> wrote:
> >
> > > We are facing a performance issue when searching in child documents. In order to explain the issue, I will provide a very simplified excerpt of our data model.
> > >
> > > We are making a search engine for libraries. What we want to deliver to the users are "works". An example of a work could be Harry Potter and the Goblet of Fire. Each work can have several manifestations; for example, there is a book version of the work, an audiobook, and maybe an e-book. Of course, there are properties at the work level (like creator, title, subjects, etc.) and other properties at the manifestation level (like publication year, material type, etc.).
> > >
> > > We have modelled this with parent documents and child documents in solr, and have built a search engine on it. The search engine can search for things like creators, titles, and subjects at the work level, but users should also be allowed to search for things from a specific year or be able to specify that they are only interested in things that are available as e-books.

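To make the shape of that model concrete, here is a minimal sketch of one indexed work with two manifestation children, using Solr's anonymous _childDocuments_ JSON syntax. The ids, the doc_type value for the children, and the material-type values are made up for illustration; a schema with labeled nested children (_nest_path_) would look slightly different.

    [
      {
        "id": "work-1",
        "doc_type": "work",
        "work.workid": "work-1",
        "work.title": "Harry Potter and the Goblet of Fire",
        "work.creator": "J. K. Rowling",
        "_childDocuments_": [
          { "id": "work-1-m1", "doc_type": "manifestation", "pid.material_type": "bog" },
          { "id": "work-1-m2", "doc_type": "manifestation", "pid.material_type": "lydbog" }
        ]
      }
    ]

A clause like {!parent which='doc_type:work'}pid.material_type:"lydbog" then matches the child documents and returns their parent works, which is the {!parent ...} construction used throughout this thread.
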
> > > We have around 28 million works in solr and 41 million manifestations, indexed as child documents (so many works have only one manifestation).
> > >
> > > As long as the user searches for things at the work level, the performance is fine. But as you can imagine, when users search for things at the manifestation level, the performance worsens. As an example, if we make a search for a creator, the search executes in less than 200 ms and results in maybe 30 hits. If we add a clause for a material type (with a `{!parent which:'doc_type:work'}materialType:"book"` construction), the search takes several seconds. In this case we want the filtering to books to be part of the ranking, so putting it in a filter query will pose a problem.
> > >
> > > I am wondering if there is some way to evaluate the search at the work level first, and then do the filtering for those works that have manifestations matching the child requirements afterwards? I could try to do the search for work-level properties first and only fetch IDs, and then do the full search with the manifestation-level requirements afterwards and an added filter query with the IDs, but I am wondering if there is a better way to do this.
> > >
> > > I have also looked at denormalizing (
> > > https://blog.innoventsolutions.com/innovent-solutions-blog/2018/05/avoid-the-parentchild-trap-tips-and-tricks-for-denormalizing-solr-data.html
> > > ) and it helps when doing it for a few child fields. But as said, there are more properties in the real model than those I have mentioned here, so that also involves some complications.
> > >
> > > Kind regards,
> > >
> > > /Noah
> > >
> > > --
> > > Noah Torp-Smith ([email protected])
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > https://t.me/MUST_SEARCH
> > A caveat: Cyrillic!
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
