On Fri, Sep 1, 2017 at 9:17 AM, Ere Maijala <ere.maij...@helsinki.fi> wrote:
> I spoke a bit too soon. Now I see why I didn't see any improvement from
> facet.method=uif before: its performance seems to depend heavily on how many
> facets are returned. With an index of 6 million records and the facet having
> 1960 buckets:
>
> facet.limit=20 takes 4ms
> facet.limit=200 takes ~100ms
> facet.limit=2000 takes ~1300ms
>
> So, for some uses it provides a nice boost, but if you need to fetch more
> than a few top items, it doesn't perform properly.

Another thought on this one: if it does slow down more than 4.x when
requesting many items, it's either
1) a bug introduced at some point
2) not actually slower, but due to the 6.6 index having more segments
   (ord->string conversion needs to merge multiple term enumerators, so
   more segments == slower)

If you could check #2, that would be great! If it doesn't seem to be
the problem, could you open up a new JIRA issue for this?

-Yonik

> Query used was:
>
> q=*:*&rows=0&facet=true&facet.field=building&facet.mincount=1&facet.limit=2000&debugQuery=true&facet.method=uif
>
> --Ere
>
>
> Ere Maijala wrote on 1.9.2017 at 13.10:
>>
>> I can confirm that we're seeing the same issue as Günter. For a collection
>> of 57 million bibliographic records, Solr 4.10.2 (without docValues) can
>> consistently return a facet in about 20ms, while Solr 6.6.0 with docValues
>> takes around 2600ms. I've tested some versions between those two too, but I
>> don't have comparable numbers for them.
>>
>> I thought I had tried all different combinations of docValues="true/false"
>> and facet.method=fc/uif/enum, but now that I checked it again, it seems that
>> I may have missed a test, as a 6.6.0 index with docValues="false" and
>> facet.method=uif is markedly faster than other methods. At around 700ms it's
>> still nowhere near as fast as 4.10.2, but a whole lot better. It seems
>> that docValues needs to be disabled for facet.method=uif to have an effect
>> though, which is unfortunate. Otherwise it reports that the applied method is
>> UIF, but the performance is actually much worse than with FC. I'll do
>> another round of testing to verify all this. I can report to SOLR-8096 when
>> I have something conclusive.
>>
>> --Ere
>>
>> Yonik Seeley wrote on 31.8.2017 at 20.04:
>>>
>>> A possible improvement for some multiValued fields might be to use the
>>> "uif" facet method (UnInvertedField was the default method for
>>> multiValued fields in 4.x)
>>> I'm not sure if you would need to reindex without docValues on that
>>> field to try it though.
>>>
>>> Example: to enable on the "union" field, add f.union.facet.method=uif
>>>
>>> Support for this was added in
>>> https://issues.apache.org/jira/browse/SOLR-8466
>>>
>>> -Yonik
>>>
>>>
>>> On Thu, Aug 31, 2017 at 10:41 AM, Günter Hipler
>>> <guenter.hip...@unibas.ch> wrote:
>>>>
>>>> Hi,
>>>>
>>>> in the meantime I came across the reason for the slow facet processing
>>>> of SOLR since version 5.x
>>>>
>>>> https://issues.apache.org/jira/browse/SOLR-8096
>>>> https://issues.apache.org/jira/browse/LUCENE-5666
>>>>
>>>> compared to version 4.x
>>>>
>>>> Various library networks across the world are suffering from the same
>>>> symptoms:
>>>>
>>>> Facet processing is one of the most important features of a search
>>>> server (for us) and it seems (at least IMHO) there has been no solution
>>>> for the issue since March 2015 (release date for the last SOLR 4 version)
>>>>
>>>> What are the plans / ideas of the solr developers for a possible future
>>>> solution? Or maybe there is already a solution I haven't seen so far.
>>>>
>>>> Thanks for any feedback
>>>>
>>>> Günter
>>>>
>>>>
>>>>
>>>> On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I can't figure out the reason why the facet processing in version 6
>>>>> needs significantly more time compared to version 4.
>>>>>
>>>>> The debugging response (for 30 million documents)
>>>>>
>>>>> solr 4
>>>>> <lst name="process"><double name="time">280.0</double><lst name="query"><double name="time">0.0</double></lst><lst name="facet"><double name="time">280.0</double></lst>
>>>>> (once the query is cached)
>>>>> before caching: between 1.5 and 2 sec
>>>>>
>>>>>
>>>>> solr 6.x (my last try was with 6.6)
>>>>> without docvalues for faceting fields (same schema as version 4)
>>>>> <lst name="process"><double name="time">5874.0</double><lst name="query"><double name="time">0.0</double></lst><lst name="facet"><double name="time">5873.0</double></lst><lst name="facet_module"><double name="time">0.0</double></lst>
>>>>> the time is not getting better even after repeating the query several
>>>>> times
>>>>>
>>>>>
>>>>> solr 6.6 with docvalues for faceting fields
>>>>> <lst name="process"><double name="time">9837.0</double><lst name="query"><double name="time">0.0</double></lst><lst name="facet"><double name="time">9837.0</double></lst><lst name="facet_module"><double name="time">0.0</double></lst>
>>>>>
>>>>> query used (our production system with version 4)
>>>>>
>>>>> http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
>>>>>
>>>>>
>>>>> Running the queries on smaller indices (8 million docs), the difference
>>>>> is similar, although the absolute figures for processing time are smaller
>>>>>
>>>>>
>>>>> Any hints as to why there are such huge differences?
>>>>>
>>>>> Günter
>>>>>
>>>>
>>>> --
>>>> Universität Basel
>>>> Universitätsbibliothek
>>>> Günter Hipler
>>>> Projekt SwissBib
>>>> Schoenbeinstrasse 18-20
>>>> 4056 Basel, Schweiz
>>>> Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
>>>> E-Mail guenter.hip...@unibas.ch
>>>> URL: www.swissbib.org / http://www.ub.unibas.ch/
>>>>
>>
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
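
For anyone who wants to repeat the facet.limit comparison from this thread (20 / 200 / 2000 on the "building" field with facet.method=uif), here is a minimal Python sketch. The query parameters are the ones quoted in the thread. The host, port and core name in SOLR_SELECT_URL are placeholders that are not given anywhere above, and the function names are made up for illustration. It measures wall-clock time on the client, so the numbers include network overhead and will not match the times Solr reports in its debug output exactly.

```python
import time
import urllib.parse
import urllib.request

# Assumption: placeholder select-handler URL; replace with your own host and core.
SOLR_SELECT_URL = "http://localhost:8983/solr/biblio/select"


def build_facet_query(limit, method="uif", field="building"):
    """Build the facet query from the thread with a given facet.limit."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),
        ("facet.limit", str(limit)),
        ("debugQuery", "true"),
        ("facet.method", method),
    ]
    return SOLR_SELECT_URL + "?" + urllib.parse.urlencode(params)


def time_facet_query(limit, method="uif", field="building"):
    """Send one request and return the client-side elapsed time in milliseconds."""
    url = build_facet_query(limit, method, field)
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        response.read()  # read the full body so transfer time is included
    return (time.perf_counter() - start) * 1000.0


def run_comparison(limits=(20, 200, 2000), method="uif"):
    """Print one timing per facet.limit value; needs a reachable Solr instance."""
    for limit in limits:
        elapsed_ms = time_facet_query(limit, method=method)
        print(f"facet.limit={limit} method={method}: {elapsed_ms:.0f} ms")
```

Calling run_comparison() against a running instance prints one line per limit. Running it once with method="uif" and once with method="fc", and repeating each run a few times so that caches are warm, should give numbers that can be compared in the way described above.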