I can confirm that we're seeing the same issue as Günter. For a collection of 57 million bibliographic records, Solr 4.10.2 (without docValues) can consistently return a facet in about 20ms, while Solr 6.6.0 with docValues takes around 2600ms. I've also tested some versions in between, but I don't have comparable numbers for them.

I thought I had tried all combinations of docValues="true/false" and facet.method=fc/uif/enum, but on checking again it seems I may have missed a test: a 6.6.0 index with docValues="false" and facet.method=uif is markedly faster than the other methods. At around 700ms it's still nowhere near as fast as 4.10.2, but a whole lot better. Unfortunately, it seems that docValues needs to be disabled for facet.method=uif to have an effect. Otherwise Solr reports that the applied method is UIF, but performance is actually much worse than with FC. I'll do another round of testing to verify all this, and I'll report to SOLR-8096 when I have something conclusive.

--Ere

On 31.8.2017 at 20.04, Yonik Seeley wrote:
A possible improvement for some multiValued fields might be to use the
"uif" facet method (UnInvertedField was the default method for
multiValued fields in 4.x).
I'm not sure if you would need to reindex without docValues on that
field to try it though.

Example: to enable on the "union" field, add f.union.facet.method=uif

Support for this was added in https://issues.apache.org/jira/browse/SOLR-8466
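A minimal sketch of how such a request could be assembled, assuming Python's standard `urllib.parse.urlencode`. `build_facet_params` is a hypothetical helper name; `rows=0`, `facet.limit=100` and `facet.mincount=1` are taken from Günter's query further down, and `f.<field>.facet.method` is Solr's per-field override of `facet.method`, as in the example above. Whether `uif` actually helps on a field that has docValues is exactly the open question in this thread.

```python
from urllib.parse import urlencode


def build_facet_params(fields, uif_fields=(), limit=100, mincount=1):
    """Build Solr facet request parameters (hypothetical helper).

    Every field in `fields` is faceted; each field in `uif_fields` also
    gets a per-field override f.<field>.facet.method=uif, so the other
    fields keep whatever facet method Solr picks by default.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.limit", str(limit)),
        ("facet.mincount", str(mincount)),
    ]
    for field in fields:
        # facet.field may be repeated, so use a list of tuples, not a dict
        params.append(("facet.field", field))
    for field in uif_fields:
        # Per-field override, e.g. f.union.facet.method=uif
        params.append((f"f.{field}.facet.method", "uif"))
    return params


if __name__ == "__main__":
    params = build_facet_params(["union", "format", "language"], uif_fields=["union"])
    print(urlencode(params))
```

Using a list of tuples rather than a dict keeps the repeated `facet.field` keys, which `urlencode` then emits once per tuple.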

-Yonik


On Thu, Aug 31, 2017 at 10:41 AM, Günter Hipler
<guenter.hip...@unibas.ch> wrote:
Hi,

in the meantime I came across the reason why facet processing in SOLR has
been slower since version 5.x than in version 4.x:

  https://issues.apache.org/jira/browse/SOLR-8096
  https://issues.apache.org/jira/browse/LUCENE-5666

Various library networks across the world are suffering from the same
symptoms.

Facet processing is one of the most important features of a search server
(for us), and it seems (at least IMHO) that there has been no solution for the
issue since March 2015 (the release date of the last SOLR 4 version).

What are the Solr developers' plans or ideas for a possible future
solution? Or maybe there is already a solution I haven't seen yet.

Thanks for any feedback

Günter



On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote:

Hi,

I can't figure out why facet processing in version 6 takes significantly
more time than in version 4.

The debug timing responses (for 30 million documents):

Solr 4:
<lst name="process"><double name="time">280.0</double>
  <lst name="query"><double name="time">0.0</double></lst>
  <lst name="facet"><double name="time">280.0</double></lst>
(once the query is cached; before caching: between 1.5 and 2 sec)


Solr 6.x (my last try was with 6.6), without docValues for the faceting
fields (same schema as version 4):
<lst name="process"><double name="time">5874.0</double>
  <lst name="query"><double name="time">0.0</double></lst>
  <lst name="facet"><double name="time">5873.0</double></lst>
  <lst name="facet_module"><double name="time">0.0</double></lst>
The time does not improve even after repeating the query several times.


Solr 6.6 with docValues for the faceting fields:
<lst name="process"><double name="time">9837.0</double>
  <lst name="query"><double name="time">0.0</double></lst>
  <lst name="facet"><double name="time">9837.0</double></lst>
  <lst name="facet_module"><double name="time">0.0</double></lst>
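To compare runs like these without reading the XML by eye, the facet time could be pulled out with Python's standard `xml.etree.ElementTree`. This is only a sketch: it assumes each excerpt has been completed into a well-formed `<lst name="process">` element (the excerpts in this mail stop before the closing tag), and `facet_time_ms` is a hypothetical helper name. The sample numbers are the Solr 4 and Solr 6.6 (docValues) timings reported in this message.

```python
import xml.etree.ElementTree as ET


def facet_time_ms(process_xml: str) -> float:
    """Return the facet component time from a debug <lst name="process"> element.

    Hypothetical helper; expects a complete, well-formed element, i.e. one of
    the timing excerpts with its closing </lst> restored.
    """
    root = ET.fromstring(process_xml)
    # Only the direct child named exactly "facet" (not "facet_module")
    facet = root.find("lst[@name='facet']")
    if facet is None:
        raise ValueError("no facet timing in this element")
    return float(facet.find("double[@name='time']").text)


if __name__ == "__main__":
    solr4 = ('<lst name="process"><double name="time">280.0</double>'
             '<lst name="query"><double name="time">0.0</double></lst>'
             '<lst name="facet"><double name="time">280.0</double></lst></lst>')
    solr66_dv = ('<lst name="process"><double name="time">9837.0</double>'
                 '<lst name="query"><double name="time">0.0</double></lst>'
                 '<lst name="facet"><double name="time">9837.0</double></lst>'
                 '<lst name="facet_module"><double name="time">0.0</double></lst></lst>')
    t4, t66 = facet_time_ms(solr4), facet_time_ms(solr66_dv)
    print(f"Solr 4: {t4} ms, Solr 6.6 (docValues): {t66} ms, ratio {t66 / t4:.1f}x")
```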

Query used (on our production system, running version 4):

http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count


Running the queries on smaller indices (8 million docs), the difference is
similar, although the absolute processing times are smaller.


Any hints as to why there are such huge differences?

Günter


--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hip...@unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/


--
Ere Maijala
Kansalliskirjasto / The National Library of Finland
