I spoke a bit too soon. Now I see why I didn't see any improvement from
facet.method=uif before: its performance seems to depend heavily on how
many facets are returned. With an index of 6 million records and the
facet having 1960 buckets:
facet.limit=20 takes 4ms
facet.limit=200 takes ~100ms
facet.limit=2000 takes ~1300ms
So, for some uses it provides a nice boost, but if you need to fetch
more than a few top items, it no longer performs well.
Query used was:
q=*:*&rows=0&facet=true&facet.field=building&facet.mincount=1&facet.limit=2000&debugQuery=true&facet.method=uif
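To repeat the measurement above for several facet.limit values, something like the following might do. It is only a sketch: the endpoint URL is a placeholder, the function names are made up for illustration, and wt=json is added so the response header can be read with the standard library.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; replace with your own core or collection.
SOLR_SELECT = "http://localhost:8983/solr/biblio/select"


def facet_query(field, limit, method="uif"):
    """Build the query string from the message, varying facet.limit and facet.method."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 1),
        ("facet.limit", limit),
        ("facet.method", method),
        ("wt", "json"),  # assumption: JSON output, to make QTime easy to read
    ]
    return urlencode(params)


def time_facet(field, limit, method="uif", base=SOLR_SELECT):
    """Run one request and return (wall-clock ms, QTime reported by Solr)."""
    url = base + "?" + facet_query(field, limit, method)
    start = time.perf_counter()
    with urlopen(url) as resp:
        body = json.load(resp)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, body["responseHeader"]["QTime"]
```

Calling time_facet("building", limit) for limit in 20, 200 and 2000 against a warmed-up index should show whether the same scaling appears elsewhere.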
--Ere
Ere Maijala wrote on 1.9.2017 at 13.10:
I can confirm that we're seeing the same issue as Günter. For a
collection of 57 million bibliographic records, Solr 4.10.2 (without
docValues) can consistently return a facet in about 20ms, while Solr
6.6.0 with docValues takes around 2600ms. I've tested some versions
between those two too, but I don't have comparable numbers for them.
I thought I had tried all different combinations of
docValues="true/false" and facet.method=fc/uif/enum, but now that I
checked it again, it seems that I may have missed a test, as a 6.6.0
index with docValues="false" and facet.method=uif is markedly faster
than other methods. At around 700ms it's still nowhere near as fast
as 4.10.2, but a whole lot better. It seems that docValues needs to be
disabled for facet.method=uif to have effect though, which is
unfortunate. Otherwise it reports that the applied method is UIF, but the
performance is actually much worse than with FC. I'll do another
round of testing to verify all this. I can report to SOLR-8096 when I
have something conclusive.
--Ere
Yonik Seeley wrote on 31.8.2017 at 20.04:
A possible improvement for some multiValued fields might be to use the
"uif" facet method (UnInvertedField was the default method for
multiValued fields in 4.x)
I'm not sure if you would need to reindex without docValues on that
field to try it though.
Example: to enable on the "union" field, add f.union.facet.method=uif
Support for this was added in
https://issues.apache.org/jira/browse/SOLR-8466
-Yonik
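The per-field override mentioned above can be sketched as a small helper that builds the request parameters. This is only an illustration: the helper name is hypothetical, and the field names "union" and "format" are taken from the query quoted further down in the thread.

```python
from urllib.parse import urlencode


def facet_params(fields, method_overrides=None):
    """Build facet parameters, adding f.<field>.facet.method for selected fields."""
    params = [("q", "*:*"), ("rows", 0), ("facet", "true")]
    for field in fields:
        params.append(("facet.field", field))
    # Per-field override, e.g. f.union.facet.method=uif
    for field, method in (method_overrides or {}).items():
        params.append((f"f.{field}.facet.method", method))
    return urlencode(params)


# Only "union" is switched to uif; "format" keeps the default method.
print(facet_params(["union", "format"], {"union": "uif"}))
```

Overriding one field at a time makes it easier to compare facet methods field by field instead of switching all facets at once with facet.method.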
On Thu, Aug 31, 2017 at 10:41 AM, Günter Hipler
<guenter.hip...@unibas.ch> wrote:
Hi,
in the meantime I came across the reason why facet processing in SOLR
has been so much slower since version 5.x compared to version 4.x:
https://issues.apache.org/jira/browse/SOLR-8096
https://issues.apache.org/jira/browse/LUCENE-5666
Various library networks across the world are suffering from the same
symptoms.
Facet processing is one of the most important features of a search
server (for us), and it seems (at least IMHO) that there has been no
solution for the issue since March 2015 (release date of the last
SOLR 4 version).
What are the plans / ideas of the solr developers for a possible future
solution? Or maybe there is already a solution I haven't seen so far.
Thanks for any feedback
Günter
On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote:
Hi,
I can't figure out the reason why the facet processing in version 6
needs significantly more time compared to version 4.
The debugging response (for 30 million documents)
solr 4
<lst name="process"><double name="time">280.0</double>
<lst name="query"><double name="time">0.0</double></lst>
<lst name="facet"><double name="time">280.0</double></lst>
(once the query is cached)
before caching: between 1.5 and 2 sec
solr 6.x (my last try was with 6.6)
without docvalues for faceting fields (same schema as version 4)
<lst name="process"><double name="time">5874.0</double>
<lst name="query"><double name="time">0.0</double></lst>
<lst name="facet"><double name="time">5873.0</double></lst>
<lst name="facet_module"><double name="time">0.0</double></lst>
the time does not improve even after repeating the query several
times
solr 6.6 with docvalues for faceting fields
<lst name="process"><double name="time">9837.0</double>
<lst name="query"><double name="time">0.0</double></lst>
<lst name="facet"><double name="time">9837.0</double></lst>
<lst name="facet_module"><double name="time">0.0</double></lst>
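For comparing such debug timings across versions, the per-component times could be pulled out of a debugQuery response with a few lines of Python. This is a rough sketch: the function name is made up, and the sample wraps the Solr 4 fragment above in closing tags and an assumed outer structure, since the excerpts in this message are truncated.

```python
import xml.etree.ElementTree as ET


def process_times(xml_text):
    """Return the total and per-component times from the "process" timing block."""
    root = ET.fromstring(xml_text)
    process = root.find(".//lst[@name='process']")
    if process is None:
        return {}
    times = {"total": float(process.find("double[@name='time']").text)}
    for component in process.findall("lst"):
        value = component.find("double[@name='time']").text
        times[component.get("name")] = float(value)
    return times


# Assumed wrapper around the Solr 4 fragment quoted in the thread.
SAMPLE = """<response><lst name="debug"><lst name="timing">
<lst name="process"><double name="time">280.0</double>
<lst name="query"><double name="time">0.0</double></lst>
<lst name="facet"><double name="time">280.0</double></lst>
</lst></lst></lst></response>"""

print(process_times(SAMPLE))
```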
Query used (our production system with version 4):
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
When running the queries on smaller indices (8 million docs), the
difference is similar, although the absolute figures for processing
time are smaller.
Any hints as to why there are such huge differences?
Günter
--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hip...@unibas.ch
URL: www.swissbib.org / http://www.ub.unibas.ch/
--
Ere Maijala
Kansalliskirjasto / The National Library of Finland