On Fri, Sep 1, 2017 at 9:17 AM, Ere Maijala <ere.maij...@helsinki.fi> wrote:
> I spoke a bit too soon. Now I see why I didn't see any improvement from
> facet.method=uif before: its performance seems to depend heavily on how many
> facets are returned. With an index of 6 million records and the facet having
> 1960 buckets:
>
> facet.limit=20 takes 4ms
> facet.limit=200 takes ~100ms
> facet.limit=2000 takes ~1300ms
>
> So, for some uses it provides a nice boost, but if you need to fetch more
> than a few top items, it doesn't perform properly.
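
For anyone who wants to reproduce these timings, here is a minimal sketch that runs the same facet-only query at several facet.limit values. The core URL is a placeholder (an assumption, not from the thread); the "building" field and the parameters are the ones from the query quoted further down. It reports Solr's own QTime next to the wall-clock time.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical core URL; replace with your own installation.
SOLR_SELECT = "http://localhost:8983/solr/biblio/select"


def facet_query_url(base, field, limit, method="uif"):
    """Build the facet-only query from this thread for a given facet.limit."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),
        ("facet.limit", str(limit)),
        ("facet.method", method),
        ("wt", "json"),
    ]
    return base + "?" + urllib.parse.urlencode(params)


def timed_fetch(url):
    """Return (Solr-reported QTime in ms, wall-clock ms) for one request."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    wall_ms = (time.perf_counter() - start) * 1000.0
    return body["responseHeader"]["QTime"], wall_ms


if __name__ == "__main__":
    for limit in (20, 200, 2000):
        url = facet_query_url(SOLR_SELECT, "building", limit)
        timed_fetch(url)  # warm-up request so caches are populated
        qtime, wall = timed_fetch(url)
        print(f"facet.limit={limit}: QTime={qtime}ms wall={wall:.0f}ms")
```

Running it once with method="uif" and once with method="fc" against the same index should show whether the growth with facet.limit is specific to uif.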

Another thought on this one:
If it does slow down more than 4.x when requesting many items, it's either
1) a bug introduced at some point, or
2) not actually slower in itself, but a consequence of the 6.6 index having
more segments (ord->string conversion needs to merge multiple term
enumerators, so more segments == slower)

If you could check #2, that would be great!  If it doesn't seem to be
the problem, could you open up a new JIRA issue for this?
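
One rough way to check #2 is to compare the segment counts of the two indexes. The sketch below assumes the Luke request handler is enabled at /admin/luke and reports the count as index.segmentCount in its JSON response; the host names and core name are placeholders, not taken from this thread.

```python
import json
import urllib.request


def luke_url(core_url):
    """URL of the Luke handler for a core; numTerms=0 skips per-field term stats."""
    return core_url.rstrip("/") + "/admin/luke?numTerms=0&wt=json"


def segment_count(luke_response):
    """Pull the segment count out of a parsed Luke response.

    Assumes the handler reports it as index.segmentCount; returns None if absent.
    """
    return luke_response.get("index", {}).get("segmentCount")


def fetch_segment_count(core_url):
    """Query the Luke handler of one core and return its segment count."""
    with urllib.request.urlopen(luke_url(core_url)) as resp:
        return segment_count(json.load(resp))


if __name__ == "__main__":
    # Hypothetical core URLs for the two installations being compared.
    for name, core in (("4.10.2", "http://solr4:8983/solr/biblio"),
                       ("6.6.0", "http://solr6:8983/solr/biblio")):
        print(name, "segments:", fetch_segment_count(core))
```

If the 6.6 index reports many more segments than the 4.x one, repeating the timing after merging it down to a similar segment count would help separate the two explanations.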

-Yonik


> Query used was:
>
> q=*:*&rows=0&facet=true&facet.field=building&facet.mincount=1&facet.limit=2000&debugQuery=true&facet.method=uif
>
> --Ere
>
>
> Ere Maijala wrote on 1.9.2017 at 13.10:
>>
>> I can confirm that we're seeing the same issue as Günter. For a collection
>> of 57 million bibliographic records, Solr 4.10.2 (without docValues) can
>> consistently return a facet in about 20ms, while Solr 6.6.0 with docValues
>> takes around 2600ms. I've tested some versions between those two too, but I
>> don't have comparable numbers for them.
>>
>> I thought I had tried all different combinations of docValues="true/false"
>> and facet.method=fc/uif/enum, but now that I checked it again, it seems that
>> I may have missed a test, as a 6.6.0 index with docValues="false" and
>> facet.method=uif is markedly faster than other methods. At around 700ms it's
>> still nowhere near as fast as 4.10.2, but a whole lot better. It seems
>> that docValues needs to be disabled for facet.method=uif to have an effect
>> though, which is unfortunate. Otherwise it reports that the applied method is
>> UIF, but the performance is actually much worse than with FC. I'll do
>> another round of testing to verify all this. I can report to SOLR-8096 when
>> I have something conclusive.
>>
>> --Ere
>>
>> Yonik Seeley wrote on 31.8.2017 at 20.04:
>>>
>>> A possible improvement for some multiValued fields might be to use the
>>> "uif" facet method (UnInvertedField was the default method for
>>> multiValued fields in 4.x).
>>> I'm not sure if you would need to reindex without docValues on that
>>> field to try it though.
>>>
>>> Example: to enable on the "union" field, add f.union.facet.method=uif
>>>
>>> Support for this was added in
>>> https://issues.apache.org/jira/browse/SOLR-8466
>>>
>>> -Yonik
>>>
>>>
>>> On Thu, Aug 31, 2017 at 10:41 AM, Günter Hipler
>>> <guenter.hip...@unibas.ch> wrote:
>>>>
>>>> Hi,
>>>>
>>>> In the meantime I came across the reason for the slow facet processing
>>>> in Solr since version 5.x, compared to version 4.x:
>>>>
>>>> https://issues.apache.org/jira/browse/SOLR-8096
>>>> https://issues.apache.org/jira/browse/LUCENE-5666
>>>>
>>>> Various library networks across the world are suffering from the same
>>>> symptoms:
>>>>
>>>> Facet processing is one of the most important features of a search server
>>>> (for us), and it seems (at least IMHO) there has been no solution for the
>>>> issue since March 2015 (release date of the last Solr 4 version).
>>>>
>>>> What are the plans / ideas of the Solr developers for a possible future
>>>> solution? Or maybe there is already a solution I haven't seen so far.
>>>>
>>>> Thanks for any feedback
>>>>
>>>> Günter
>>>>
>>>>
>>>>
>>>> On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I can't figure out why facet processing in version 6 needs
>>>>> significantly more time compared to version 4.
>>>>>
>>>>> The debugging response (for 30 million documents):
>>>>>
>>>>> solr 4
>>>>> <lst name="process"><double name="time">280.0</double><lst
>>>>> name="query"><double name="time">0.0</double></lst><lst
>>>>> name="facet"><double
>>>>> name="time">280.0</double></lst>
>>>>> (once the query is cached)
>>>>> before caching: between 1.5 and 2 sec
>>>>>
>>>>>
>>>>> solr 6.x (my last try was with 6.6)
>>>>> without docValues for faceting fields (same schema as version 4)
>>>>> <lst name="process"><double name="time">5874.0</double><lst
>>>>> name="query"><double name="time">0.0</double></lst><lst
>>>>> name="facet"><double
>>>>> name="time">5873.0</double></lst><lst name="facet_module"><double
>>>>> name="time">0.0</double></lst>
>>>>> the time does not improve even after repeating the query several times
>>>>>
>>>>>
>>>>> solr 6.6 with docValues for faceting fields
>>>>> <lst name="process"><double name="time">9837.0</double><lst
>>>>> name="query"><double name="time">0.0</double></lst><lst
>>>>> name="facet"><double
>>>>> name="time">9837.0</double></lst><lst name="facet_module"><double
>>>>> name="time">0.0</double></lst>
>>>>>
>>>>> Query used (our production system with version 4):
>>>>>
>>>>>
>>>>> http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
>>>>>
>>>>>
>>>>> Running the queries on smaller indices (8 million docs), the difference
>>>>> is similar, although the absolute figures for processing time are smaller.
>>>>>
>>>>>
>>>>> Any hints as to why there are such huge differences?
>>>>>
>>>>> Günter
>>>>>
>>>>
>>>> --
>>>> Universität Basel
>>>> Universitätsbibliothek
>>>> Günter Hipler
>>>> Projekt SwissBib
>>>> Schoenbeinstrasse 18-20
>>>> 4056 Basel, Schweiz
>>>> Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
>>>> E-Mail guenter.hip...@unibas.ch
>>>> URL: www.swissbib.org  / http://www.ub.unibas.ch/
>>>>
>>
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
