Re: Costs/benefits of DocValues

Erick Erickson Mon, 09 Nov 2015 09:21:07 -0800

bq: But if we are keeping the indexed=true, then docValues=true will STILL
use at least as much memory however efficient docValues are
themselves, right?


AFAIK, kinda. The big difference is that with docValues="false", you're
building these structures in the JVM whereas with docValues="true",
the structures are at least partially in the OS memory thus relieving
the pressure on Java's heap, GC and the rest.
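
For what it's worth, here is a rough sketch of what the two setups discussed in this thread might look like in schema.xml. The field names ("category", "title_sort") are made up for illustration, and I'm assuming a "string" type backed by solr.StrField as in the example schemas, so adjust it to your own schema. A field's docValues setting only applies to documents indexed after the change, so plan on a full reindex.

```xml
<!-- Hypothetical field names; assumes a "string" fieldType (solr.StrField). -->

<!-- Faceting plus filtering: keep indexed="true" so filters on the field
     stay efficient, and add docValues so faceting and sorting read the
     column-oriented data built at index time instead of un-inverting. -->
<field name="category" type="string"
       indexed="true" stored="true" docValues="true"/>

<!-- Sort-only field: nothing searches or filters on it, so docValues
     alone may be enough. -->
<field name="title_sort" type="string"
       indexed="false" stored="false" docValues="true"/>
```

The first definition is the "keep indexed=true and add docValues" combination; the second is the sort-only case from the original question.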

On Mon, Nov 9, 2015 at 9:06 AM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> Thank you Yonik.
>
> So I would probably advise then to "keep your indexed=true" and think
> about _adding_ docValues when there is memory pressure or when there
> is a clear performance issue for the ...specific... uses.
>
> But if we are keeping the indexed=true, then docValues=true will STILL
> use at least as much memory however efficient docValues are
> themselves, right? Or will something that is normally loaded and uses
> memory stay unloaded in this combination scenario?
>
> Regards,
>    Alex.
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 9 November 2015 at 11:57, Yonik Seeley <ysee...@gmail.com> wrote:
>> On Mon, Nov 9, 2015 at 11:19 AM, Alexandre Rafalovitch
>> <arafa...@gmail.com> wrote:
>>> I thought docValues were per segment, so the price of un-inversion was
>>> effectively paid on each commit for all the segments, as opposed to
>>> just the updated one.
>>
>> Both the field cache (i.e. uninverting indexed values) and docValues
>> are mostly per-segment (I say mostly because some uses still require
>> building a global ord map).
>>
>> But even when things are mostly per-segment, you hit major segment
>> merges and the cost of un-inversion (when you aren't using docValues)
>> is non-trivial.
>>
>>> I admit I also find the story around docValues to be very confusing at
>>> the moment. Especially on the interplay with "indexed=false".
>>
>> You still need "indexed=true" for efficient filters on the field.
>> Hence if you're faceting on a field and want to use docValues, you
>> probably want to keep the "indexed=true" on the field as well.
>>
>> -Yonik
>>
>>
>>> It would
>>> make a VERY good article to have this clarified somehow by people in
>>> the know.
>>>
>>> Regards,
>>>    Alex.
>>> ----
>>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>>> http://www.solr-start.com/
>>>
>>>
>>> On 9 November 2015 at 11:04, Yonik Seeley <ysee...@gmail.com> wrote:
>>>> On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz <demian.k...@villanova.edu> 
>>>> wrote:
>>>>> I understand that by adding "docValues=true" to some of my fields, I can 
>>>>> improve sorting/faceting performance.
>>>>
>>>> I don't think this is true in the general sense.
>>>> docValues are built at index-time, so what you will save is initial
>>>> un-inversion time (i.e. the first time a field is used after a new
>>>> searcher is opened).
>>>> After that point, docValues may be slightly slower.
>>>>
>>>> The other advantage of docValues is memory use... much/most of it is
>>>> essentially "off-heap", being memory-mapped from disk.  This cuts down
>>>> on memory issues and helps reduce longer GC pauses.
>>>>
>>>> docValues are good in general, and I think we should default to them
>>>> more for Solr 6, but they are not better in all ways.
>>>>
>>>>> However, I have a couple of questions:
>>>>>
>>>>>
>>>>> 1.)    Will Solr always take proper advantage of docValues when it is 
>>>>> turned on
>>>>
>>>> Yes.
>>>>
>>>>> , or will I gain greater performance by turning off stored/indexed in 
>>>>> situations where only docValues are necessary (e.g. a sort-only field)?
>>>>>
>>>>> 2.)    Will adding docValues to a field introduce significant performance 
>>>>> penalties for non-docValues uses of that field, beyond the obvious fact 
>>>>> that the additional data will consume more disk and memory?
>>>>
>>>> No, it's a separate part of the index.
>>>>
>>>> -Yonik
>>>>
>>>>
>>>>> I'm asking this question because the existing schema has some 
>>>>> multi-purpose fields, and I'm trying to determine whether I should just 
>>>>> add "docValues=true" wherever it might help, or if I need to take a more 
>>>>> thoughtful approach and potentially split some fields with copyFields, 
>>>>> etc. This is particularly significant because my schema makes use of some 
>>>>> dynamic field suffixes, and I'm not sure if I need to add new suffixes to 
>>>>> differentiate docValues/non-docValues fields, or if it's okay to turn on 
>>>>> docValues across the board "just in case."
>>>>>
>>>>> Apologies if these questions have already been answered - I couldn't find 
>>>>> a totally clear answer in the places I searched.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> - Demian
