bq: But if we are keeping the indexed=true, then docValues=true will STILL use at least as much memory however efficient docValues are themselves, right?
AFAIK, kinda. The big difference is that with docValues="false" you're building these structures on the JVM heap, whereas with docValues="true" the structures are at least partially in OS memory (memory-mapped from disk), which relieves the pressure on Java's heap, GC and the rest.

On Mon, Nov 9, 2015 at 9:06 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> Thank you Yonik.
>
> So I would probably advise then to "keep your indexed=true" and think
> about _adding_ docValues when there is memory pressure or when there
> is a clear performance issue for the ...specific... uses.
>
> But if we are keeping the indexed=true, then docValues=true will STILL
> use at least as much memory however efficient docValues are
> themselves, right? Or will something that is normally loaded and uses
> memory stay unloaded in this combination scenario?
>
> Regards,
> Alex.
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 9 November 2015 at 11:57, Yonik Seeley <ysee...@gmail.com> wrote:
>> On Mon, Nov 9, 2015 at 11:19 AM, Alexandre Rafalovitch
>> <arafa...@gmail.com> wrote:
>>> I thought docValues were per segment, so the price of un-inversion was
>>> effectively paid on each commit for all the segments, as opposed to
>>> just the updated one.
>>
>> Both the field cache (i.e. uninverting indexed values) and docValues
>> are mostly per-segment (I say mostly because some uses still require
>> building a global ord map).
>>
>> But even when things are mostly per-segment, you hit major segment
>> merges and the cost of un-inversion (when you aren't using docValues)
>> is non-trivial.
>>
>>> I admit I also find the story around docValues to be very confusing at
>>> the moment. Especially on the interplay with "indexed=false".
>>
>> You still need "indexed=true" for efficient filters on the field.
>> Hence if you're faceting on a field and want to use docValues, you
>> probably want to keep the "indexed=true" on the field as well.
>>
>> -Yonik
>>
>>
>>> It would make a VERY good article to have this clarified somehow by
>>> people in the know.
>>>
>>> Regards,
>>> Alex.
>>> ----
>>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>>> http://www.solr-start.com/
>>>
>>>
>>> On 9 November 2015 at 11:04, Yonik Seeley <ysee...@gmail.com> wrote:
>>>> On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz <demian.k...@villanova.edu>
>>>> wrote:
>>>>> I understand that by adding "docValues=true" to some of my fields, I can
>>>>> improve sorting/faceting performance.
>>>>
>>>> I don't think this is true in the general sense.
>>>> docValues are built at index-time, so what you will save is initial
>>>> un-inversion time (i.e. the first time a field is used after a new
>>>> searcher is opened).
>>>> After that point, docValues may be slightly slower.
>>>>
>>>> The other advantage of docValues is memory use... much/most of it is
>>>> essentially "off-heap", being memory-mapped from disk. This cuts down
>>>> on memory issues and helps reduce longer GC pauses.
>>>>
>>>> docValues are good in general, and I think we should default to them
>>>> more for Solr 6, but they are not better in all ways.
>>>>
>>>>> However, I have a couple of questions:
>>>>>
>>>>>
>>>>> 1.) Will Solr always take proper advantage of docValues when it is
>>>>> turned on
>>>>
>>>> Yes.
>>>>
>>>>> , or will I gain greater performance by turning off stored/indexed in
>>>>> situations where only docValues are necessary (e.g. a sort-only field)?
>>>>>
>>>>> 2.) Will adding docValues to a field introduce significant performance
>>>>> penalties for non-docValues uses of that field, beyond the obvious fact
>>>>> that the additional data will consume more disk and memory?
>>>>
>>>> No, it's a separate part of the index.
>>>>
>>>> -Yonik
>>>>
>>>>
>>>>> I'm asking this question because the existing schema has some
>>>>> multi-purpose fields, and I'm trying to determine whether I should just
>>>>> add "docValues=true" wherever it might help, or if I need to take a more
>>>>> thoughtful approach and potentially split some fields with copyFields,
>>>>> etc. This is particularly significant because my schema makes use of some
>>>>> dynamic field suffixes, and I'm not sure if I need to add new suffixes to
>>>>> differentiate docValues/non-docValues fields, or if it's okay to turn on
>>>>> docValues across the board "just in case."
>>>>>
>>>>> Apologies if these questions have already been answered - I couldn't find
>>>>> a totally clear answer in the places I searched.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> - Demian
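A toy model may help with Yonik's point in this thread that docValues mainly save the initial un-inversion time. The Python sketch below assumes a single-valued string field; every name in it is hypothetical and nothing here is Solr or Lucene API. It builds the same doc -> ordinal column two ways: by un-inverting a tiny inverted index, the way the field cache does on first use after a new searcher opens, and directly while "writing" the segment, the way docValues do at index time. It does not model where that column lives (JVM heap vs. memory-mapped files), which is the other half of the trade-off Erick and Yonik describe.

```python
from collections import defaultdict


def build_inverted_index(values):
    """Toy inverted index: term -> sorted list of doc ids.
    values[i] is the single value of doc i, or None if the doc has none."""
    postings = defaultdict(list)
    for doc_id, value in enumerate(values):
        if value is not None:
            postings[value].append(doc_id)
    # Keep the term dictionary sorted so a term's position is its ordinal.
    return dict(sorted(postings.items()))


def uninvert(inverted, max_doc):
    """Field-cache style: rebuild a doc -> ordinal column by walking every
    term's postings list. This is work done at query time, not index time."""
    terms = list(inverted)      # sorted term dictionary
    ords = [-1] * max_doc       # -1 marks "no value" (a convention of this sketch)
    for ord_, term in enumerate(terms):
        for doc_id in inverted[term]:
            ords[doc_id] = ord_
    return terms, ords


def build_doc_values(values):
    """docValues style: the same column, produced while the segment is
    written, so nothing has to be un-inverted later."""
    terms = sorted({v for v in values if v is not None})
    ord_of = {term: i for i, term in enumerate(terms)}
    ords = [ord_of[v] if v is not None else -1 for v in values]
    return terms, ords


def facet_counts(terms, ords):
    """Count docs per term using only the column (no postings needed)."""
    counts = [0] * len(terms)
    for o in ords:
        if o >= 0:
            counts[o] += 1
    return {term: c for term, c in zip(terms, counts) if c > 0}


if __name__ == "__main__":
    docs = ["red", "blue", None, "red", "green"]
    inverted = build_inverted_index(docs)
    # Both paths yield the same column; they differ only in *when* it is built.
    assert uninvert(inverted, len(docs)) == build_doc_values(docs)
    print(facet_counts(*build_doc_values(docs)))  # {'blue': 1, 'green': 1, 'red': 2}
```

The inverted index is still what makes filters on the field cheap (look up one term, get its doc ids), which is why the thread suggests keeping indexed="true" alongside docValues rather than replacing it.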
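Yonik's "mostly per-segment" caveat in this thread refers to the global ord map: a segment's ordinals are only comparable within that segment, so a facet over the whole index needs a mapping from each segment's ordinals to one merged term order. Below is a minimal Python sketch under the same toy assumptions (hypothetical names, in-memory lists instead of Lucene structures). It merges the sorted per-segment term dictionaries, records segment ordinal -> global ordinal, and then sums per-segment counts. In this model the map depends on the exact set of segments, so it has to be rebuilt whenever that set changes, for example after a commit opens a new searcher or a merge rewrites segments.

```python
import heapq


def build_global_ord_map(segment_terms):
    """segment_terms[s] is the sorted, de-duplicated term dictionary of segment s.
    Returns (global_terms, seg_to_global) where seg_to_global[s][seg_ord] is the
    global ordinal of that segment's term."""
    global_terms = []
    seg_to_global = [[0] * len(terms) for terms in segment_terms]
    # k-way merge of the already-sorted per-segment dictionaries; tuples sort
    # by term first, so equal terms from different segments come out adjacent.
    merged = heapq.merge(*[
        [(term, seg, seg_ord) for seg_ord, term in enumerate(terms)]
        for seg, terms in enumerate(segment_terms)
    ])
    for term, seg, seg_ord in merged:
        if not global_terms or global_terms[-1] != term:
            global_terms.append(term)  # first time this term is seen
        seg_to_global[seg][seg_ord] = len(global_terms) - 1
    return global_terms, seg_to_global


def merge_facet_counts(seg_counts, seg_to_global, num_global):
    """Sum per-segment counts (indexed by segment ordinal) into global counts."""
    totals = [0] * num_global
    for counts, mapping in zip(seg_counts, seg_to_global):
        for seg_ord, c in enumerate(counts):
            totals[mapping[seg_ord]] += c
    return totals


if __name__ == "__main__":
    segments = [["blue", "red"], ["green", "red"], ["apple", "blue"]]
    terms, maps = build_global_ord_map(segments)
    print(terms)  # ['apple', 'blue', 'green', 'red']
    print(maps)   # [[1, 3], [2, 3], [0, 1]]
    # Per-segment counts, e.g. segment 0 saw "blue" twice and "red" once.
    print(merge_facet_counts([[2, 1], [1, 4], [3, 5]], maps, len(terms)))  # [3, 7, 1, 5]
```

The per-segment columns themselves never change after a segment is written; only this cross-segment map is tied to the current set of segments, which is the part that keeps the story "mostly" rather than fully per-segment.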