Thanks, Erick. It's just that when we enable both index=true and docValues=true, indexing time increases by at least 2x for a full re-index.

On Wed, May 20, 2020 at 2:30 PM Erick Erickson <erickerick...@gmail.com> wrote:

> Revas:
>
> Facet queries are just queries that are constrained by the total
> result set of your primary query, so the answer to that would be the
> same as speeding up regular queries. As far as range facets are
> concerned, I believe they _do_ use docValues; after all, they have to
> answer the exact same question: For doc X in the result set, what is
> the value of field Y? The only difference is it has to bucket a bunch
> of them.
>
> Rahul: Please don’t hijack threads, it makes it difficult to find
> things later. Start a separate e-mail thread.
>
> The answer to your question is, of course, “it depends” on a number
> of things and changes with the query. First of all, multivalued
> fields don’t qualify because docValues are a sorted set, meaning the
> return is sorted and deduplicated. So if the input has 5 values in
> it, b c d c d, what you’d get back from DV is b c d.
>
> So let’s go with primitive, single-valued types. It still depends,
> but Solr does the right thing, or tries. Here’s the scoop. Stored
> fields for any single doc are stored as a contiguous, compressed bit
> of memory. So if any _one_ field needs to be read from the stored
> data, the entire block is decompressed and Solr will preferentially
> fetch the value from the decompressed data, as it’s pretty certain to
> be at least as cheap as fetching from DV. However, the reverse is
> true if _all_ the returned values are single-valued DV fields. Then
> it’s more efficient to fetch the DV values, as they’re MMapped and
> won’t cost the seek-and-decompress cycle.
>
> Unless space is a real consideration for you, I’d set both index and
> docValues to true…
>
> Best,
> Erick
>
> On May 20, 2020, at 10:45 AM, Rahul Goswami <rahul196...@gmail.com> wrote:
> >
> > Erick,
> > Thanks for that explanation. I have a follow-up question on that.
> > I find the scenario of stored=true and docValues=true to be tricky
> > at times... I would like to know when each of these scenarios is
> > preferred over the other two for primitive datatypes:
> >
> > 1) stored=true and docValues=false
> > 2) stored=false and docValues=true
> > 3) stored=true and docValues=true
> >
> > Thanks,
> > Rahul
> >
> > On Tue, May 19, 2020 at 5:55 PM Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> They are _absolutely_ able to be used together. Background:
> >>
> >> “In the bad old days”, there was no docValues. So whenever you
> >> needed to facet/sort/group/use function queries, Solr (well,
> >> Lucene) had to take the inverted structure resulting from
> >> “index=true” and “uninvert” it on the Java heap.
> >>
> >> docValues essentially does the “uninverting” at index time and
> >> puts that structure in a separate file for each segment. So rather
> >> than uninvert the index on the heap, Lucene can just read it in
> >> from disk in MMapDirectory (i.e. OS) memory space.
> >>
> >> The downside is that your index will be bigger when you do both,
> >> that is, the size on disk will be bigger. But it’ll be much faster
> >> to load, much faster to autowarm, and will move the structures
> >> necessary to do faceting/sorting/etc. into OS memory, where the
> >> garbage collection is vastly more efficient than Java’s.
> >>
> >> And frankly I don’t think the increased size on disk is a
> >> downside. You’ll have to have the memory anyway, and having it
> >> used in the OS memory space is so much more efficient than on
> >> Java’s heap that it’s a win-win IMO.
> >>
> >> Oh, and if you never sort/facet/group/use function queries, then
> >> the docValues structures are never even read into MMapDirectory
> >> space.
> >>
> >> So yes, freely do both.
> >>
> >> Best,
> >> Erick
> >>
> >>> On May 19, 2020, at 5:41 PM, matthew sporleder <msporle...@gmail.com> wrote:
> >>>
> >>> You can index AND docvalue? For some reason I thought they were
> >>> exclusive
> >>>
> >>> On Tue, May 19, 2020 at 5:36 PM Erick Erickson <erickerick...@gmail.com> wrote:
> >>>>
> >>>> Yes. You should also index them….
> >>>>
> >>>> Here’s the way I think of it.
> >>>>
> >>>> Questions like “For term X, which docs contain that value?” mean
> >>>> index=true. This is a search.
> >>>>
> >>>> Questions like “Does doc X have value Y in field Z?” mean
> >>>> docValues=true.
> >>>>
> >>>> What’s the difference? Well, the first one is to get the result
> >>>> set. The second is for, given a result set, count/sort/whatever.
> >>>>
> >>>> fq clauses are searches, so index=true.
> >>>>
> >>>> Sorting, faceting, grouping and function queries are “for each
> >>>> doc in the result set, what values does field Y contain?”
> >>>>
> >>>> Maybe that made things clear as mud, but it’s the way I think of
> >>>> it ;)
> >>>>
> >>>> Best,
> >>>> Erick
> >>>>
> >>>> fq clauses are searches. Indexed=true is for searching.
> >>>>
> >>>> sort
> >>>>
> >>>>> On May 19, 2020, at 4:00 PM, matthew sporleder <msporle...@gmail.com> wrote:
> >>>>>
> >>>>> I have quite a few numeric / meta-data type fields in my schema
> >>>>> and pretty much only use them in fq=, sort=, and friends.
> >>>>> Should I always use docValues on these if I never plan to
> >>>>> q=search: on them? Are there any drawbacks?
> >>>>>
> >>>>> Thanks,
> >>>>> Matt
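
For anyone finding this thread later, the distinction Erick draws above can be sketched in a few lines of Python. This is only a toy model and not how Lucene actually stores anything: the corpus, the field, and every function name here (search, sort_results, uninvert, sorted_set_doc_values) are made up for illustration. It shows the inverted structure answering "for term X, which docs?", a per-document column answering "for doc X, what value?", the "uninverting" step that docValues moves to index time, and the sorted, deduplicated view of a multivalued field.

```python
from collections import defaultdict

# Hypothetical toy corpus: doc id -> value of one single-valued field.
docs = {0: 30, 1: 10, 2: 30, 3: 20}

# Analogue of index=true: inverted structure, term -> list of doc ids.
inverted = defaultdict(list)
for doc_id, value in docs.items():
    inverted[value].append(doc_id)

# Analogue of docValues=true: a column built at index time, doc id -> value.
doc_values = [docs[i] for i in range(len(docs))]


def search(term):
    """For term X, which docs contain that value? (what q= and fq= need)"""
    return list(inverted.get(term, []))


def sort_results(result_set):
    """For each doc in the result set, what is its value? (what sort= needs)"""
    return sorted(result_set, key=lambda doc_id: doc_values[doc_id])


def uninvert(inverted_index, num_docs):
    """Rebuild the doc id -> value column from the inverted structure.

    This stands in for the work that, without docValues, had to happen
    on the Java heap at query time.
    """
    column = [None] * num_docs
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            column[doc_id] = term
    return column


def sorted_set_doc_values(values):
    """Multivalued case: the values come back sorted and deduplicated."""
    return sorted(set(values))


if __name__ == "__main__":
    print(search(30))
    print(sort_results([0, 1, 2, 3]))
    print(uninvert(inverted, len(docs)) == doc_values)
    print(sorted_set_doc_values(["b", "c", "d", "c", "d"]))
```

In this toy model the column is built once alongside the inverted structure, which is roughly why enabling both costs extra work and disk space at index time but avoids the uninvert step later.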