Re: when to use docvalue

Revas Wed, 20 May 2020 14:54:20 -0700

Thanks, Erick. It's just that when we enable both indexed=true and
docValues=true, indexing time at least doubles for a full re-index.


On Wed, May 20, 2020 at 2:30 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> Revas:
>
> Facet queries are just queries that are constrained by the total result set
> of your primary query, so the answer to that would be the same as for
> speeding up regular queries. As far as range facets are concerned, I believe
> they _do_ use docValues; after all, they have to answer the exact same
> question: for doc X in the result set, what is the value of field Y? The only
> difference is that they have to bucket a bunch of them.
>
> Rahul: Please don’t hijack threads; it makes it difficult to find things
> later. Start a separate e-mail thread.
>
> The answer to your question is, of course, “it depends” on a number of things
> and changes with the query. First of all, multivalued fields don’t qualify
> because docValues are a sorted set, meaning the return is sorted and
> deduplicated. So if the input has five values in it, b c d c d, what you’d get
> back from DV is b c d.
>
> So let’s go with primitive, single-valued types. It still depends, but Solr
> does the right thing, or tries. Here’s the scoop. Stored fields for any single
> doc are stored as a contiguous, compressed bit of memory. So if any _one_
> field needs to be read from the stored data, the entire block is decompressed
> and Solr will preferentially fetch the value from the decompressed data, as
> it’s pretty certain to be at least as cheap as fetching from DV. However, the
> reverse is true if _all_ the returned values are single-valued DV fields. Then
> it’s more efficient to fetch the DV values, as they’re MMapped and won’t cost
> the seek-and-decompress cycle.
>
> Unless space is a real consideration for you, I’d set both indexed and
> docValues to true…
>
> Best,
> Erick
>
> > On May 20, 2020, at 10:45 AM, Rahul Goswami <rahul196...@gmail.com>
> > wrote:
> >
> > Erick,
> > Thanks for that explanation. I have a follow-up question on that. I find
> > the scenario of stored=true and docValues=true to be tricky at times...
> > I would like to know when each of these scenarios is preferred over the
> > other two for primitive datatypes:
> >
> > 1) stored=true and docValues=false
> > 2) stored=false and docValues=true
> > 3) stored=true and docValues=true
> >
> > Thanks,
> > Rahul
> >
> > On Tue, May 19, 2020 at 5:55 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> They are _absolutely_ able to be used together. Background:
> >>
> >> “In the bad old days”, there were no docValues. So whenever you needed
> >> to facet/sort/group/use function queries, Solr (well, Lucene) had to take
> >> the inverted structure resulting from “indexed=true” and “uninvert” it on
> >> the Java heap.
> >>
> >> docValues essentially does the “uninverting” at index time and puts
> >> that structure in a separate file for each segment. So rather than
> >> uninvert the index on the heap, Lucene can just read it in from disk in
> >> MMapDirectory (i.e. OS) memory space.
> >>
> >> The downside is that your index will be bigger when you do both, that is,
> >> the size on disk will be bigger. But it’ll be much faster to load, much
> >> faster to autowarm, and will move the structures necessary to do
> >> faceting/sorting/etc. into OS memory, where the garbage collection is
> >> vastly more efficient than Java’s.
> >>
> >> And frankly I don’t think the increased size on disk is a downside. You’ll
> >> have to have the memory anyway, and having it used in the OS memory space
> >> is so much more efficient than on Java’s heap that it’s a win-win IMO.
> >>
> >> Oh, and if you never sort/facet/group/use function queries, then the
> >> docValues structures are never even read into MMapDirectory space.
> >>
> >> So yes, freely do both.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >>> On May 19, 2020, at 5:41 PM, matthew sporleder <msporle...@gmail.com>
> >>> wrote:
> >>>
> >>> You can index AND docvalue? For some reason I thought they were
> >>> exclusive
> >>>
> >>> On Tue, May 19, 2020 at 5:36 PM Erick Erickson <erickerick...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Yes. You should also index them….
> >>>>
> >>>> Here’s the way I think of it.
> >>>>
> >>>> Questions of the form “For term X, which docs contain that value?” mean
> >>>> indexed=true. This is a search.
> >>>>
> >>>> Questions of the form “Does doc X have value Y in field Z?” mean
> >>>> docValues=true.
> >>>>
> >>>> What’s the difference? Well, the first one is to get the result set.
> >>>> The second is for, given a result set, count/sort/whatever.
> >>>>
> >>>> fq clauses are searches, so indexed=true.
> >>>>
> >>>> Sorting, faceting, grouping and function queries are “for each doc in
> >>>> the result set, what values does field Y contain?”
> >>>> Maybe that made things clear as mud, but it’s the way I think of it ;)
> >>>>
> >>>> Best,
> >>>> Erick
> >>>>
> >>>>> On May 19, 2020, at 4:00 PM, matthew sporleder <msporle...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>> I have quite a few numeric / meta-data type fields in my schema and
> >>>>> pretty much only use them in fq=, sort=, and friends. Should I always
> >>>>> use DocValue on these if I never plan to q=search: on them? Are there
> >>>>> any drawbacks?
> >>>>>
> >>>>> Thanks,
> >>>>> Matt
> >>>>
> >>
> >>
>
>
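
The two questions discussed in the thread ("for term X, which docs contain it?" versus "for doc X, what is the value of field Y?") can be illustrated with a toy model. This is only a rough sketch in plain Python, not Solr or Lucene code: the documents, the field values, and the names `inverted`, `column`, `search`, `facet_counts`, and `uninvert` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy documents: doc id -> value of one single-valued field.
docs = {0: "red", 1: "blue", 2: "red", 3: "green"}

# Analogue of indexed=true: term -> doc ids ("which docs contain term X?").
inverted = defaultdict(list)
for doc_id, value in docs.items():
    inverted[value].append(doc_id)

# Analogue of docValues=true: doc id -> value ("what is field Y for doc X?"),
# built once up front, the way docValues are written at index time.
column = [docs[doc_id] for doc_id in sorted(docs)]


def search(term):
    """Return the doc ids matching a term, as a q or fq clause would."""
    return inverted.get(term, [])


def facet_counts(result_set):
    """Count field values over a result set using the forward column."""
    counts = defaultdict(int)
    for doc_id in result_set:
        counts[column[doc_id]] += 1
    return dict(counts)


def uninvert(inverted_index, num_docs):
    """Rebuild the forward column from the inverted index at query time,
    roughly what had to happen on the heap before docValues existed."""
    rebuilt = [None] * num_docs
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            rebuilt[doc_id] = term
    return rebuilt
```

In this sketch `search("red")` walks the inverted structure, while `facet_counts` only ever touches the column; `uninvert` produces the same column, but has to do the work at query time instead of at index time.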

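The sorted-set behaviour described above for multivalued fields (input b c d c d coming back from docValues as b c d) can be mimicked in a few lines. This is a sketch of the observable effect only, assuming string values; `sorted_set_view` is a hypothetical helper, not a Solr or Lucene API.

```python
def sorted_set_view(values):
    """Mimic a sorted-set read: values come back sorted, duplicates removed."""
    return sorted(set(values))


# A stored field keeps the original order and the duplicates.
stored = ["b", "c", "d", "c", "d"]

# A sorted-set view of the same input loses both.
from_doc_values = sorted_set_view(stored)
```

Because the original order and the duplicates cannot be recovered from the sorted-set view, it is not a drop-in replacement for the stored values of a multivalued field.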