Re: Howto verify that only docValues are returned

Shawn Heisey Tue, 17 Oct 2017 10:01:40 -0700

On 10/17/2017 2:09 AM, Julian Ohrt wrote:

> The Solr 6.6 documentation states:
>
> In cases where the query is returning only docValues fields performance may
> improve since returning stored fields requires disk reads and decompression
> whereas returning docValues fields in the fl list only requires memory access.

I'm curious how this guarantee (that docValues are accessed from memory, not disk) could possibly exist. I think the only way that this could be guaranteed is for Lucene to keep docValues data in the heap, but using docValues is supposed to *reduce* heap requirements, not increase them, so I don't think that's going to happen. If the data's not in the heap, then you're reliant on the OS disk cache as to whether or not the data is in memory, and that would be the case either way. Do I have an incorrect understanding of how this works?

As I understand it, the potential advantage to docValues over stored data is two-fold: 1) docValues are accessed differently because all the values for one field across the entire Lucene segment are in one place. This can be a good thing or a bad thing depending on the query and the data characteristics, and it may not be obvious which way that will go. 2) docValues data is not compressed, so there's less CPU required. In cases where OS disk caching is insufficient and the compression ratio is really good, stored data might actually be faster.
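
To make the two access patterns concrete, here is a toy Python sketch. It is only an illustration of the idea and not Lucene's actual on-disk format: the block size, the use of zlib and JSON, and all names (stored_blocks, price_column, price_from_stored, price_from_column) are made up for the example. Stored data is modeled as compressed blocks of whole documents, and docValues as one uncompressed array per field.

```python
import json
import zlib

# Toy data: 6 documents, each with a small "price" field and a large "body" field.
docs = [{"price": i * 10, "body": "lorem ipsum " * 50} for i in range(6)]

# Stand-in for stored fields (row-oriented): whole documents are grouped
# into blocks, and each block is compressed as a unit.
BLOCK_SIZE = 3  # assumption for the sketch, not a Lucene setting
stored_blocks = [
    zlib.compress(json.dumps(docs[i:i + BLOCK_SIZE]).encode("utf-8"))
    for i in range(0, len(docs), BLOCK_SIZE)
]

# Stand-in for docValues (column-oriented): all values of one field
# sit together in one uncompressed array, indexed by document id.
price_column = [d["price"] for d in docs]


def price_from_stored(doc_id):
    """Decompress the whole block holding doc_id, then pick out one field."""
    block = stored_blocks[doc_id // BLOCK_SIZE]
    rows = json.loads(zlib.decompress(block).decode("utf-8"))
    return rows[doc_id % BLOCK_SIZE]["price"]


def price_from_column(doc_id):
    """Read one value directly from the per-field array."""
    return price_column[doc_id]


if __name__ == "__main__":
    for doc_id in range(len(docs)):
        print(doc_id, price_from_stored(doc_id), price_from_column(doc_id))
```

Both functions return the same value. The stored-style lookup has to decompress a block containing every field of several documents to get at one number, while the column-style lookup touches only the one field. Neither version says anything about whether the underlying bytes are in memory or on disk; in the real system that still depends on the OS disk cache.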


Thanks,
Shawn
