On 10/17/2017 2:09 AM, Julian Ohrt wrote:
> The Solr 6.6 documentation states:
>
> In cases where the query is returning only docValues fields performance may
> improve since returning stored fields requires disk reads and decompression
> whereas returning docValues fields in the fl list only requires memory access.

I'm curious how this guarantee (that docValues are accessed from memory not disk) could possibly exist.  I think the only way that this could be guaranteed is for Lucene to keep docValues data in the heap, but using docValues is supposed to *reduce* heap requirements, not increase them, so I don't think that's going to happen.  If the data's not in the heap, then you're reliant on the OS disk cache as to whether or not the data is in memory, and that would be the case either way.  Do I have an incorrect understanding of how this works?

As I understand it, the potential advantage to docValues over stored data is two-fold:

1) docValues are accessed differently because all the values for one field across the entire Lucene segment are in one place.  This can be a good thing or a bad thing depending on the query and the data characteristics, and it may not be obvious which way that will go.

2) docValues data is not compressed, so there's less CPU required.

In cases where OS disk caching is insufficient and the compression ratio is really good, stored data might actually be faster.
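
To make the two points concrete, here is a toy Python sketch.  It is not Lucene code and does not reflect the real on-disk formats; every name in it (stored, doc_values, price_from_stored, price_from_doc_values) is made up for illustration.  It only contrasts a row-oriented layout, where one document's fields are kept together and compressed, with a column-oriented layout, where each field has its own array indexed by document number.

```python
# Toy model only: this is NOT how Lucene lays out data.
# It contrasts row-oriented (stored-style) and column-oriented
# (docValues-style) access to a single field.
import json
import zlib

docs = [
    {"id": i, "price": i * 10, "title": f"document {i}"}
    for i in range(5)
]

# Stored-style: each document's fields are kept together and compressed.
stored = [zlib.compress(json.dumps(d).encode("utf-8")) for d in docs]

# docValues-style: one uncompressed array per field, indexed by doc number.
doc_values = {
    field: [d[field] for d in docs]
    for field in ("id", "price", "title")
}


def price_from_stored(doc_num):
    # The whole document must be decompressed and parsed to read one field.
    return json.loads(zlib.decompress(stored[doc_num]))["price"]


def price_from_doc_values(doc_num):
    # Direct lookup in the per-field column; no decompression step.
    return doc_values["price"][doc_num]


hits = [1, 3, 4]  # hypothetical matching document numbers
print([price_from_stored(n) for n in hits])
print([price_from_doc_values(n) for n in hits])
```

Both lines print [10, 30, 40].  Whether either lookup is served from memory or from disk is outside the sketch; in a real index that depends on the OS disk cache either way.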

Thanks,
Shawn
