Re: Sort on docValue field is slow.

Erick Erickson Mon, 20 May 2019 12:09:19 -0700

Shawn’s right. You have a mixed index: some segments have docValues and some 
don’t. So yes, you do need to reindex everything before drawing conclusions. To 
make matters worse, as you keep indexing documents, new segments with 
docValues will eventually be merged with segments that don’t have docValues, 
leading to significant inconsistencies.

As with all sorting, you can tell nothing from one test. The first time a field 
is accessed for sorting it must be read from disk in either case (docValues 
true or false). The difference is that with docValues=false, the “uninverted” 
structure must be built from the indexed values on the Java heap. In the 
docValues=true case, it’s just un-serialized from disk into OS memory.

Point is that after you’ve completely re-indexed everything (and I would, 
indeed, use a new collection), the first time you sort on the field it’ll take 
extra time. You can’t draw any valid conclusions until you average over quite a 
number of queries or throw out the first few runs.
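
A rough sketch of that kind of measurement, in Python: run the same request 
many times, discard the first few timings, and summarize the rest. The helper 
names, the run counts and the URL in the comment are all assumptions for 
illustration, not anything from this thread.

```python
import statistics
import time
import urllib.request


def time_query(run_query, runs=20, warmup=3):
    """Call run_query() `runs` times; return timings in ms without the warm-up calls."""
    if runs <= warmup:
        raise ValueError("runs must be greater than warmup")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    # The first calls pay the one-time cost of loading the sort field, so drop them.
    return timings[warmup:]


def summarize(timings):
    """Return the sample count, mean and median of the timings (ms)."""
    return {
        "n": len(timings),
        "mean_ms": statistics.mean(timings),
        "median_ms": statistics.median(timings),
    }


def solr_query(url):
    """Build a zero-argument callable that fetches `url` and reads the whole response."""
    def run():
        with urllib.request.urlopen(url) as resp:
            resp.read()
    return run


# Hypothetical usage; host, collection and field name are placeholders:
# base = "http://localhost:8983/solr/mycollection/select?q=*:*&rows=10"
# print(summarize(time_query(solr_query(base))))
# print(summarize(time_query(solr_query(base + "&sort=my_date%20desc"))))
```

Comparing the two summaries (with and without the sort) is more meaningful 
than comparing two single calls. Keep in mind that repeating an identical 
request can also be answered from Solr’s caches, so varying the query between 
runs may give a more realistic picture.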

Best,
Erick

> On May 20, 2019, at 8:30 AM, Shawn Heisey <[email protected]> wrote:
> 
> On 5/20/2019 8:59 AM, Ashwin Ramesh wrote:
>> Hi Shawn,
>> Thanks for the prompt response.
>> 1. date type def - <fieldType name="date" class="solr.DatePointField"
>> positionIncrementGap="0" />
>> 2. The field is brand new. I added it to schema.xml, uploaded to ZK &
>> reloaded the collection. After that we started indexing the few thousand.
>> Did we still need to do a full reindex to a fresh collection?
>> 3. It is the only difference. I am testing the raw URL call timing
>> difference with and without the extra sort.
> 
> As I understand it, the docValues data will not be correct for the existing 
> documents if they are not all reindexed.  If I am wrong, I am sure somebody 
> will correct me.  Although I would not expect that to make things slow, the 
> internal Lucene details are not something I have a lot of insight into.
> 
> Thanks,
> Shawn
