Shawn’s right. You have a mixed index: some segments have docValues and some don’t. So yes, you do need to reindex everything before drawing conclusions. To make matters worse, when you start indexing documents, new segments with docValues will eventually be merged with segments that don’t have docValues, leading to significant inconsistencies.
As with all sorting, you can tell nothing from one test. The first time a field is accessed for sorting it must be read from disk in either case (docValues true or false). The difference is that with docValues=false, the “uninverted” structure must be built from the indexed values on the Java heap. In the docValues=true case, it’s just un-serialized from disk into the OS memory.

The point is that after you’ve completely re-indexed everything (and I would, indeed, use a new collection), the first time you use the field it’ll take extra time. You can’t draw any valid conclusions until you average over quite a number of queries or throw out the first few.

Best,
Erick

> On May 20, 2019, at 8:30 AM, Shawn Heisey <[email protected]> wrote:
>
> On 5/20/2019 8:59 AM, Ashwin Ramesh wrote:
>> Hi Shawn,
>> Thanks for the prompt response.
>> 1. date type def - <fieldType name="date" class="solr.DatePointField"
>> positionIncrementGap="0" />
>> 2. The field is brand new. I added it to schema.xml, uploaded to ZK &
>> reloaded the collection. After that we started indexing the few thousand.
>> Did we still need to do a full reindex to a fresh collection?
>> 3. It is the only difference. I am testing the raw URL call timing
>> difference with and without the extra sort.
>
> As I understand it, the docValues data will not be correct for the existing
> documents if they are not all reindexed. If I am wrong, I am sure somebody
> will correct me. Although I would not expect that to make things slow, the
> internal Lucene details are not something I have a lot of insight into.
>
> Thanks,
> Shawn
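
For what it's worth, the "throw out the first few, then average" advice can be sketched roughly as below. This is only an illustration, not anything from Solr itself: `time_query`, `warmup`, and `runs` are made-up names, and `run_query` stands for whatever function issues your raw URL call (once with the sort, once without).

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object],
               warmup: int = 3,
               runs: int = 20) -> Dict[str, float]:
    """Call run_query() warmup + runs times and summarize only the last `runs`.

    The first `warmup` calls are executed but their timings are discarded,
    since they include the one-time cost of loading the sort field.
    All names here are hypothetical; nothing is Solr-specific.
    """
    samples: List[float] = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        run_query()
        elapsed = time.perf_counter() - start
        if i >= warmup:  # keep only post-warm-up measurements
            samples.append(elapsed)
    return {
        "runs": len(samples),
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }
```

Run it once for the sorted request and once for the unsorted one, then compare the means or medians. Keep in mind that repeating the exact same query may be answered from Solr's caches, so a realistic test probably wants to vary the query terms between calls.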
