Nikolay Khitrin created LUCENE-8178:
---------------------------------------
             Summary: Bulk operations for LongValues and Sorted[Set]DocValues
                 Key: LUCENE-8178
                 URL: https://issues.apache.org/jira/browse/LUCENE-8178
             Project: Lucene - Core
          Issue Type: Improvement
    Affects Versions: 7.2.1
            Reporter: Nikolay Khitrin

One-by-one DocValues iteration by {{advanceExact}} and {{nextOrd}}/{{ordValue}} is really slow for bulk operations like faceting. Reading and unpacking integers in blocks is substantially faster, but DocValues can currently be queried only one document at a time.

To apply document-based bulk processing, {{DocIdSetIterator}} matches have to be split into runs of sequential docIDs and remapped to positions in the underlying {{LongValues}}. After this transformation, relatively large linear scans can be performed over the packed integers.

To do this, two new interfaces
 1. {{LongValuesCollector}} ({{collectValue(long index, long value)}}).
 2. {{OrdStatsCollector}} ({{collectOrd(long ord)}}, {{collectMissing(int count)}}).

and three new functions are introduced
 1. {{LongValues.forRange(long begin, long end, LongValuesCollector collector)}}
 2. {{SortedDocValues.forEach(DocIdSetIterator disi, OrdStatsConsumer collector)}}
 3. {{SortedSetDocValues.forEach(DocIdSetIterator disi, OrdStatsConsumer collector)}}

with reference implementations.

Optimized versions of these functions are provided for:
 1. {{DirectReader}} for non-32/64 bits per value cases (using {{PackedInts.Decoder}}).
 2. {{Lucene70DocValuesProducer}} {{getSorted}} and {{getSortedSet}} (both sparse and dense).

The measured Solr faceting performance boost is up to 2 - 2.5x on a real index. A patch for Solr {{DocValuesFacets}} is also provided as a separate file.

Implementation notes:
 * {{OrdStatsCollector}} does not accept a document id, because passing one would ruin performance for {{SortedSetDocValues}} due to excessive position lookups.
 * This patch is fully compatible with the Lucene 7.0 DocValues format.
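
The run-splitting idea described above can be sketched roughly as follows. This is only an illustration and not code from the attached patch: {{BulkLongValues}}, {{BulkRangeSketch}}, {{toRuns}} and {{countOrds}} are hypothetical names, the {{LongValuesCollector}} here is a stand-in that only mirrors the signature given in the description, and the sketch assumes the dense case where a docID is also the position in the underlying values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the collector signature from the description.
interface LongValuesCollector {
    void collectValue(long index, long value);
}

// Hypothetical simplified LongValues with the proposed bulk method.
abstract class BulkLongValues {
    abstract long get(long index);

    // Reference implementation: visits every position in [begin, end) one by one.
    // An optimized reader could decode whole packed blocks here instead.
    void forRange(long begin, long end, LongValuesCollector collector) {
        for (long i = begin; i < end; i++) {
            collector.collectValue(i, get(i));
        }
    }
}

final class BulkRangeSketch {
    // Splits sorted, distinct docIDs into runs of consecutive IDs.
    // Each run is returned as {begin, endExclusive}.
    static List<long[]> toRuns(int[] sortedDocIds) {
        List<long[]> runs = new ArrayList<>();
        int i = 0;
        while (i < sortedDocIds.length) {
            int start = i;
            while (i + 1 < sortedDocIds.length
                    && sortedDocIds[i + 1] == sortedDocIds[i] + 1) {
                i++;
            }
            runs.add(new long[] {sortedDocIds[start], sortedDocIds[i] + 1L});
            i++;
        }
        return runs;
    }

    // Counts ordinals for the matching documents, one linear scan per run.
    // Assumes the dense case: docID == position in the ordinal values.
    static long[] countOrds(BulkLongValues ords, int[] matches, int ordCount) {
        long[] counts = new long[ordCount];
        for (long[] run : toRuns(matches)) {
            ords.forRange(run[0], run[1], (index, ord) -> counts[(int) ord]++);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] data = {2, 0, 1, 1, 2, 2, 0};  // ordinal per document
        BulkLongValues ords = new BulkLongValues() {
            @Override
            long get(long index) {
                return data[(int) index];
            }
        };
        int[] matches = {0, 1, 2, 5, 6};      // two runs: [0, 3) and [5, 7)
        System.out.println(Arrays.toString(countOrds(ords, matches, 3)));
    }
}
```

In the sparse case the run boundaries would additionally have to be remapped from docIDs to value positions before calling {{forRange}}; that step is left out of the sketch.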