Robert Muir created LUCENE-5750:
-----------------------------------

    Summary: Speed up monotonic address access in BINARY/SORTED_SET
    Key: LUCENE-5750
    URL: https://issues.apache.org/jira/browse/LUCENE-5750
    Project: Lucene - Core
    Issue Type: Bug
    Reporter: Robert Muir
    Attachments: LUCENE-5750.patch

I found this while exploring LUCENE-5748, but it currently applies to both variable-length BINARY and SORTED_SET, so I think it's worth doing here first.

I think it's just a holdover from before MonotonicBlockPackedWriter that to access element N we currently do:

{code}
startOffset = (docID == 0 ? 0 : ordIndex.get(docID-1));
endOffset = ordIndex.get(docID);
{code}

That's because previously we didn't have packed ints that supported > Integer.MAX_VALUE elements. But that's been fixed for a long time.

If we just write a 0 first and do this:

{code}
startOffset = ordIndex.get(docID);
endOffset = ordIndex.get(docID+1);
{code}

The access is then much faster, since the docID == 0 branch is gone. For sorting I see around 20% improvement.

We don't lose any compression, because we should assume the delta from 0 .. 1 is similar to any other gap N .. N+1.
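
To make the two layouts concrete, here is a minimal, self-contained sketch. It is not the attached patch: a plain long[] stands in for the monotonic packed-ints address reader, and the class and method names (AddressLayoutSketch, endOffsets, startOffsets, rangeOld, rangeNew) are hypothetical, chosen only for illustration.

```java
// Illustrative sketch only: a long[] stands in for the packed address
// structure, and all names here are assumptions, not Lucene APIs.
public final class AddressLayoutSketch {

    // Old layout: N entries, entry i holds the END offset of document i.
    static long[] endOffsets(long[] lengths) {
        long[] addresses = new long[lengths.length];
        long sum = 0;
        for (int i = 0; i < lengths.length; i++) {
            sum += lengths[i];
            addresses[i] = sum;
        }
        return addresses;
    }

    // New layout: N+1 entries with a leading 0, so entry i holds the
    // START offset of document i and entry i+1 holds its end offset.
    static long[] startOffsets(long[] lengths) {
        long[] addresses = new long[lengths.length + 1];
        for (int i = 0; i < lengths.length; i++) {
            addresses[i + 1] = addresses[i] + lengths[i];
        }
        return addresses;
    }

    // Old access pattern: needs a special case for docID == 0.
    static long[] rangeOld(long[] addresses, int docID) {
        long start = (docID == 0) ? 0 : addresses[docID - 1];
        long end = addresses[docID];
        return new long[] {start, end};
    }

    // New access pattern: two unconditional lookups.
    static long[] rangeNew(long[] addresses, int docID) {
        return new long[] {addresses[docID], addresses[docID + 1]};
    }

    public static void main(String[] args) {
        long[] lengths = {3, 0, 5}; // per-document value lengths
        long[] oldAddr = endOffsets(lengths);
        long[] newAddr = startOffsets(lengths);
        for (int doc = 0; doc < lengths.length; doc++) {
            long[] a = rangeOld(oldAddr, doc);
            long[] b = rangeNew(newAddr, doc);
            System.out.println("doc " + doc + ": old=[" + a[0] + "," + a[1]
                + ") new=[" + b[0] + "," + b[1] + ")");
        }
    }
}
```

Both access patterns return the same [start, end) range for every document. The new layout costs one extra stored value in exchange for dropping the conditional on each lookup.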