Robert Muir created LUCENE-5750:
-----------------------------------

             Summary: Speed up monotonic address access in BINARY/SORTED_SET
                 Key: LUCENE-5750
                 URL: https://issues.apache.org/jira/browse/LUCENE-5750
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Robert Muir
         Attachments: LUCENE-5750.patch

I found this while exploring LUCENE-5748, but it currently applies to both 
variable-length BINARY and SORTED_SET, so I think it's worth doing here first.

I think it's just a holdover from before MonotonicBlockPackedWriter that, to 
access element N, we currently do:
{code}
startOffset = (docID == 0 ? 0 : ordIndex.get(docID-1));
endOffset = ordIndex.get(docID);
{code}

That's because previously we didn't have packed ints that supported more than 
Integer.MAX_VALUE elements. But that's been fixed for a long time. If we just 
write a 0 first and do this:
{code}
startOffset = ordIndex.get(docID);
endOffset = ordIndex.get(docID+1);
{code}

The access is then much faster, since the docID == 0 special case goes away. 
For sorting I see around a 20% improvement. We don't lose any compression, 
because we should assume the delta from 0 .. 1 is similar to any other gap 
N .. N+1.
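
To make the two layouts concrete, here is a minimal sketch of the idea. It is not the patch itself: a plain long[] stands in for the packed monotonic address structure, and the class and method names (AddressLayoutSketch, endOffsets, startOffsets, rangeOld, rangeNew) are made up for illustration. It only shows that writing a leading 0 gives the same [start, end) range for every document without the docID == 0 branch.

```java
// Hypothetical illustration only: a plain long[] stands in for the packed
// monotonic address structure, and none of these names come from Lucene.
final class AddressLayoutSketch {

    // Old layout: entry i holds the END offset of document i.
    // The start of document 0 is an implicit 0.
    static long[] endOffsets(int[] lengths) {
        long[] ends = new long[lengths.length];
        long sum = 0;
        for (int i = 0; i < lengths.length; i++) {
            sum += lengths[i];
            ends[i] = sum;
        }
        return ends;
    }

    // New layout: an explicit 0 is written first, so entry i holds the START
    // offset of document i and entry i + 1 holds its end.
    static long[] startOffsets(int[] lengths) {
        long[] starts = new long[lengths.length + 1];
        for (int i = 0; i < lengths.length; i++) {
            starts[i + 1] = starts[i] + lengths[i];
        }
        return starts;
    }

    // Old access pattern: needs a branch for the first document.
    static long[] rangeOld(long[] ends, int docID) {
        long startOffset = (docID == 0) ? 0 : ends[docID - 1];
        long endOffset = ends[docID];
        return new long[] {startOffset, endOffset};
    }

    // New access pattern: two adjacent reads, no special case.
    static long[] rangeNew(long[] starts, int docID) {
        return new long[] {starts[docID], starts[docID + 1]};
    }

    public static void main(String[] args) {
        int[] lengths = {3, 0, 5, 2}; // made-up per-document value lengths
        long[] ends = endOffsets(lengths);
        long[] starts = startOffsets(lengths);
        for (int docID = 0; docID < lengths.length; docID++) {
            long[] oldRange = rangeOld(ends, docID);
            long[] newRange = rangeNew(starts, docID);
            System.out.println("doc " + docID
                + " old=[" + oldRange[0] + ", " + oldRange[1] + ")"
                + " new=[" + newRange[0] + ", " + newRange[1] + ")");
        }
    }
}
```

The new layout costs one extra entry (count + 1 addresses instead of count), which is the "write a 0 first" step described above.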


