[ https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7457:
---------------------------------
    Attachment: LUCENE-7457.patch

Here is a patch implementing what Mike describes above as the bare minimum. I'm not sure it is worth spending too much time on this, since we will probably want to build a new DV format that takes better advantage of the iterator-style API before 7.0 is released.

> Default doc values format should optimize for iterator access
> -------------------------------------------------------------
>
>                 Key: LUCENE-7457
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7457
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Adrien Grand
>            Priority: Blocker
>             Fix For: master (7.0)
>
>         Attachments: LUCENE-7457.patch
>
>
> In LUCENE-7407 we switched doc values consumption from a random-access API to
> an iterator API, but nothing was done there to improve the codec. We should
> do that here.
> At a bare minimum we should fix the existing very-sparse case to be a true
> iterator, and not wrapped with the silly legacy wrappers.
> I think we should also increase the threshold (currently 1%?) at which we switch
> from dense to sparse encoding. This should fix LUCENE-7253, making merging
> of sparse doc values efficient ("pay for what you use").
> I'm sure there are many other things to explore to let codecs "take
> advantage" of the fact that they no longer need to offer random access to doc
> values.
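
To make the "true iterator" idea for the very-sparse case concrete, here is a minimal Java sketch. It is not the attached patch and not Lucene's codec code: the class `SparseNumericIterator`, its array-backed storage, and the `preferSparse` helper with its `threshold` parameter are all hypothetical names chosen for illustration. It only assumes the general shape of an iterator-style API (`nextDoc`, `advance`, a value accessor) and a sorted list of the doc IDs that actually have a value.

```java
import java.util.Arrays;

/** Hypothetical sketch of a sparse numeric doc values iterator; not Lucene code. */
final class SparseNumericIterator {
    /** Sentinel returned once the iterator is exhausted. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;    // sorted doc IDs that have a value
    private final long[] values; // values[i] belongs to docs[i]
    private int index = -1;      // current position in docs; -1 before the first call
    private int doc = -1;        // current doc ID

    SparseNumericIterator(int[] docs, long[] values) {
        if (docs.length != values.length) {
            throw new IllegalArgumentException("docs and values must have the same length");
        }
        this.docs = docs;
        this.values = values;
    }

    int docID() {
        return doc;
    }

    /** Moves to the next doc that has a value; cost does not depend on maxDoc. */
    int nextDoc() {
        if (doc == NO_MORE_DOCS) {
            return doc;
        }
        index++;
        doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
        return doc;
    }

    /** Moves to the first doc after the current position whose ID is >= target. */
    int advance(int target) {
        int from = Math.min(index + 1, docs.length);
        int pos = Arrays.binarySearch(docs, from, docs.length, target);
        if (pos < 0) {
            pos = -pos - 1; // convert "not found" result to the insertion point
        }
        index = pos;
        doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
        return doc;
    }

    /** Value of the current doc; only valid while positioned on a real doc. */
    long longValue() {
        return values[index];
    }

    /**
     * Hypothetical encoding choice: use the sparse layout when fewer than
     * threshold * maxDoc documents have a value (e.g. threshold = 0.01 for 1%).
     */
    static boolean preferSparse(int docsWithValue, int maxDoc, double threshold) {
        return docsWithValue < threshold * maxDoc;
    }
}
```

A consumer such as a merge would then loop with `for (int d = it.nextDoc(); d != SparseNumericIterator.NO_MORE_DOCS; d = it.nextDoc())` and visit only documents that have a value, which is the "pay for what you use" behavior the issue asks for.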