[jira] [Commented] (LUCENE-7457) Default doc values format should optimize for iterator access
[ https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541777#comment-15541777 ]

ASF subversion and git services commented on LUCENE-7457:
---------------------------------------------------------

Commit 2f88bc80c2c1afed975199adb3f340fcec8179aa in lucene-solr's branch refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2f88bc8 ]

LUCENE-7457: Make Lucene54DocValuesFormat's sparse case actually implement an iterator.

> Default doc values format should optimize for iterator access
> -------------------------------------------------------------
>
>                 Key: LUCENE-7457
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7457
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Adrien Grand
>            Priority: Blocker
>             Fix For: master (7.0)
>
>         Attachments: LUCENE-7457.patch
>
>
> In LUCENE-7407 we switched doc values consumption from random access API to
> an iterator API, but nothing was done there to improve the codec. We should
> do that here.
> At a bare minimum we should fix the existing very-sparse case to be a true
> iterator, and not wrapped with the silly legacy wrappers.
> I think we should also increase the threshold (currently 1%?) when we switch
> from dense to sparse encoding. This should fix LUCENE-7253, making merging
> of sparse doc values efficient ("pay for what you use").
> I'm sure there are many other things to explore to let codecs "take
> advantage" of the fact that they no longer need to offer random access to doc
> values.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
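For readers following the thread, here is a rough sketch of what "a true iterator" over sparse doc values could look like, as opposed to a random-access lookup hidden behind an iterator wrapper. It is only an illustration: the class `SparseNumericIterator`, its fields, and the array-backed storage are hypothetical and are not taken from Lucene54DocValuesFormat or the attached patch. Only the general nextDoc/advance/longValue shape and the NO_MORE_DOCS sentinel follow the iterator-style API the issue talks about.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene code: iterates only over documents that
// actually have a value, so the cost is proportional to the number of values.
final class SparseNumericIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docIds;   // sorted doc IDs that have a value
    private final long[] values;  // values[i] belongs to docIds[i]
    private int index = -1;       // -1 means iteration has not started

    SparseNumericIterator(int[] docIds, long[] values) {
        if (docIds.length != values.length) {
            throw new IllegalArgumentException("docIds and values must have the same length");
        }
        this.docIds = docIds;
        this.values = values;
    }

    /** Current doc ID, -1 before the first call, NO_MORE_DOCS when exhausted. */
    int docID() {
        if (index < 0) {
            return -1;
        }
        return index < docIds.length ? docIds[index] : NO_MORE_DOCS;
    }

    /** Moves to the next document that has a value. */
    int nextDoc() {
        if (index < docIds.length) {
            index++;
        }
        return docID();
    }

    /** Moves to the first document at or after target, beyond the current one. */
    int advance(int target) {
        if (index >= docIds.length) {
            return NO_MORE_DOCS;
        }
        int pos = Arrays.binarySearch(docIds, index + 1, docIds.length, target);
        index = pos >= 0 ? pos : -pos - 1; // insertion point when target is absent
        return docID();
    }

    /** Value of the current document; only valid while positioned on a document. */
    long longValue() {
        return values[index];
    }

    public static void main(String[] args) {
        SparseNumericIterator it =
            new SparseNumericIterator(new int[] {3, 10, 42}, new long[] {7L, 8L, 9L});
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            System.out.println(doc + " -> " + it.longValue());
        }
    }
}
```

A consumer such as a merge only ever touches the three stored entries here, no matter how many documents the segment holds, which is the "pay for what you use" behaviour the description asks for.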
[jira] [Commented] (LUCENE-7457) Default doc values format should optimize for iterator access
[ https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516680#comment-15516680 ]

Michael McCandless commented on LUCENE-7457:
--------------------------------------------

OK let's leave it at 1% for this issue?
[jira] [Commented] (LUCENE-7457) Default doc values format should optimize for iterator access
[ https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515646#comment-15515646 ]

Adrien Grand commented on LUCENE-7457:
--------------------------------------

Something to be aware of when increasing it is that in the case that values require few bits (eg. an enum or a boolean field), the doc ids can quickly start to use significant disk space and could make doc values use _more_ disk space than when they were densely encoded.
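The concern about low-bit-width fields can be made concrete with a back-of-the-envelope estimate. The model below is an assumption made for illustration and is not Lucene's actual on-disk format: dense encoding is taken to cost one value slot plus a 1-bit "has value" flag per document, and sparse encoding one fixed-width doc ID plus one value per document that has a value. The real sparse case may compress doc IDs more tightly, so the numbers are indicative only. The class and method names are hypothetical.

```java
// Rough size model under the stated assumptions; NOT Lucene's real encoding.
final class SparseDenseEstimate {

    /** Bits needed to address any doc ID in [0, maxDoc). */
    static int bitsPerDocId(int maxDoc) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxDoc - 1));
    }

    /** Dense: every document pays for a value slot and a "has value" bit. */
    static long denseBits(int maxDoc, int bitsPerValue) {
        return (long) maxDoc * (bitsPerValue + 1);
    }

    /** Sparse: only documents with a value pay, but each also stores its doc ID. */
    static long sparseBits(int maxDoc, int docsWithValue, int bitsPerValue) {
        return (long) docsWithValue * (bitsPerDocId(maxDoc) + bitsPerValue);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int[] widths = {1, 8, 64};                // e.g. boolean, small enum, full long
        int[] docsWithValue = {10_000, 100_000};  // 1% and 10% of maxDoc
        for (int bitsPerValue : widths) {
            for (int count : docsWithValue) {
                System.out.printf("bitsPerValue=%d docsWithValue=%d dense=%d sparse=%d%n",
                    bitsPerValue, count,
                    denseBits(maxDoc, bitsPerValue),
                    sparseBits(maxDoc, count, bitsPerValue));
            }
        }
    }
}
```

Under these assumptions, a 1-bit field in a 1,000,000-document segment costs 2,000,000 bits densely encoded. With 10% of documents having a value, the sparse form costs 100,000 * (20 + 1) = 2,100,000 bits, slightly more than dense, while at 1% it costs only 210,000 bits. For 64-bit values the sparse form stays well below dense at both densities, which is why the right threshold depends on the value width.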
[jira] [Commented] (LUCENE-7457) Default doc values format should optimize for iterator access
[ https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515643#comment-15515643 ]

Adrien Grand commented on LUCENE-7457:
--------------------------------------

I don't mind increasing it to something like 10%. However I hope this will never be useful and we will write a DV format that better takes advantage of the iterator-style API before 7.0 is released?
[jira] [Commented] (LUCENE-7457) Default doc values format should optimize for iterator access
[ https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514530#comment-15514530 ]

Michael McCandless commented on LUCENE-7457:
--------------------------------------------

Thanks [~jpountz], this looks great!

Should we also increase the sparse threshold (currently 1%) when writing doc values? Or we can wait for a followon issue...