[jira] [Commented] (LUCENE-7457) Default doc values format should optimize for iterator access

2016-10-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541777#comment-15541777
 ] 

ASF subversion and git services commented on LUCENE-7457:
-

Commit 2f88bc80c2c1afed975199adb3f340fcec8179aa in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2f88bc8 ]

LUCENE-7457: Make Lucene54DocValuesFormat's sparse case actually implement an 
iterator.


> Default doc values format should optimize for iterator access
> -
>
> Key: LUCENE-7457
> URL: https://issues.apache.org/jira/browse/LUCENE-7457
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Adrien Grand
> Priority: Blocker
> Fix For: master (7.0)
>
> Attachments: LUCENE-7457.patch
>
>
> In LUCENE-7407 we switched doc values consumption from a random-access API to
> an iterator API, but nothing was done there to improve the codec. We should
> do that here.
> At a bare minimum we should fix the existing very-sparse case to be a true 
> iterator, and not wrapped with the silly legacy wrappers.
> I think we should also increase the threshold (currently 1%?) when we switch 
> from dense to sparse encoding.  This should fix LUCENE-7253, making merging 
> of sparse doc values efficient ("pay for what you use").
> I'm sure there are many other things to explore to let codecs "take 
> advantage" of the fact that they no longer need to offer random access to doc 
> values.
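
To make the "true iterator" point in the description concrete, here is a minimal, self-contained sketch of iterator-style access over a sparse field. It is not the code from the attached patch and does not use Lucene classes: SparseNumericIterator and its in-memory arrays are hypothetical, and only the method names (docID, nextDoc, advance, longValue) and the NO_MORE_DOCS sentinel are chosen to mirror Lucene's DocIdSetIterator-style API. The idea is that the consumer only ever visits documents that have a value, rather than probing every doc id through a random-access wrapper.

```java
import java.util.Arrays;

// Hypothetical illustration only: an iterator over the documents that have a value.
final class SparseNumericIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docIds;  // sorted ascending: only docs that have a value
    private final long[] values; // values[i] is the value of docIds[i]
    private int index = -1;      // -1 means "not positioned yet"

    SparseNumericIterator(int[] docIds, long[] values) {
        if (docIds.length != values.length) {
            throw new IllegalArgumentException("docIds and values must have the same length");
        }
        this.docIds = docIds;
        this.values = values;
    }

    // Current doc id, -1 before iteration starts, NO_MORE_DOCS once exhausted.
    int docID() {
        if (index < 0) {
            return -1;
        }
        return index < docIds.length ? docIds[index] : NO_MORE_DOCS;
    }

    // Moves to the next doc that has a value.
    int nextDoc() {
        if (index < docIds.length) {
            index++;
        }
        return docID();
    }

    // Moves to the first doc after the current one whose id is >= target.
    int advance(int target) {
        int from = Math.min(index + 1, docIds.length);
        int pos = Arrays.binarySearch(docIds, from, docIds.length, target);
        index = pos >= 0 ? pos : -pos - 1; // negative result encodes the insertion point
        return docID();
    }

    // Only valid while positioned on a doc (docID() is neither -1 nor NO_MORE_DOCS).
    long longValue() {
        return values[index];
    }

    public static void main(String[] args) {
        SparseNumericIterator it =
            new SparseNumericIterator(new int[] {3, 17, 42}, new long[] {10L, 20L, 30L});
        long sum = 0;
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            sum += it.longValue(); // work is proportional to docs with a value
        }
        System.out.println("sum = " + sum);
    }
}
```

In this sketch a consumer such as a merge does work proportional to the number of documents that have a value, not to the size of the segment, which is the "pay for what you use" behavior the description asks for.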



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7457) Default doc values format should optimize for iterator access

2016-09-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516680#comment-15516680
 ] 

Michael McCandless commented on LUCENE-7457:


OK, let's leave it at 1% for this issue?




[jira] [Commented] (LUCENE-7457) Default doc values format should optimize for iterator access

2016-09-23 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515646#comment-15515646
 ] 

Adrien Grand commented on LUCENE-7457:
--

One thing to be aware of when increasing the threshold: when values require
only a few bits (e.g. an enum or a boolean field), the doc ids can quickly
start to take up significant disk space, and sparse doc values could end up
using _more_ disk space than when they were densely encoded.
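
The trade-off above can be illustrated with a rough back-of-the-envelope model. This is only a sketch under stated assumptions, not how Lucene54DocValuesFormat actually lays out data: it assumes the dense encoding stores a fixed number of bits for every document in the segment, and the sparse encoding stores an uncompressed doc id plus the value for each document that has one. The class and method names (SparseVsDenseCost, bitsRequired, denseBits, sparseBits) are hypothetical, and the model ignores compression and any presence bitset.

```java
// Hypothetical cost model; it does not reflect the real on-disk format.
final class SparseVsDenseCost {

    // Bits needed to represent any value in [0, maxValue], with a minimum of 1.
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // Assumed dense layout: one fixed-width slot for every document in the segment.
    static long denseBits(int maxDoc, int bitsPerValue) {
        return (long) maxDoc * bitsPerValue;
    }

    // Assumed sparse layout: an uncompressed doc id plus the value,
    // stored only for documents that have a value.
    static long sparseBits(int maxDoc, int docsWithValue, int bitsPerValue) {
        int bitsPerDocId = bitsRequired(maxDoc - 1L);
        return (long) docsWithValue * (bitsPerDocId + bitsPerValue);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int bitsPerValue = 1; // e.g. a boolean field
        System.out.printf("dense: %d bits%n", denseBits(maxDoc, bitsPerValue));
        for (int percent : new int[] {1, 10}) {
            int docsWithValue = maxDoc / 100 * percent;
            System.out.printf("sparse at %d%%: %d bits%n",
                percent, sparseBits(maxDoc, docsWithValue, bitsPerValue));
        }
    }
}
```

Under these assumptions, a 1-bit field in a segment of 1,000,000 documents costs 1,000,000 bits when dense. The sparse form costs 210,000 bits when 1% of documents have a value, but 2,100,000 bits at 10%, because each entry carries a 20-bit doc id next to its single value bit. For wide values the doc id overhead matters much less, so the break-even density depends heavily on the bits per value.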




[jira] [Commented] (LUCENE-7457) Default doc values format should optimize for iterator access

2016-09-23 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515643#comment-15515643
 ] 

Adrien Grand commented on LUCENE-7457:
--

I don't mind increasing it to something like 10%. However, I hope this will
never be useful and we will write a DV format that takes better advantage of
the iterator-style API before 7.0 is released?




[jira] [Commented] (LUCENE-7457) Default doc values format should optimize for iterator access

2016-09-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514530#comment-15514530
 ] 

Michael McCandless commented on LUCENE-7457:


Thanks [~jpountz], this looks great! Should we also increase the sparse
threshold (currently 1%) when writing doc values? Or we can wait for a
follow-on issue...
