[jira] [Commented] (LUCENE-7474) Improve doc values writers

2016-10-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565052#comment-15565052
 ] 

Michael McCandless commented on LUCENE-7474:


A sparse set in the nightly benchmarks is an interesting idea.  Do you have a 
data set in mind?

At some point I'll write up a blog post summarizing the change and I can also 
try to do a before (6.x) / after (upcoming 7.0) one-time performance test for 
that.

> Improve doc values writers
> --------------------------
>
> Key: LUCENE-7474
> URL: https://issues.apache.org/jira/browse/LUCENE-7474
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7474.patch
>
> One of the goals of the new iterator-based API is to better handle sparse 
> data. However, the current doc values writers still use a dense 
> representation, and some of them perform naive linear scans in the nextDoc 
> implementation.
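
To make the dense-versus-sparse distinction in the description concrete, here is a minimal sketch in plain Java. It is not the code from LUCENE-7474.patch: the class names (`SparseDocValuesSketch`, `DenseIterator`, `SparseIterator`), the `ValuesIterator` interface and the `drain` helper are hypothetical and exist only for illustration. The `nextDoc()` / `longValue()` method names and the `NO_MORE_DOCS` sentinel follow the conventions of Lucene's iterator-based API, but nothing here depends on Lucene.

```java
import java.util.ArrayList;
import java.util.List;

public class SparseDocValuesSketch {
    // Sentinel meaning "iterator exhausted" (same convention as Lucene's DocIdSetIterator).
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical common shape of both iterators, for illustration only. */
    interface ValuesIterator {
        int nextDoc();
        long longValue();
    }

    /** Dense: one slot per document plus a presence flag; nextDoc() scans linearly. */
    static final class DenseIterator implements ValuesIterator {
        private final boolean[] hasValue;
        private final long[] values;
        private int doc = -1;

        DenseIterator(boolean[] hasValue, long[] values) {
            this.hasValue = hasValue;
            this.values = values;
        }

        @Override
        public int nextDoc() {
            if (doc == NO_MORE_DOCS) {
                return doc; // already exhausted
            }
            // Cost is proportional to the gap between documents that have a value.
            for (int d = doc + 1; d < hasValue.length; d++) {
                if (hasValue[d]) {
                    return doc = d;
                }
            }
            return doc = NO_MORE_DOCS;
        }

        @Override
        public long longValue() {
            return values[doc];
        }
    }

    /** Sparse: only documents that have a value are stored, in increasing doc ID order. */
    static final class SparseIterator implements ValuesIterator {
        private final int[] docs;
        private final long[] values;
        private int idx = -1;

        SparseIterator(int[] docs, long[] values) {
            this.docs = docs;
            this.values = values;
        }

        @Override
        public int nextDoc() {
            // Constant time per call, regardless of how many documents lack a value.
            if (idx + 1 < docs.length) {
                return docs[++idx];
            }
            idx = docs.length;
            return NO_MORE_DOCS;
        }

        @Override
        public long longValue() {
            return values[idx];
        }
    }

    /** Collects "doc=value" pairs until the iterator is exhausted. */
    static List<String> drain(ValuesIterator it) {
        List<String> out = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            out.add(doc + "=" + it.longValue());
        }
        return out;
    }

    public static void main(String[] args) {
        int maxDoc = 10; // only docs 2 and 7 carry a value
        boolean[] hasValue = new boolean[maxDoc];
        long[] denseValues = new long[maxDoc];
        hasValue[2] = true; denseValues[2] = 42L;
        hasValue[7] = true; denseValues[7] = 7L;

        System.out.println(drain(new DenseIterator(hasValue, denseValues)));
        System.out.println(drain(new SparseIterator(new int[] {2, 7}, new long[] {42L, 7L})));
    }
}
```

Both iterators yield the same pairs (`[2=42, 7=7]`). The dense one allocates memory for every document and walks over the gaps, while the sparse one stores and visits only the documents that have a value.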



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7474) Improve doc values writers

2016-10-10 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564060#comment-15564060
 ] 

Otis Gospodnetic commented on LUCENE-7474:
--

I was wondering how one could compare Lucene indexing (and searching) 
performance before and after this change.  Is there a way to add a sparse 
dataset for the nightly benchmark and use it for both trunk and 6.x branch, so 
one can see the performance difference of Lucene 6.x with sparse data vs. 
Lucene 7.x with sparse data?




[jira] [Commented] (LUCENE-7474) Improve doc values writers

2016-10-05 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548127#comment-15548127
 ] 

Adrien Grand commented on LUCENE-7474:
--

I think all of our benchmarks use dense data. The good news is that these 
changes do not seem to have slowed down indexing in the dense case, judging by 
http://people.apache.org/~mikemccand/geobench.html#index-times and 
http://people.apache.org/~mikemccand/lucenebench/indexing.html, or at least the 
slowdown is small enough that it is not noticeable when points or terms are 
indexed too. Search is a different matter: this change is almost certainly 
going to make things slower (see e.g. 
http://people.apache.org/~mikemccand/lucenebench/Term.html), so I think we need 
to be careful to keep the slowdown contained.




[jira] [Commented] (LUCENE-7474) Improve doc values writers

2016-10-05 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548058#comment-15548058
 ] 

Otis Gospodnetic commented on LUCENE-7474:
--

Yahoo! :)
Do the nightly builds have any tests that will exercise these new writers, the 
new 7.0 Codec, etc., so one can see how much of a speedup this change brings?




[jira] [Commented] (LUCENE-7474) Improve doc values writers

2016-10-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545988#comment-15545988
 ] 

ASF subversion and git services commented on LUCENE-7474:
-

Commit d50cf97617c88ec75fd8f4482003623db08e625e in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d50cf97 ]

LUCENE-7474: Doc values writers should have a sparse encoding.





[jira] [Commented] (LUCENE-7474) Improve doc values writers

2016-10-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545780#comment-15545780
 ] 

Michael McCandless commented on LUCENE-7474:


+1, wonderful.
