[jira] [Commented] (LUCENE-7474) Improve doc values writers
    [ https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15565052#comment-15565052 ]

Michael McCandless commented on LUCENE-7474:
--------------------------------------------

A sparse set in the nightly benchmarks is an interesting idea. Do you have a data set in mind?

At some point I'll write up a blog post summarizing the change and I can also try to do a before (6.x) / after (upcoming 7.0) one-time performance test for that.

> Improve doc values writers
> --------------------------
>
>                 Key: LUCENE-7474
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7474
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: master (7.0)
>
>         Attachments: LUCENE-7474.patch
>
>
> One of the goals of the new iterator-based API is to better handle sparse
> data. However, the current doc values writers still use a dense
> representation, and some of them perform naive linear scans in the nextDoc
> implementation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7474) Improve doc values writers
    [ https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564060#comment-15564060 ]

Otis Gospodnetic commented on LUCENE-7474:
------------------------------------------

I was wondering how one could compare Lucene indexing (and searching) performance before and after this change. Is there a way to add a sparse dataset for the nightly benchmark and use it for both trunk and 6.x branch, so one can see the performance difference of Lucene 6.x with sparse data vs. Lucene 7.x with sparse data?
[jira] [Commented] (LUCENE-7474) Improve doc values writers
    [ https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548127#comment-15548127 ]

Adrien Grand commented on LUCENE-7474:
--------------------------------------

All our benchmarks use dense data I think. The good news is that these changes did not seem to slow down indexing in the dense case if I look at http://people.apache.org/~mikemccand/geobench.html#index-times or http://people.apache.org/~mikemccand/lucenebench/indexing.html, or at least the slow down is small enough so that nothing is noticeable if there are points or terms indexed too. However regarding search, this change is almost certainly going to make things slower (see eg. http://people.apache.org/~mikemccand/lucenebench/Term.html), I think we need to be careful about keeping the slowdown contained.
[jira] [Commented] (LUCENE-7474) Improve doc values writers
    [ https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548058#comment-15548058 ]

Otis Gospodnetic commented on LUCENE-7474:
------------------------------------------

yhooo! :) Do the nightly builds have any tests that will exercise these new writers, the new 7.0 Codec, etc., so one can see how much speed this change gains?
[jira] [Commented] (LUCENE-7474) Improve doc values writers
    [ https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545988#comment-15545988 ]

ASF subversion and git services commented on LUCENE-7474:
---------------------------------------------------------

Commit d50cf97617c88ec75fd8f4482003623db08e625e in lucene-solr's branch refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d50cf97 ]

LUCENE-7474: Doc values writers should have a sparse encoding.
[jira] [Commented] (LUCENE-7474) Improve doc values writers
    [ https://issues.apache.org/jira/browse/LUCENE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545780#comment-15545780 ]

Michael McCandless commented on LUCENE-7474:
--------------------------------------------

+1, wonderful.
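
For readers following the thread, the issue description contrasts a dense in-memory representation whose nextDoc does a linear scan with a sparse encoding. The Java sketch below illustrates that difference only; it is not the code from LUCENE-7474.patch, and the names DenseBuffer and SparseBuffer are hypothetical. The one convention borrowed from Lucene is the NO_MORE_DOCS sentinel, which mirrors DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE).

```java
import java.util.Arrays;

/** Illustration only: hypothetical buffers, not Lucene's actual doc values writers. */
public class SparseDocValuesSketch {

    /** Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Dense buffering: one slot per document, whether or not it has a value. */
    static final class DenseBuffer {
        private final long[] values;
        private final boolean[] hasValue;
        private int doc = -1;

        DenseBuffer(int maxDoc) {
            values = new long[maxDoc];      // memory grows with maxDoc
            hasValue = new boolean[maxDoc];
        }

        void add(int docId, long value) {
            values[docId] = value;
            hasValue[docId] = true;
        }

        /** Naive linear scan: visits every doc ID between two documents that have values. */
        int nextDoc() {
            if (doc == NO_MORE_DOCS) {
                return doc;
            }
            for (int d = doc + 1; d < hasValue.length; d++) {
                if (hasValue[d]) {
                    return doc = d;
                }
            }
            return doc = NO_MORE_DOCS;
        }

        /** Only meaningful while positioned on a document returned by nextDoc(). */
        long longValue() {
            return values[doc];
        }
    }

    /** Sparse buffering: stores only the doc IDs that have a value, in increasing order. */
    static final class SparseBuffer {
        private int[] docs = new int[4];
        private long[] values = new long[4];
        private int size = 0;
        private int pos = -1;

        void add(int docId, long value) {
            if (size > 0 && docId <= docs[size - 1]) {
                throw new IllegalArgumentException("doc IDs must be added in increasing order");
            }
            if (size == docs.length) {      // grow both arrays together
                docs = Arrays.copyOf(docs, size * 2);
                values = Arrays.copyOf(values, size * 2);
            }
            docs[size] = docId;
            values[size] = value;
            size++;
        }

        /** Constant-time step to the next document that has a value. */
        int nextDoc() {
            if (pos + 1 >= size) {
                pos = size;
                return NO_MORE_DOCS;
            }
            return docs[++pos];
        }

        /** Only meaningful while positioned on a document returned by nextDoc(). */
        long longValue() {
            return values[pos];
        }
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        DenseBuffer dense = new DenseBuffer(maxDoc);
        SparseBuffer sparse = new SparseBuffer();
        int[] docIds = {3, 1000, 999_999};  // three values spread over a million documents
        for (int i = 0; i < docIds.length; i++) {
            dense.add(docIds[i], 10L * i);
            sparse.add(docIds[i], 10L * i);
        }
        // Both iterators yield the same (doc, value) pairs; only the cost differs.
        for (int d = sparse.nextDoc(); d != NO_MORE_DOCS; d = sparse.nextDoc()) {
            int fromDense = dense.nextDoc();
            System.out.println(d + " -> " + sparse.longValue()
                + " (dense: " + fromDense + " -> " + dense.longValue() + ")");
        }
    }
}
```

With three values spread over a million documents, the dense buffer allocates a million slots and its nextDoc walks the gaps between them, while the sparse buffer holds three entries and advances in constant time. On fully dense data the sparse form instead pays for an extra doc ID per value, which is one way to read the dense-case slowdown concern raised earlier in the thread.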