[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608268#comment-15608268 ]

ASF subversion and git services commented on LUCENE-7489:

Commit 643429de6e162fd85d5100137d01ee29e4bb614a in lucene-solr's branch refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=643429d ]

LUCENE-7489: Remove one layer of abstraction in binary doc values and single-valued numerics.

> Improve sparsity support of Lucene70DocValuesFormat
> ---
>
> Key: LUCENE-7489
> URL: https://issues.apache.org/jira/browse/LUCENE-7489
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7489.patch, LUCENE-7489.patch
>
>
> Like Lucene70NormsFormat, it should be able to only encode actual values.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588205#comment-15588205 ]

Michael McCandless commented on LUCENE-7489:

That's awesome progress on sort performance! Thanks [~jpountz].
[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587897#comment-15587897 ]

Adrien Grand commented on LUCENE-7489:

It looks like it worked: sorting by date time now seems a bit faster than it was before this change was pushed, but it is still slower than before we switched to an iterator API: http://people.apache.org/~mikemccand/lucenebench/TermDTSort.html

The bigger surprise to me is that sorting by title now looks faster than it was before we switched to an iterator API: http://people.apache.org/~mikemccand/lucenebench/TermTitleSort.html, which is good news.
[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585546#comment-15585546 ]

Adrien Grand commented on LUCENE-7489:

The only difference that I could find is that, when GCD compression is enabled, we now wrap twice instead of only once as before. I changed it, which yielded a ~2% improvement on wikimedium1m. This is far from the ~8% that the nightly benchmarks report, but differences in the dataset could explain the gap. I'll keep watching this benchmark over the next few days.
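For readers unfamiliar with the wrapping issue, here is a minimal Java sketch of the idea. GCD compression stores each value as a small quotient, (value - min) / gcd, and decodes it as quotient * gcd + min. The sketch assumes that "wrapping twice" means applying the multiplication and the addition in two stacked reader objects, and "wrapping once" means doing both in a single one. The class and method names (GcdDecodeSketch, packed, wrapTwice, wrapOnce) are hypothetical and are not taken from the Lucene source.

```java
import java.util.function.LongUnaryOperator;

final class GcdDecodeSketch {

  // Hypothetical stand-in for a packed-integer reader: maps an index to the stored quotient.
  static LongUnaryOperator packed(long[] stored) {
    return i -> stored[(int) i];
  }

  // Two stacked wrappers: one multiplies by the GCD, a second adds the minimum value.
  // Every lookup goes through two extra indirections.
  static LongUnaryOperator wrapTwice(LongUnaryOperator raw, long gcd, long min) {
    LongUnaryOperator scaled = i -> raw.applyAsLong(i) * gcd;
    return i -> scaled.applyAsLong(i) + min;
  }

  // A single wrapper that applies both steps, so each lookup has one extra indirection.
  static LongUnaryOperator wrapOnce(LongUnaryOperator raw, long gcd, long min) {
    return i -> raw.applyAsLong(i) * gcd + min;
  }

  public static void main(String[] args) {
    // Original values 5000, 6000, 8000 with min = 5000 and gcd = 1000 are stored as 0, 1, 3.
    LongUnaryOperator raw = packed(new long[] {0L, 1L, 3L});
    LongUnaryOperator once = wrapOnce(raw, 1000L, 5000L);
    for (long i = 0; i < 3; i++) {
      System.out.println(once.applyAsLong(i));
    }
  }
}
```

Both variants decode to the same values; the difference is only in how many layers each lookup crosses, which is the kind of overhead a sort benchmark that reads one value per hit can expose.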
[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585541#comment-15585541 ]

ASF subversion and git services commented on LUCENE-7489:

Commit a17e92006f087a0601d9329bf9b9c946ca72478b in lucene-solr's branch refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a17e920 ]

LUCENE-7489: Wrap only once in case GCD compression is used.
[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584752#comment-15584752 ]

Adrien Grand commented on LUCENE-7489:

It looks like it helped sorting by title (http://people.apache.org/~mikemccand/lucenebench/TermTitleSort.html) but not by date (http://people.apache.org/~mikemccand/lucenebench/TermDTSort.html). I'll look into it.
[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581556#comment-15581556 ]

ASF subversion and git services commented on LUCENE-7489:

Commit 927fd51d64a6e72843018786daea855847416487 in lucene-solr's branch refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=927fd51 ]

LUCENE-7489: Better sparsity support for Lucene70DocValuesFormat.
[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575247#comment-15575247 ]

Adrien Grand commented on LUCENE-7489:

I just ran the reproduction line again now that LUCENE-7495 is fixed, and the test passed. Things should be good now.
[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572336#comment-15572336 ]

Adrien Grand commented on LUCENE-7489:

Phew, the bug was not in the new format but in nested sorting: LUCENE-7495.
[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572173#comment-15572173 ]

Adrien Grand commented on LUCENE-7489:

Thanks for reporting this seed, I'll look into this bug.

bq. It looks like it uses the same compression techniques for the values as the 6.x codec, but then for "which docIDs have a value" it has three different approaches, for the very sparse, mostly dense, and 100% dense cases.

This is correct.
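The three cases confirmed above can be illustrated with a small Java sketch. It is a simplified model under stated assumptions, not the format's actual on-disk layout: the selection threshold, the enum, and every name (DocsWithValueSketch, Encoding, choose, ordinal) are hypothetical. The point it shows is that values are stored only for documents that have one, and each strategy answers "which stored value, if any, belongs to this docID?".

```java
import java.util.Arrays;
import java.util.BitSet;

final class DocsWithValueSketch {

  // Hypothetical labels for the 100% dense, mostly dense, and very sparse cases.
  enum Encoding { ALL, BITSET, SORTED_IDS }

  // Pick an encoding. The threshold is an assumption for illustration: a list of
  // 32-bit doc IDs is smaller than a bit set only when fewer than 1 in 32 docs has a value.
  static Encoding choose(int numDocsWithValue, int maxDoc) {
    if (numDocsWithValue == maxDoc) {
      return Encoding.ALL;
    }
    if ((long) numDocsWithValue * 32 < maxDoc) {
      return Encoding.SORTED_IDS;
    }
    return Encoding.BITSET;
  }

  // Returns the index of the doc's value among the stored values, or -1 if the doc has none.
  static int ordinal(Encoding encoding, int doc, int[] sortedIds, BitSet bits) {
    switch (encoding) {
      case ALL:
        // Every doc has a value, so the docID is the index.
        return doc;
      case BITSET:
        if (!bits.get(doc)) {
          return -1;
        }
        // Rank: number of docs with a value that come before this one.
        return bits.get(0, doc).cardinality();
      case SORTED_IDS:
        int idx = Arrays.binarySearch(sortedIds, doc);
        return idx >= 0 ? idx : -1;
      default:
        throw new AssertionError(encoding);
    }
  }

  public static void main(String[] args) {
    int[] ids = {3, 10, 42};
    BitSet bits = new BitSet();
    for (int id : ids) {
      bits.set(id);
    }
    System.out.println(choose(ids.length, 1000));
    System.out.println(ordinal(Encoding.SORTED_IDS, 10, ids, bits));
    System.out.println(ordinal(Encoding.BITSET, 42, ids, bits));
  }
}
```

The rank computation in the bit set case is deliberately naive here; a real format would need a faster way to skip ahead, which this sketch does not attempt to model.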
[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572163#comment-15572163 ]

Michael McCandless commented on LUCENE-7489:

+1, this patch looks wonderful!

It looks like it uses the same compression techniques for the values as the 6.x codec, but then for "which docIDs have a value" it has three different approaches, for the very sparse, mostly dense, and 100% dense cases.

I hit this test failure, but it doesn't reproduce on trunk (though it could still be a pre-existing issue, if e.g. this patch shifted seeds):

{noformat}
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBlockJoinSorting -Dtests.method=testNestedSorting -Dtests.seed=A0B8F022A1A8B661 -Dtests.locale=en-CA -Dtests.timezone=Etc/GMT+4 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 0.20s | TestBlockJoinSorting.testNestedSorting <<<
[junit4] > Throwable #1: org.junit.ComparisonFailure: expected:<[e]> but was:<[f]>
[junit4] > at __randomizedtesting.SeedInfo.seed([A0B8F022A1A8B661:A8511D63E101BB0F]:0)
[junit4] > at org.apache.lucene.search.join.TestBlockJoinSorting.testNestedSorting(TestBlockJoinSorting.java:233)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene70): {field1=FST50, __type=Lucene50(blocksize=128), filter_1=Lucene50(blocksize=128), field2=Lucene50(blocksize=128)}, docValues:{field2=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=972, maxMBSortInHeap=5.645435808865713, sim=RandomSimilarity(queryNorm=false): {}, locale=en-CA, timezone=Etc/GMT+4
[junit4] 2> NOTE: Linux 4.4.0-38-generic amd64/Oracle Corporation 1.8.0_101 (64-bit)/cpus=8,threads=1,free=420118024,total=514850816
[junit4] 2> NOTE: All tests run in this JVM: [TestBlockJoinSorting]
[junit4] Completed [1/1 (1!)] in 0.37s, 1 test, 1 failure <<< FAILURES!
{noformat}