[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-26 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608268#comment-15608268 ]

ASF subversion and git services commented on LUCENE-7489:
-

Commit 643429de6e162fd85d5100137d01ee29e4bb614a in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=643429d ]

LUCENE-7489: Remove one layer of abstraction in binary doc values and 
single-valued numerics.


> Improve sparsity support of Lucene70DocValuesFormat
> ---------------------------------------------------
>
>                 Key: LUCENE-7489
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7489
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: master (7.0)
>
>         Attachments: LUCENE-7489.patch, LUCENE-7489.patch
>
>
> Like Lucene70NormsFormat, it should be able to only encode actual values.
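To illustrate what "only encode actual values" means, here is a minimal in-memory Java sketch. SparseNumericColumn is a hypothetical class, not a Lucene API, and it ignores on-disk encoding entirely. Instead of reserving one slot per document up to maxDoc, it keeps the sorted IDs of the documents that have a value next to a parallel array holding only those values.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene70DocValuesFormat): store values only for
 * documents that actually have one, instead of one slot per document.
 */
final class SparseNumericColumn {
  private final int[] docsWithValue; // strictly increasing doc IDs
  private final long[] values;       // values[i] belongs to docsWithValue[i]

  SparseNumericColumn(int[] docsWithValue, long[] values) {
    if (docsWithValue.length != values.length) {
      throw new IllegalArgumentException("expected exactly one value per document");
    }
    for (int i = 1; i < docsWithValue.length; i++) {
      if (docsWithValue[i] <= docsWithValue[i - 1]) {
        throw new IllegalArgumentException("doc IDs must be strictly increasing");
      }
    }
    this.docsWithValue = docsWithValue.clone();
    this.values = values.clone();
  }

  /** Returns true if the document has a value. */
  boolean hasValue(int doc) {
    return Arrays.binarySearch(docsWithValue, doc) >= 0;
  }

  /** Returns the document's value, or {@code missing} if it has none. */
  long get(int doc, long missing) {
    int i = Arrays.binarySearch(docsWithValue, doc);
    return i >= 0 ? values[i] : missing;
  }

  /** Storage grows with the number of values, not with maxDoc. */
  int storedValueCount() {
    return values.length;
  }
}
```

For example, a segment with maxDoc = 1,000,000 in which only three documents have the field would store three longs here rather than a million slots. In this simplified version, the per-lookup binary search is the price of that saving.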



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-19 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588205#comment-15588205 ]

Michael McCandless commented on LUCENE-7489:


That's awesome progress on sort performance!  Thanks [~jpountz].




[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-19 Thread Adrien Grand (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587897#comment-15587897 ]

Adrien Grand commented on LUCENE-7489:
--

It looks like it worked: sorting by date/time now seems a bit faster than it 
was before this change was pushed, but it is still slower than before we 
switched to an iterator API: 
http://people.apache.org/~mikemccand/lucenebench/TermDTSort.html

The bigger surprise to me is that sorting by title now looks faster than it 
was before we switched to an iterator API 
(http://people.apache.org/~mikemccand/lucenebench/TermTitleSort.html), which is 
good news.
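Some background on the change being benchmarked: doc values used to be read by random access (ask for any docID, in any order), whereas the iterator API requires docIDs to be consumed in increasing order. The following Java sketch contrasts the two access styles over the same sparse data. ForwardOnlyValues and AccessStyles are hypothetical simplifications, not Lucene's real classes, and the sketch says nothing about the performance numbers linked above.

```java
import java.util.Arrays;

/** Hypothetical forward-only reader: targets must never go backwards. */
final class ForwardOnlyValues {
  private final int[] docs;    // strictly increasing doc IDs that have a value
  private final long[] values; // values[i] belongs to docs[i]
  private int pos = 0;         // current position in docs; only ever moves forward
  private int lastTarget = -1;

  ForwardOnlyValues(int[] docs, long[] values) {
    this.docs = docs;
    this.values = values;
  }

  /** Moves forward to target; returns true if target has a value. */
  boolean advanceExact(int target) {
    if (target < lastTarget) {
      throw new IllegalStateException("targets must be non-decreasing");
    }
    lastTarget = target;
    while (pos < docs.length && docs[pos] < target) {
      pos++; // linear scan for clarity; a real format could skip in larger steps
    }
    return pos < docs.length && docs[pos] == target;
  }

  /** Only valid right after advanceExact returned true. */
  long longValue() {
    return values[pos];
  }
}

final class AccessStyles {
  /** Random access: each hit is looked up independently (binary search here). */
  static long sumRandomAccess(int[] docs, long[] values, int[] hits) {
    long sum = 0;
    for (int hit : hits) {
      int i = Arrays.binarySearch(docs, hit);
      if (i >= 0) {
        sum += values[i];
      }
    }
    return sum;
  }

  /** Iterator style: hits must arrive in increasing docID order. */
  static long sumForwardOnly(int[] docs, long[] values, int[] hits) {
    ForwardOnlyValues it = new ForwardOnlyValues(docs, values);
    long sum = 0;
    for (int hit : hits) {
      if (it.advanceExact(hit)) {
        sum += it.longValue();
      }
    }
    return sum;
  }
}
```

The forward-only reader never repeats a binary search per hit. In exchange, it only works when callers present docIDs in increasing order, which is the constraint an iterator API imposes.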




[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-18 Thread Adrien Grand (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585546#comment-15585546 ]

Adrien Grand commented on LUCENE-7489:
--

The only difference I could find is that, when GCD compression is enabled, we 
now wrap twice where we previously wrapped only once. I changed it, which 
yielded a ~2% improvement on wikimedium1m. This is far from the ~8% that the 
nightly benchmarks report, but differences between the datasets could explain 
it. I'll keep watching this benchmark over the next few days.
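GCD compression typically stores each value as (value - min) / gcd and decodes it as min + gcd * stored. The Java sketch below uses hypothetical names and is not the Lucene70 code. It only illustrates the shape of the fix by contrasting two decoders. One is built from two stacked wrappers; the other fuses the lookup and the GCD transform into a single wrapper. Both return identical values. The fused form removes one level of indirection per lookup, and the sketch does not reproduce the ~2% measured above.

```java
import java.util.function.IntToLongFunction;

/** Hypothetical sketch of GCD compression and of wrapping once vs twice. */
final class GcdDecoding {
  /** Greatest common divisor of two non-negative longs; gcd(0, x) == x. */
  static long gcd(long a, long b) {
    while (b != 0) {
      long t = a % b;
      a = b;
      b = t;
    }
    return a;
  }

  /** GCD of all deltas from min; falls back to 1 when every value equals min. */
  static long gcdOfDeltas(long[] values, long min) {
    long g = 0;
    for (long v : values) {
      g = gcd(g, v - min); // deltas are >= 0 since min is the minimum (overflow ignored)
    }
    return g == 0 ? 1 : g;
  }

  /** Stores each value as (v - min) / gcd; exact when gcd comes from gcdOfDeltas. */
  static long[] encode(long[] values, long min, long gcd) {
    long[] out = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      out[i] = (values[i] - min) / gcd;
    }
    return out;
  }

  /** Two layers: a raw reader, then a separate wrapper applying the GCD transform. */
  static IntToLongFunction twoLayers(long[] encoded, long min, long gcd) {
    IntToLongFunction raw = i -> encoded[i];
    return i -> min + gcd * raw.applyAsLong(i);
  }

  /** One layer: lookup and GCD transform fused into a single wrapper. */
  static IntToLongFunction oneLayer(long[] encoded, long min, long gcd) {
    return i -> min + gcd * encoded[i];
  }
}
```

For example, values 1000, 1300, 1600 and 1100 have min 1000 and a delta GCD of 100. They are therefore stored as 0, 3, 6 and 1, which need far fewer bits than the raw values.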




[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-18 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585541#comment-15585541 ]

ASF subversion and git services commented on LUCENE-7489:
-

Commit a17e92006f087a0601d9329bf9b9c946ca72478b in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a17e920 ]

LUCENE-7489: Wrap only once in case GCD compression is used.





[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-18 Thread Adrien Grand (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584752#comment-15584752 ]

Adrien Grand commented on LUCENE-7489:
--

It looks like it helped sorting by title 
(http://people.apache.org/~mikemccand/lucenebench/TermTitleSort.html) but not 
by date (http://people.apache.org/~mikemccand/lucenebench/TermDTSort.html). 
I'll look into it.




[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-17 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581556#comment-15581556 ]

ASF subversion and git services commented on LUCENE-7489:
-

Commit 927fd51d64a6e72843018786daea855847416487 in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=927fd51 ]

LUCENE-7489: Better sparsity support for Lucene70DocValuesFormat.





[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-14 Thread Adrien Grand (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575247#comment-15575247 ]

Adrien Grand commented on LUCENE-7489:
--

I just ran the reproduction line again now that LUCENE-7495 is fixed, and the 
test passed. Things should be good now.




[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-13 Thread Adrien Grand (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572336#comment-15572336 ]

Adrien Grand commented on LUCENE-7489:
--

Phew, the bug was not in the new format but in nested sorting: LUCENE-7495.




[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-13 Thread Adrien Grand (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572173#comment-15572173 ]

Adrien Grand commented on LUCENE-7489:
--

Thanks for reporting this seed, I'll look into this bug.

bq. It looks like it uses the same compression techniques for the values as the 
6.x codec, but then for "which docIDs have a value" it has three different 
approaches, for the very sparse, mostly dense, and 100% dense cases.

This is correct.




[jira] [Commented] (LUCENE-7489) Improve sparsity support of Lucene70DocValuesFormat

2016-10-13 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572163#comment-15572163 ]

Michael McCandless commented on LUCENE-7489:


+1, this patch looks wonderful!

It looks like it uses the same compression techniques for the values as the 6.x 
codec, but then for "which docIDs have a value" it has three different 
approaches, for the very sparse, mostly dense, and 100% dense cases.
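The three cases described above can be sketched in Java as follows. The interface, the class names and the 1/16 density cut-off are hypothetical illustration choices, not what the patch actually does. When every document has a value, nothing needs to be stored and the docID is its own index. When most documents have one, a bitset over maxDoc is used. When very few do, only their sorted docIDs are kept. In each case, index(doc) gives the position of the document's value in a values array that holds only actual values.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical view of "which docIDs have a value". */
interface DocsWithValue {
  String kind();

  boolean has(int doc);

  /** Position of doc's value among the stored values; only valid if has(doc). */
  int index(int doc);
}

final class DocsWithValueEncodings {
  /** Hypothetical cut-off: fewer than maxDoc / 16 values counts as very sparse. */
  static final int SPARSE_DIVISOR = 16;

  /** Picks a representation from the sorted, distinct docIDs that have a value. */
  static DocsWithValue choose(int[] sortedDocs, int maxDoc) {
    if (sortedDocs.length == maxDoc) {
      // 100% dense: nothing stored, the docID is also the value's index.
      return new DocsWithValue() {
        public String kind() { return "all"; }
        public boolean has(int doc) { return doc >= 0 && doc < maxDoc; }
        public int index(int doc) { return doc; }
      };
    }
    if ((long) sortedDocs.length * SPARSE_DIVISOR < maxDoc) {
      // Very sparse: keep only the docIDs; the index is the binary-search position.
      final int[] docs = sortedDocs.clone();
      return new DocsWithValue() {
        public String kind() { return "sparse"; }
        public boolean has(int doc) { return Arrays.binarySearch(docs, doc) >= 0; }
        public int index(int doc) { return Arrays.binarySearch(docs, doc); }
      };
    }
    // Mostly dense: one bit per document; the index is the count of set bits before doc.
    final BitSet bits = new BitSet(maxDoc);
    for (int d : sortedDocs) {
      bits.set(d);
    }
    return new DocsWithValue() {
      public String kind() { return "dense"; }
      public boolean has(int doc) { return bits.get(doc); }
      // O(doc) rank computation, kept simple for illustration only.
      public int index(int doc) { return bits.get(0, doc).cardinality(); }
    };
  }
}
```

The trade-off is storage against lookup cost. The all-docs case costs nothing, and the bitset costs maxDoc bits regardless of how many values there are. The docID list costs one int per value, so it only wins when values are rare.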

I hit this test failure, but it doesn't repro on trunk (though it could still be 
a pre-existing issue, e.g. if this patch shifted seeds):

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBlockJoinSorting -Dtests.method=testNestedSorting -Dtests.seed=A0B8F022A1A8B661 -Dtests.locale=en-CA -Dtests.timezone=Etc/GMT+4 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 0.20s | TestBlockJoinSorting.testNestedSorting <<<
   [junit4]    > Throwable #1: org.junit.ComparisonFailure: expected:<[e]> but was:<[f]>
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([A0B8F022A1A8B661:A8511D63E101BB0F]:0)
   [junit4]    >    at org.apache.lucene.search.join.TestBlockJoinSorting.testNestedSorting(TestBlockJoinSorting.java:233)
   [junit4]    >    at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): {field1=FST50, __type=Lucene50(blocksize=128), filter_1=Lucene50(blocksize=128), field2=Lucene50(blocksize=128)}, docValues:{field2=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=972, maxMBSortInHeap=5.645435808865713, sim=RandomSimilarity(queryNorm=false): {}, locale=en-CA, timezone=Etc/GMT+4
   [junit4]   2> NOTE: Linux 4.4.0-38-generic amd64/Oracle Corporation 1.8.0_101 (64-bit)/cpus=8,threads=1,free=420118024,total=514850816
   [junit4]   2> NOTE: All tests run in this JVM: [TestBlockJoinSorting]
   [junit4] Completed [1/1 (1!)] in 0.37s, 1 test, 1 failure <<< FAILURES!
{noformat}
