[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-29 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722649#comment-13722649
 ] 

Adrien Grand commented on LUCENE-5127:
--

+1

 FixedGapTermsIndex should use monotonic compression
 ---

 Key: LUCENE-5127
 URL: https://issues.apache.org/jira/browse/LUCENE-5127
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5127.patch, LUCENE-5127.patch, LUCENE-5127.patch, 
 LUCENE-5127.patch


 If we use monotonic compression for the addresses in the big in-memory byte[] 
 and disk blocks, we could save a good deal of RAM here.
 I think this codec just never got upgraded when we added these new packed 
 improvements, but it might be interesting to try to use it for the terms data 
 of sorted/sortedset DV implementations.
 The patch works, but has nocommits and currently ignores the divisor. The 
 annoying problem there is that we have the shared interface with 
 get(int) for PackedInts.Mutable/Reader, but no equivalent base class for 
 the monotonic get(long)... 
 Still, it's enough that we could benchmark/compare for now.
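
To make the RAM argument above concrete, here is a minimal, self-contained sketch of the idea behind monotonic compression. It is not Lucene's implementation: the class name MonotonicSketch and all of its fields and methods are hypothetical, and it only assumes that addresses grow roughly linearly. Each value is stored as a small non-negative deviation from a straight line through the first and last values, so the number of bits per entry depends on the largest deviation rather than on the largest address.

```java
/** Illustrative sketch only; this is not Lucene's monotonic packed-ints code. */
final class MonotonicSketch {
  final long first;        // first value of the sequence
  final double slope;      // average increase per index
  final long minResidual;  // smallest deviation from the expected line
  final long[] deltas;     // non-negative deviations, relative to minResidual
  final int bitsPerDelta;  // bits a packed array would need per entry

  MonotonicSketch(long[] values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("values must not be empty");
    }
    int n = values.length;
    first = values[0];
    slope = n > 1 ? (double) (values[n - 1] - first) / (n - 1) : 0.0;

    // Deviation of each value from the straight line first + slope * i.
    long[] residuals = new long[n];
    long min = Long.MAX_VALUE;
    for (int i = 0; i < n; i++) {
      residuals[i] = values[i] - expected(first, slope, i);
      min = Math.min(min, residuals[i]);
    }
    minResidual = min;

    // Shift the deviations so that all stored numbers are >= 0.
    deltas = new long[n];
    long max = 0;
    for (int i = 0; i < n; i++) {
      deltas[i] = residuals[i] - min;
      max = Math.max(max, deltas[i]);
    }
    bitsPerDelta = bitsRequired(max);
  }

  /** Value predicted by the straight line at index i. */
  static long expected(long first, double slope, int i) {
    return first + (long) (slope * i);
  }

  /** Number of bits needed to represent a non-negative value. */
  static int bitsRequired(long value) {
    return 64 - Long.numberOfLeadingZeros(value);
  }

  /** Reconstructs the original value at index i. */
  long get(int i) {
    return expected(first, slope, i) + minResidual + deltas[i];
  }

  public static void main(String[] args) {
    long[] addresses = {0, 100, 205, 300, 410, 500};
    MonotonicSketch m = new MonotonicSketch(addresses);
    System.out.println("bits per stored delta: " + m.bitsPerDelta);
    System.out.println("bits for the largest raw address: " + bitsRequired(500));
  }
}
```

In this sketch the deltas are kept in a plain long[] for readability; a real implementation would pack them at bitsPerDelta bits each, which is where the memory saving would come from.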

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722697#comment-13722697
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1508147 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1508147 ]

LUCENE-5127: FixedGapTermsIndex should use monotonic compression




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720828#comment-13720828
 ] 

Michael McCandless commented on LUCENE-5127:


+1, patch looks great.  Thanks Rob!




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-26 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720856#comment-13720856
 ] 

Adrien Grand commented on LUCENE-5127:
--

This is a very nice cleanup! In FixedGapTermsIndexWriter, I think we could 
improve the buffering of offsets and addresses by buffering directly into a 
MonotonicBlockPackedWriter over a RAMOutputStream and then copying the raw 
content of the RAMOutputStream to the IndexOutput. This would avoid an extra 
encoding/decoding step.
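
A rough sketch of the buffering pattern suggested above, written against plain java.io classes rather than Lucene's: ByteArrayOutputStream stands in for the in-memory stream and DataOutputStream for the final index output. The class BufferThenCopySketch is hypothetical, and a simple variable-length delta encoding stands in for the monotonic block encoding (that substitution is an assumption made to keep the example self-contained). The point it illustrates is that values are encoded once, as they arrive, and the already-encoded bytes are then copied verbatim to the output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative sketch only; none of these classes are Lucene's. */
final class BufferThenCopySketch {
  // Holds the already-encoded bytes until the entry count is known.
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private long previous = 0;
  private int count = 0;

  /** Encodes one address immediately, as a variable-length delta. */
  void add(long address) {
    long delta = address - previous;
    if (delta < 0) {
      throw new IllegalArgumentException("addresses must not decrease");
    }
    while ((delta & ~0x7FL) != 0) {
      buffer.write((int) ((delta & 0x7F) | 0x80)); // 7 payload bits + continuation bit
      delta >>>= 7;
    }
    buffer.write((int) delta);
    previous = address;
    count++;
  }

  /** Writes the count, then copies the encoded bytes without re-encoding them. */
  void finish(DataOutputStream out) throws IOException {
    out.writeInt(count);
    buffer.writeTo(out);
  }

  /** Convenience helper: encodes all addresses into a byte array. */
  static byte[] encode(long... addresses) {
    BufferThenCopySketch writer = new BufferThenCopySketch();
    for (long address : addresses) {
      writer.add(address);
    }
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(file)) {
      writer.finish(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return file.toByteArray();
  }

  /** Reads back what encode() wrote; used here only to check the round trip. */
  static long[] decode(byte[] bytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      long[] result = new long[in.readInt()];
      long value = 0;
      for (int i = 0; i < result.length; i++) {
        long delta = 0;
        int shift = 0;
        int b;
        do {
          b = in.readUnsignedByte();
          delta |= (long) (b & 0x7F) << shift;
          shift += 7;
        } while ((b & 0x80) != 0);
        value += delta;
        result[i] = value;
      }
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] encoded = encode(0, 100, 205, 300);
    System.out.println("encoded bytes: " + encoded.length);
    System.out.println("decoded: " + java.util.Arrays.toString(decode(encoded)));
  }
}
```

The alternative that this pattern avoids would be to collect the values in one intermediate structure, read them back at the end, and encode them again while writing the output.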




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720892#comment-13720892
 ] 

Robert Muir commented on LUCENE-5127:
-

Good idea! I initially thought of the growableoutput, but I didn't want all the 
resizing. I think a RAMOutputStream can work well.




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719728#comment-13719728
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507035 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507035 ]

LUCENE-5127: create branch




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719729#comment-13719729
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507036 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507036 ]

LUCENE-5127: dump current state




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719732#comment-13719732
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507041 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507041 ]

LUCENE-5127: randomize codec parameter




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719752#comment-13719752
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507054 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507054 ]

LUCENE-5127: fix solr tests




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719775#comment-13719775
 ] 

Michael McCandless commented on LUCENE-5127:


This cleanup is awesome, thanks Rob!

I think we should just nuke the special -1 ("don't load terms index") value?




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719793#comment-13719793
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507067 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507067 ]

LUCENE-5127: nuke mergeReader




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719798#comment-13719798
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507070 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507070 ]

LUCENE-5127: simplify seek-within-block




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719826#comment-13719826
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507075 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507075 ]

LUCENE-5127: explicit var gap testing part 1




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719836#comment-13719836
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507078 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507078 ]

LUCENE-5127: explicit var gap testing part 2




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719873#comment-13719873
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507083 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507083 ]

LUCENE-5127: simplify vargap




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719892#comment-13719892
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507086 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507086 ]

LUCENE-5127: simplify fixedgap




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719894#comment-13719894
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507087 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507087 ]

LUCENE-5127: fix indent




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719932#comment-13719932
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507097 from [~mikemccand] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507097 ]

LUCENE-5127: add tests




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720005#comment-13720005
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507111 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507111 ]

LUCENE-5127: clear nocommits




[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720024#comment-13720024
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507116 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507116 ]

LUCENE-5127: fix TestLucene40PF and clean up some more outdated stuff

 FixedGapTermsIndex should use monotonic compression
 ---

 Key: LUCENE-5127
 URL: https://issues.apache.org/jira/browse/LUCENE-5127
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5127.patch, LUCENE-5127.patch






[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720027#comment-13720027
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507118 from [~mikemccand] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507118 ]

LUCENE-5127: fix false fail when terms dict is a ghostbuster

 FixedGapTermsIndex should use monotonic compression
 ---

 Key: LUCENE-5127
 URL: https://issues.apache.org/jira/browse/LUCENE-5127
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5127.patch, LUCENE-5127.patch






[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720044#comment-13720044
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507120 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507120 ]

LUCENE-5127: clean up error msgs

 FixedGapTermsIndex should use monotonic compression
 ---

 Key: LUCENE-5127
 URL: https://issues.apache.org/jira/browse/LUCENE-5127
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5127.patch, LUCENE-5127.patch






[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720357#comment-13720357
 ] 

ASF subversion and git services commented on LUCENE-5127:
-

Commit 1507179 from [~rcmuir] in branch 'dev/branches/lucene5127'
[ https://svn.apache.org/r1507179 ]

LUCENE-5127: use less ram when writing the terms index

 FixedGapTermsIndex should use monotonic compression
 ---

 Key: LUCENE-5127
 URL: https://issues.apache.org/jira/browse/LUCENE-5127
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5127.patch, LUCENE-5127.patch






[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13715169#comment-13715169
 ] 

Michael McCandless commented on LUCENE-5127:


+1

I think we should just nuke the divisor?

 FixedGapTermsIndex should use monotonic compression
 ---

 Key: LUCENE-5127
 URL: https://issues.apache.org/jira/browse/LUCENE-5127
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5127.patch






[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression

2013-07-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13715302#comment-13715302
 ] 

Robert Muir commented on LUCENE-5127:
-

Maybe, though we could also add a minimal get(long) interface to 
blockpacked/monotonicblockpacked/appending/monotonicappending.
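
A rough sketch of what such a minimal interface might look like, for discussion only. The names LongGetter, ArrayGetter and SubsampledGetter are made up here and are not existing Lucene classes. It also assumes that "divisor" means "expose only every Nth index entry"; under that assumption the divisor could be layered on top as a view rather than baked into each packed implementation:

```java
/** Hypothetical minimal read interface; the name is invented for this sketch. */
interface LongGetter {
  long get(long index);
  long size();
}

/** Plain array-backed stand-in for a packed reader, used only to make the sketch runnable. */
final class ArrayGetter implements LongGetter {
  private final long[] values;

  ArrayGetter(long[] values) {
    this.values = values;
  }

  @Override public long get(long index) { return values[(int) index]; }

  @Override public long size() { return values.length; }
}

/** Exposes entries 0, divisor, 2*divisor, ... of the wrapped getter without copying. */
final class SubsampledGetter implements LongGetter {
  private final LongGetter in;
  private final int divisor;

  SubsampledGetter(LongGetter in, int divisor) {
    if (divisor < 1) {
      throw new IllegalArgumentException("divisor must be >= 1");
    }
    this.in = in;
    this.divisor = divisor;
  }

  // Remap the caller's index onto the underlying, unsampled index.
  @Override public long get(long index) { return in.get(index * divisor); }

  // Number of entries visible through the view: ceil(size / divisor).
  @Override public long size() { return (in.size() + divisor - 1) / divisor; }
}
```

With seven entries and a divisor of 3, the view exposes entries 0, 3 and 6. This only illustrates the index remapping; anything that derives a length from two neighbouring offsets would still need the unsampled neighbour.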

A few notes:
* Current patch changes both the disk offsets (termsDictOffsets) and the 
offsets into the in-ram terms data (termOffsets)
* With the current patch as-is, we could remove the interval*2B #terms 
limitation, as long addressing is used everywhere.
* Current patch saves RAM; savings increase as the terms index/terms dict gets 
larger. With 10M:
||Checkout||TIB||TII||
|Trunk|519329144|19300603|
|Patch|519329144|14149524|
* Current patch slows down seek-heavy queries a bit:
{noformat}
Task       QPS trunk  StdDev   QPS patch  StdDev   Pct diff
PKLookup       86.02  (2.9%)       76.17  (2.4%)   -11.4% ( -16% -   -6%)
Respell        39.76  (3.0%)       36.58  (2.5%)    -8.0% ( -13% -   -2%)
Fuzzy2         35.49  (4.1%)       32.88  (2.6%)    -7.3% ( -13% -    0%)
Fuzzy1         31.49  (4.1%)       29.18  (2.6%)    -7.3% ( -13% -    0%)
{noformat}
* termOffsets are read twice per seek / binary search iteration:
{code}
  final long offset = fieldIndex.termOffsets.get(idx);
  final int length = (int) (fieldIndex.termOffsets.get(1+idx) - offset);
{code}
* termsDictOffsets are only read once... and this is really just an unfortunate 
consequence of TermsIndexReaderBase's API... ideally they would lazy-decode 
this until you really needed it, like BlockTree.
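
To make the tradeoff in the notes above concrete, here is a simplified sketch of monotonic offset encoding. It is not the code from the patch and not Lucene's actual packed classes; MonotonicOffsets and its methods are hypothetical. The idea it assumes: store each offset as a small deviation from a straight line between the first and last offset, so each entry needs far fewer bits than an absolute address, at the cost of extra arithmetic on every read. The deviations sit in a plain long[] here where a real implementation would bit-pack them:

```java
/** Hypothetical sketch of monotonic offset compression; not Lucene's actual reader. */
final class MonotonicOffsets {
  private final long base;      // first offset
  private final long span;      // last offset minus first offset
  private final long steps;     // number of gaps between entries (at least 1)
  private final long[] deltas;  // deviation from the straight line; a real codec would bit-pack these

  MonotonicOffsets(long[] offsets) {
    base = offsets.length == 0 ? 0 : offsets[0];
    steps = Math.max(1, offsets.length - 1);
    span = offsets.length == 0 ? 0 : offsets[offsets.length - 1] - base;
    deltas = new long[offsets.length];
    for (int i = 0; i < offsets.length; i++) {
      deltas[i] = offsets[i] - expected(i);
    }
  }

  /** Value that the straight line from first to last offset predicts for this index. */
  private long expected(long index) {
    // Note: index * span can overflow for very large inputs; ignored in this sketch.
    return base + index * span / steps;
  }

  /** Decodes one offset: line prediction plus stored deviation. */
  long get(long index) {
    return expected(index) + deltas[(int) index];
  }

  /** Bits needed per stored deviation if they were packed relative to a shared minimum. */
  int bitsPerDelta() {
    if (deltas.length == 0) {
      return 0;
    }
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long d : deltas) {
      min = Math.min(min, d);
      max = Math.max(max, d);
    }
    return 64 - Long.numberOfLeadingZeros(max - min);
  }

  /** Length of term idx; like the snippet above, this decodes two offsets per call. */
  int termLength(long idx) {
    final long offset = get(idx);
    return (int) (get(idx + 1) - offset);
  }
}
```

For offsets such as 1000, 1007, 1015, 1021, 1030 the deviations from the line are only 0 or -1, so one bit per entry suffices instead of the 11 bits an absolute value up to 1030 needs. Each termLength call pays for two decodes, which is in line with the slowdown on seek-heavy tasks reported above.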

So I see a few things we could do:
# go forward with current patch (maybe add the divisor stuff via a simple get() 
interface). clean up int-long everywhere. I'm not sure if these perf diffs 
matter for the use cases where someone needs an ord-enabled terms index?
# hybrid patch, where termOffsets stay absolute but termsDictOffsets use 
monotonicpacked. This would still save some space, but restore the seek-heavy 
perf. But then we wouldn't be able to clean up int-long and so on.
# do nothing, maybe fork the logic of this thing so it can be used in DV. For 
how DV is used, it'd be the right tradeoff so it's no issue there.

 FixedGapTermsIndex should use monotonic compression
 ---

 Key: LUCENE-5127
 URL: https://issues.apache.org/jira/browse/LUCENE-5127
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-5127.patch


