[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722649#comment-13722649 ]

Adrien Grand commented on LUCENE-5127:

+1

FixedGapTermsIndex should use monotonic compression

Key: LUCENE-5127
URL: https://issues.apache.org/jira/browse/LUCENE-5127
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
Attachments: LUCENE-5127.patch, LUCENE-5127.patch, LUCENE-5127.patch, LUCENE-5127.patch

For the addresses in the big in-memory byte[] and disk blocks, we could save a good deal of RAM here. I think this codec just never got upgraded when we added these new packed improvements, but it might be interesting to try to use for the terms data of sorted/sortedset DV implementations.

The patch works, but has nocommits and currently ignores the divisor. The annoying problem there is that we have the shared interface with get(int) for PackedInts.Mutable/Reader, but no equivalent base class for the monotonics' get(long)... Still, it's enough that we could benchmark/compare for now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
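Editorial sketch, not part of the original thread: the issue description relies on the fact that monotonically increasing addresses can be stored as small residuals from a straight line rather than as full-width values. The Java below illustrates that general idea only. The class and method names (`MonotonicSketch`, `encode`, `get`, `expected`) are hypothetical, the residuals are kept in a plain `long[]` rather than actually bit-packed, and none of this is taken from the LUCENE-5127 patch or from Lucene's packed-ints classes.

```java
/**
 * Sketch of monotonic compression: each value is stored as a small residual
 * from a straight line through the first and last values.
 * All names are hypothetical; this is not Lucene code.
 */
final class MonotonicSketch {

    /** Encoded block: line parameters plus non-negative residuals. */
    static final class Encoded {
        final long base;         // line intercept, shifted so residuals are >= 0
        final float avg;         // average increase per index (line slope)
        final long[] residuals;  // value minus the line's prediction
        final int bitsPerValue;  // bits a packed array would need per residual

        Encoded(long base, float avg, long[] residuals, int bitsPerValue) {
            this.base = base;
            this.avg = avg;
            this.residuals = residuals;
            this.bitsPerValue = bitsPerValue;
        }
    }

    /** Value the line predicts at index i, before the residual is added. */
    static long expected(long base, float avg, int i) {
        return base + (long) (avg * i);
    }

    static Encoded encode(long[] values) {
        int n = values.length;
        if (n == 0) {
            return new Encoded(0L, 0f, new long[0], 0);
        }
        float avg = n == 1 ? 0f : (float) (values[n - 1] - values[0]) / (n - 1);
        long[] residuals = new long[n];
        long min = Long.MAX_VALUE;
        for (int i = 0; i < n; i++) {
            residuals[i] = values[i] - expected(values[0], avg, i);
            min = Math.min(min, residuals[i]);
        }
        // Shift residuals so the smallest is zero, then measure the widest one.
        long max = 0;
        for (int i = 0; i < n; i++) {
            residuals[i] -= min;
            max = Math.max(max, residuals[i]);
        }
        int bits = 64 - Long.numberOfLeadingZeros(max);
        return new Encoded(values[0] + min, avg, residuals, bits);
    }

    /** Random access: recompute the line's prediction and add the residual. */
    static long get(Encoded e, int index) {
        return expected(e.base, e.avg, index) + e.residuals[index];
    }

    public static void main(String[] args) {
        long[] addresses = {0, 100, 205, 300, 410, 500};
        Encoded e = encode(addresses);
        System.out.println("bits per residual: " + e.bitsPerValue);
        for (int i = 0; i < addresses.length; i++) {
            System.out.println(i + " -> " + get(e, i));
        }
    }
}
```

The closer the addresses are to evenly spaced, the smaller the residuals and the fewer bits each entry needs, which is where the RAM saving mentioned in the description would come from.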
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722697#comment-13722697 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1508147 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1508147 ]
LUCENE-5127: FixedGapTermsIndex should use monotonic compression
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720828#comment-13720828 ]

Michael McCandless commented on LUCENE-5127:

+1, patch looks great. Thanks Rob!
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720856#comment-13720856 ]

Adrien Grand commented on LUCENE-5127:

This is a very nice cleanup! In FixedGapTermsIndexWriter, I think we could improve the buffering of offsets and addresses by directly buffering into a MonotonicBlockPackedWriter over a RamOutputStream, and then copy the raw content of the RamOutputStream to the IndexOutput? This would avoid an extra encoding/decoding step.
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720892#comment-13720892 ]

Robert Muir commented on LUCENE-5127:

Good idea! I initially thought of the growableoutput, but I didn't want all the resizing. I think a RamOutputStream can work well.
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719728#comment-13719728 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507035 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507035 ]
LUCENE-5127: create branch
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719729#comment-13719729 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507036 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507036 ]
LUCENE-5127: dump current state
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719732#comment-13719732 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507041 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507041 ]
LUCENE-5127: randomize codec parameter
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719752#comment-13719752 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507054 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507054 ]
LUCENE-5127: fix solr tests
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719775#comment-13719775 ]

Michael McCandless commented on LUCENE-5127:

This cleanup is awesome, thanks Rob! I think we should just nuke the special -1 ("don't load terms index") value?
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719793#comment-13719793 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507067 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507067 ]
LUCENE-5127: nuke mergeReader
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719798#comment-13719798 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507070 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507070 ]
LUCENE-5127: simplify seek-within-block
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719826#comment-13719826 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507075 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507075 ]
LUCENE-5127: explicit var gap testing part 1
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719836#comment-13719836 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507078 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507078 ]
LUCENE-5127: explicit var gap testing part 2
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719873#comment-13719873 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507083 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507083 ]
LUCENE-5127: simplify vargap
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719892#comment-13719892 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507086 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507086 ]
LUCENE-5127: simplify fixedgap
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719894#comment-13719894 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507087 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507087 ]
LUCENE-5127: fix indent
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13719932#comment-13719932 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507097 from [~mikemccand] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507097 ]
LUCENE-5127: add tests
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720005#comment-13720005 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507111 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507111 ]
LUCENE-5127: clear nocommits
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720024#comment-13720024 ]

ASF subversion and git services commented on LUCENE-5127:

Commit 1507116 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507116 ]
LUCENE-5127: fix TestLucene40PF and clean up some more outdated stuff
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720027#comment-13720027 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507118 from [~mikemccand] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507118 ] LUCENE-5127: fix false fail when terms dict is a ghostbuster
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720044#comment-13720044 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507120 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507120 ] LUCENE-5127: clean up error msgs
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720357#comment-13720357 ] ASF subversion and git services commented on LUCENE-5127: - Commit 1507179 from [~rcmuir] in branch 'dev/branches/lucene5127' [ https://svn.apache.org/r1507179 ] LUCENE-5127: use less ram when writing the terms index
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13715169#comment-13715169 ] Michael McCandless commented on LUCENE-5127: +1 I think we should just nuke the divisor?
[jira] [Commented] (LUCENE-5127) FixedGapTermsIndex should use monotonic compression
[ https://issues.apache.org/jira/browse/LUCENE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13715302#comment-13715302 ]

Robert Muir commented on LUCENE-5127:

Maybe, though we could also add a minimal get(long) interface to blockpacked/monotonicblockpacked/appending/monotonicappending.

A few notes:
* Current patch changes both the disk offsets (termsDictOffsets) and the offsets into the in-ram terms data (termOffsets)
* With the current patch as-is, we could remove the interval*2B #terms limitation, as long addressing is used everywhere.
* Current patch saves RAM, savings increase as termsindex/termsdict gets larger. With 10M:
||Checkout||TIB||TII||
|Trunk|519329144|19300603|
|Patch|519329144|14149524|
* Current patch slows down seek-heavy queries a bit:
{noformat}
Task       QPS trunk  StdDev  QPS patch  StdDev  Pct diff
PKLookup   86.02      (2.9%)  76.17      (2.4%)  -11.4% ( -16% - -6%)
Respell    39.76      (3.0%)  36.58      (2.5%)  -8.0% ( -13% - -2%)
Fuzzy2     35.49      (4.1%)  32.88      (2.6%)  -7.3% ( -13% -0%)
Fuzzy1     31.49      (4.1%)  29.18      (2.6%)  -7.3% ( -13% -0%)
{noformat}
* termOffsets are read twice per seek / binary search iteration:
{code}
final long offset = fieldIndex.termOffsets.get(idx);
final int length = (int) (fieldIndex.termOffsets.get(1+idx) - offset);
{code}
* termsDictOffsets are only read once... and this is really just an unfortunate consequence of TermsIndexReaderBase's API... ideally they would lazy-decode this until you really needed it, like BlockTree.

So I see a few things we could do:
# go forward with current patch (maybe add the divisor stuff via a simple get() interface). clean up int-long everywhere. I'm not sure if these perf diffs matter for the use cases where someone needs an ord-enabled terms index?
# hybrid patch, where termOffsets stay absolute but termsDictOffsets use monotonicpacked. This would still save some space, but restore the seek-heavy perf. But then we wouldn't be able to clean up int-long and so on.
# do nothing, maybe fork the logic of this thing so it can be used in DV. For how DV is used, it'd be the right tradeoff so it's no issue there.
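For readers following the RAM vs. seek-cost tradeoff in the comment above, here is a rough sketch of the general idea behind monotonic compression of offsets: store a base value and an average slope, then bit-pack only each offset's small deviation from the expected value. The class name MonotonicOffsetsSketch, its methods, and the sample offsets are made up for illustration. This is not the patch and not Lucene's monotonic packed classes, which may well be organized differently (for example, per block).

```java
/** Hypothetical sketch of monotonic offset compression; not Lucene's actual classes. */
final class MonotonicOffsetsSketch {
  private final int size;
  private final float avg;         // average increase per entry
  private final long min;          // smallest residual, so stored deltas are >= 0
  private final int bitsPerValue;  // width of each stored delta
  private final long[] packed;     // deltas, bit-packed back to back

  MonotonicOffsetsSketch(long[] offsets) {
    size = offsets.length;
    avg = size < 2 ? 0f : (float) (offsets[size - 1] - offsets[0]) / (size - 1);
    long lo = Long.MAX_VALUE, hi = Long.MIN_VALUE;
    for (int i = 0; i < size; i++) {
      long residual = offsets[i] - expected(i);
      lo = Math.min(lo, residual);
      hi = Math.max(hi, residual);
    }
    min = size == 0 ? 0 : lo;
    long maxDelta = size == 0 ? 0 : hi - lo;
    bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxDelta));
    packed = new long[(int) (((long) size * bitsPerValue + 63) >>> 6)];
    for (int i = 0; i < size; i++) {
      write(i, offsets[i] - expected(i) - min);
    }
  }

  /** Value predicted by the average slope alone. */
  private long expected(long index) {
    return (long) (avg * index);
  }

  private void write(long index, long delta) {
    long bitPos = index * bitsPerValue;
    int word = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    packed[word] |= delta << shift;
    if (shift + bitsPerValue > 64) {  // delta straddles two words
      packed[word + 1] |= delta >>> (64 - shift);
    }
  }

  /** Decodes one offset: base + slope prediction + stored delta. */
  long get(long index) {
    long bitPos = index * bitsPerValue;
    int word = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long delta = packed[word] >>> shift;
    if (shift + bitsPerValue > 64) {
      delta |= packed[word + 1] << (64 - shift);
    }
    if (bitsPerValue < 64) {
      delta &= (1L << bitsPerValue) - 1;
    }
    return min + expected(index) + delta;
  }

  /** Mirrors the snippet in the comment: two decodes per term length. */
  int length(long idx) {
    long offset = get(idx);
    return (int) (get(idx + 1) - offset);
  }

  int bitsPerValue() {
    return bitsPerValue;
  }

  public static void main(String[] args) {
    long[] termOffsets = {0, 7, 15, 21, 30, 36, 44};  // made-up sample values
    MonotonicOffsetsSketch packedOffsets = new MonotonicOffsetsSketch(termOffsets);
    for (int idx = 0; idx + 1 < termOffsets.length; idx++) {
      System.out.println("term " + idx + ": offset=" + packedOffsets.get(idx)
          + " length=" + packedOffsets.length(idx));
    }
    System.out.println("bits per stored value: " + packedOffsets.bitsPerValue());
  }
}
```

Each get() here costs a multiply, a shift/mask and an add rather than a plain array read, and length() pays that twice. That is at least consistent with the slowdown reported for seek-heavy tasks, though the sketch makes no claim about where the real patch spends its time.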