[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244399#comment-13244399 ] Robert Muir commented on LUCENE-3738: - Uwe, is this one good to go now? Can we mark it resolved? Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244729#comment-13244729 ] Uwe Schindler commented on LUCENE-3738: --- Commit patch or not? I have no preference. Perf is sometimes slightly better (in microbenchmark on my slow-io system always). Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244738#comment-13244738 ] Robert Muir commented on LUCENE-3738: - If there is no real measurable performance regression, can we just commit it to trunk? I really am afraid of last minute .store optimizations. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244743#comment-13244743 ] Uwe Schindler commented on LUCENE-3738: --- Part of the optimization is already committed to 3.x since 4 days, this is just another round of optimization. :-) Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244748#comment-13244748 ] Uwe Schindler commented on LUCENE-3738: --- I don't care, it's committed to trunk. The new random test was committed yesterday. You are the release manager :-) Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244770#comment-13244770 ] Robert Muir commented on LUCENE-3738: - Lets resolve the issue. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243704#comment-13243704 ] Robert Muir commented on LUCENE-3738: - since there is no performance regression, let's do trunk only. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243729#comment-13243729 ] Uwe Schindler commented on LUCENE-3738: --- I added a random test for vints and vlogs in TestIndexInput. I wanted to especially test the long case, which looked broken in DataOutput (but it was correct - but only because of the way how java handles negative values when casting to long - I just made it clear what happens). Robert: The tests last night showed no change, so I have no preference. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243125#comment-13243125 ] Michael McCandless commented on LUCENE-3738: Thanks Uwe, I'll test! Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243138#comment-13243138 ] Michael McCandless commented on LUCENE-3738: Alas, the results are now all over the place! And I went back to the prior patch and tried to reproduce the above results... and the results are still all over the place. I think we are chasing Java ghosts at this point... Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243143#comment-13243143 ] Uwe Schindler commented on LUCENE-3738: --- What does your comment mean? Good or bad? Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243265#comment-13243265 ] Uwe Schindler commented on LUCENE-3738: --- Mike, I was away from home and did not understand your comment, now its clear: You cannot reproduce the speedup from last patch neither can you see a difference with current patch. I would suggest that I commit this now to trunk, we test a few nights and then commit it to 3.x (Robert needs to backport Ivy to 3.6, so we have some time). I will commit this later before going to sleep, so we see results tomorrow. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243293#comment-13243293 ] Michael McCandless commented on LUCENE-3738: Sorry Uwe, that was exactly it: I don't know what to conclude from the perf runs anymore. But +1 for your new patch: it ought to be better since the code is simpler. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Blocker Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242550#comment-13242550 ] Michael McCandless commented on LUCENE-3738: Removing the asserts apparently didn't change the perf... I can reproduce the slowdown in a separate test (before/after this commit): {noformat} TaskQPS base StdDev baseQPS vInt StdDev vInt Pct diff IntNRQ7.110.896.730.58 -23% - 17% Prefix3 16.070.96 15.650.72 -12% - 8% Wildcard 20.140.91 19.670.77 -10% - 6% PKLookup 154.625.08 151.112.82 -7% - 2% Fuzzy1 85.241.53 83.871.18 -4% - 1% Fuzzy2 44.111.03 43.960.44 -3% - 3% SpanNear3.230.113.220.07 -5% - 5% TermBGroup1M1P 42.350.49 42.431.43 -4% - 4% Respell 65.111.91 65.271.27 -4% - 5% AndHighMed 54.184.04 54.502.27 -10% - 13% TermGroup1M 31.270.35 31.460.63 -2% - 3% TermBGroup1M 45.010.33 45.371.42 -3% - 4% AndHighHigh 13.350.71 13.460.50 -7% - 10% Term 82.713.12 83.562.33 -5% - 7% OrHighMed 10.660.67 10.780.44 -8% - 12% OrHighHigh7.080.427.190.26 -7% - 11% SloppyPhrase5.110.245.200.31 -8% - 13% Phrase 11.140.75 11.400.50 -8% - 14% {noformat} But then Uwe made a patch (I'll attach) reducing the byte code for the unrolled methods: {noformat} TaskQPS base StdDev baseQPS vInt StdDev vInt Pct diff SpanNear3.240.133.180.07 -7% - 4% Phrase 11.340.68 11.130.38 -10% - 7% SloppyPhrase5.170.235.080.18 -9% - 6% TermBGroup1M1P 41.920.80 41.570.94 -4% - 3% TermGroup1M 30.740.68 30.810.96 -5% - 5% Term 80.873.52 81.292.05 -6% - 7% TermBGroup1M 43.940.93 44.171.32 -4% - 5% AndHighMed 53.712.62 54.211.97 -7% - 9% AndHighHigh 13.200.42 13.410.41 -4% - 8% Respell 65.372.70 66.533.29 -7% - 11% Fuzzy1 84.292.11 86.443.36 -3% - 9% PKLookup 149.814.20 153.879.46 -6% - 12% OrHighHigh7.190.287.400.48 -7% - 13% OrHighMed 10.820.43 11.160.73 -7% - 14% Fuzzy2 43.720.96 45.242.03 -3% - 10% Wildcard 18.961.00 20.050.39 -1% - 13% Prefix3 14.960.83 15.890.27 -1% - 14% IntNRQ5.890.586.950.174% - 34% {noformat} So... I think we should commit it! Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242554#comment-13242554 ] Uwe Schindler commented on LUCENE-3738: --- I an commit that, OK? We should also do this in 3.x, Robert are you fine? Otherwise this issue is only half committed to 3.x :( Its no risk. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242564#comment-13242564 ] Robert Muir commented on LUCENE-3738: - I'm going with your instinct on this one. It would be bad to have a slowdown for 3.6, but I want the negative vlong checks, too. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239618#comment-13239618 ] Yonik Seeley commented on LUCENE-3738: -- Regarding unrolling... it hasn't always proved faster in the past, esp wrt vint. My first try was in 2005: http://www.lucidimagination.com/search/document/6d2efedb4dde07d#2a896a9a9adc3f2d And again in 2006: https://issues.apache.org/jira/browse/LUCENE-639 Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239621#comment-13239621 ] Uwe Schindler commented on LUCENE-3738: --- Yonik, the unrolling was added because of a recent Java 6 hotspot bug (who unrolled the loop itsself - but wrongly). The thing that Mike has seen was a strange thing that the already unrolled code (since 3.1) behaves different before/after a slight code change done in this issue. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239628#comment-13239628 ] Robert Muir commented on LUCENE-3738: - The original unrolling here was to dodge a JVM bug, possible in all java versions from ... java6u20 until java6u29 or so? I don't know if there's another solution other than unrolling to work around that loop bug. I don't like the workaround but it does seem realistic at this point to prevent index corruption since these versions of java are really recent. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239667#comment-13239667 ] Michael McCandless commented on LUCENE-3738: +1 to remove those asserts... let's see if this fixes the slowdown the nightly builds hit on 3/18: http://people.apache.org/~mikemccand/lucenebench/IntNRQ.html Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239672#comment-13239672 ] Uwe Schindler commented on LUCENE-3738: --- Committed the assert removal in revision: trunk 1305909, 3.x 1305911 If this does not help, we can revert again. But the checks are in my opinion not really useful and too risky. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232272#comment-13232272 ] Michael McCandless commented on LUCENE-3738: bq. In my opinion, we should unroll all readVInt/readVLong loops so all behave 100% identical! +1 Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232336#comment-13232336 ] Michael McCandless commented on LUCENE-3738: +1 Looks awesome Uwe! Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232392#comment-13232392 ] Uwe Schindler commented on LUCENE-3738: --- Committed trunk revision: 1302238 Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231962#comment-13231962 ] Michael McCandless commented on LUCENE-3738: bq. The check is only ommitted in the unrolled loop, the for-loop still contains the check. I'm confused... I don't see how/where BufferedIndexInput.readVLong is checking for negative result now...? Are you proposing adding an if into that method? That's what I don't want to do... eg, readVLong is called 3 times per term we decode (Lucene40 codec); it's a very low level API... other codecs may very well call it more often. I don't think we should add an if inside BII.readVLong. Or maybe you are saying you just want the unrolled code to handle the negative vLong case (ie, unroll the currently missing 10th cycle), and not add an if to BufferedIndexInput.readVLong? And then for free we can add a real if (not assert) if that 10th cycle is hit? (ie, if we get to that 10th byte, throw an exception). I think that makes sense! bq. there are other asserts in the index readiung code at places completely outside any loops, executed only once when index is opened. +1 to make those real checks, as long as the cost is vanishingly small. bq. which is also a security issue when you e.g. download indexes through network connections and a man in the middle modifies the stream. I don't think it's our job to protect against / detect that. bq. Disk IO can produce wrong data. True, but all bets are off if that happens: you're gonna get all sorts of crazy exceptions out of Lucene. We are not a filesystem. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231978#comment-13231978 ] Uwe Schindler commented on LUCENE-3738: --- {quote} bq. The check is only ommitted in the unrolled loop, the for-loop still contains the check. I'm confused... I don't see how/where BufferedIndexInput.readVLong is checking for negative result now...? {quote} I mean that the actual if (b mask != 0) is also in the original while loop. The original while loop then simply proceeds with reading bytes util the highest bit is null. The unrolled loop behaves different (and thats a real bug), because it will silently not read those remaining bytes, so the file pointer is on a different byte after the call. This also affects readVInt!!! In my opinion, we should unroll *all* readVInt/readVLong loops so all behave 100% identical! And in the case of the last byte read (where the current assert is), throw exception. If we don't unroll all readVInts we have to somehow also make the loop exit after too many bytes are read, which would be an costly extra check in the loop - thats the reason why I want to unroll all loops to fail after 5 or 9 bytes. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231222#comment-13231222 ] Robert Muir commented on LUCENE-3738: - LUCENE-3876/LUCENE-3879 reveal more situations where we must write negatives with the current encodings, because we steal bits from things like positions (payloads) and docids too (at least in skip data?) So sometimes its possible these are encoded as negatives. I think Uwe should commit his fixes? Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231295#comment-13231295 ] Michael McCandless commented on LUCENE-3738: Hmm... I think we should think about it more. Ie, we apparently never write a negative vLong today... and I'm not sure we should start allowing it...? Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231297#comment-13231297 ] Robert Muir commented on LUCENE-3738: - Well that differs with the title of the issue (consistency with vInt). I don't see how we can avoid negative vints. I think its ok to be inconsistent with vLong, but it should not be something we assert only at read-time. It should be asserted on write so that problems are found immediately. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231309#comment-13231309 ] Michael McCandless commented on LUCENE-3738: {quote} don't see how we can avoid negative vints. I think its ok to be inconsistent with vLong, but it should not be something we assert only at read-time. It should be asserted on write so that problems are found immediately. {quote} +1 I think we are stuck with negative vInts, as trappy as they are (5 bytes!!). Let's not make it worse by allowing negative vLongs. But let's assert that at write time (and read time)... I think inconsistency here is the lesser evil. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231401#comment-13231401 ] Uwe Schindler commented on LUCENE-3738: --- I just repeat myself: bq. If we disallow, it should be a hard check (no assert), as the data is coming from a file (and somebody could used a hex editor). The reader will crash later... Mike: If you fix the unrolled loops, please also add the checks to the other implementations in Buffered* and so on. My original patch fixed those. Please include that patch. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231470#comment-13231470 ] Michael McCandless commented on LUCENE-3738: bq. If we disallow, it should be a hard check (no assert), as the data is coming from a file (and somebody could used a hex editor). The reader will crash later... Hmm, I don't think we should do that. If you go and edit your index with a hex editor... there are no guarantees on what may ensue! bq. Mike: If you fix the unrolled loops, please also add the checks to the other implementations in Buffered* and so on. I don't think the unrolled loops or other impls of write/readVLong are wrong? The javadocs state clearly that negatives are not supported. All we're doing here is added an assert to backup that javadoc statement. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231492#comment-13231492 ] Uwe Schindler commented on LUCENE-3738: --- bq. Hmm, I don't think we should do that. It costs nothing, as the standard vInt will be read only some 1 or 2 bytes, if you really read until the last byte, you have so big vInts that it might even be better not to use vInts at all.- And: The not-unrolled loops do the check always. bq. If you go and edit your index with a hex editor... there are no guarantees on what may ensue! Disk IO can produce wrong data. We must check this if we can and it costs nothing, which is the case here (see above). I was already talking with Robert, there are other asserts in the index readiung code at places completely outside any loops, executed only once when index is opened. Its first priority to do consistency checks of the read bytes. Otherwise you can even produce endless loops at some places. - Of course not when you have tight loops, but things like checking that the document count is in line with e.g. some other value from liveDocs is essential. I will open an issue for that, the older Lucene formats are much besster secured, but trunk is horrible, just because some people here seem to want to prevent any check, which is also a security issue when you e.g. download indexes through network connections and a man in the middle modifies the stream. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231506#comment-13231506 ] Robert Muir commented on LUCENE-3738: - {quote} Otherwise you can even produce endless loops at some places. - Of course not when you have tight loops, but things like checking that the document count is in line with e.g. some other value from liveDocs is essential. {quote} I agree there a ton of places (essentially all metadata) where we should be using real checks (not asserts). I see Mike's point though: readVLong() is very general, so someone could be using it where performance is important. It just so happens its mostly only used today for metadata type things (except maybe terms dictionary stats and a few other places). Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231543#comment-13231543 ] Uwe Schindler commented on LUCENE-3738: --- You misunderstood my comment: bq. I see Mike's point though: readVLong() is very general, so someone could be using it where performance is important. The check is only ommitted in the unrolled loop, the for-loop still contains the check. In that case it also handles maybe too-long vints correctly, which the unrolled code will never do. The unrolled code also has a bug, that it handles negative longs wrong, but that should be prevented (in my opinion also for ints). The current assert for both long and int is completely harmless, as it will only be executed, if the vInt/vLong has the maximum number of bytes, which is very unlikely. And as said before the check is done in the loop-based code, too. And comparison in perf showed that the speed of the unrolled loop and the standard loop are identical, so about what are you talking? The good thing here is that we can (in the unrolled loops) harden the check for negative vInts, as because of the unrolled loop we have a separate cocde branch already, so we can modify the check (which was always done before my unrolling) to do the better check. I had the idea at that times to unroll that loop because of Java bugs, so the bug is caused by me and I want to fix it the correct way, definitely without loosing anything. Is this so hard to understand? Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196915#comment-13196915 ] Uwe Schindler commented on LUCENE-3738: --- That was my faulkt wen the Schindler VM instead the Hotspot VM unrolled the loop. I unrolled maximum of 9 bytes not 10 (which is wasteful, too). We had no negative VInts until now, so that was no issue :-) Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196916#comment-13196916 ] Uwe Schindler commented on LUCENE-3738: --- bq. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). As the code is written by a loop, we would write 10 bytes, of course the last one only with 1 bit. If we wont to spare that and optimize the long case to interpret the continuation bit in the last byte different (as part of data), the writer must also do that. Ideally we would unroll both loops. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196918#comment-13196918 ] Robert Muir commented on LUCENE-3738: - {quote} As the code is written by a loop, we would write 10 bytes, of course the last one only with 1 bit. {quote} I think this is ok: otherwise its not a variable-length integer but something else (with a special case where 9th byte high bit means sign bit instead of continuation bit). Either way we should either fix it to assert value =0 in the writer, or make it work. Ideally we would do that for both vint and vlong, but the problem is some things like term vectors sometimes write negative vints (since it does startOffset - lastEndOffset, if you have any synonyms you get tons of huge 5-byte vints in your term vectors) But currently its inconsistent: negatives don't trip any assert for either vint/vlong at write-time, but at read-time for vlong *only*. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196919#comment-13196919 ] Robert Muir commented on LUCENE-3738: - {quote} We had no negative VLongs until now, so that was no issue {quote} I think we should still avoid this! I think the javadocs should still say: negatives are unsupported. Maybe we can fix lucene 4's term vectors format to never write negatives, and in version 5 when 3.x indexes no longer need to be read, we can assert = 0 at write-time? Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196941#comment-13196941 ] Robert Muir commented on LUCENE-3738: - {quote} We should try to never write negative numbers for post 4.0 formats, maybe add conditional assert (like for utf8 strings), so preflex can write using negative vints and dont trip assert. {quote} Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196951#comment-13196951 ] Uwe Schindler commented on LUCENE-3738: --- Just to conclude: The bug with reading negative vLongs affects only DataInputs that dont override readVLong. So e.g. BufferedIndexInput is not affected. So when you read index from MMap or ByteArrayDataInput or InputStreamDataInput you will hit the bug. Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196955#comment-13196955 ] Uwe Schindler commented on LUCENE-3738: --- The BufferedIndexInput one has also this bug, only affecting reads at the boundaries (if the 10 bytes of a full int are no longer in buffer), in that case it throws AIOOBE: {code:java} public long readVLong() throws IOException { if (9 = bufferLength-bufferPosition) { byte b = buffer[bufferPosition++]; long i = b 0x7F; for (int shift = 7; (b 0x80) != 0; shift += 7) { b = buffer[bufferPosition++]; i |= (b 0x7FL) shift; } return i; } else { return super.readVLong(); } } {code} Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196968#comment-13196968 ] Michael McCandless commented on LUCENE-3738: I think we should disallow (assert) writing negative vLong, and, ideally disallow negative vInt also, by fixing all the places that rely on this (but, carefully... preflexrw will need a backdoor)... Separately: can't we strengthen the last assert in writeVInt to verify the top 4 bits are 0, not just the top bit? We have 36 bits at that point right? So top 4 should be unused (assert (b 0xf0) == 0) Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196972#comment-13196972 ] Uwe Schindler commented on LUCENE-3738: --- If we disallow, it should be a hard check (no assert), as the data is coming from a file (and somebody could used a hex editor). The reader will crash later... Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong
[ https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197383#comment-13197383 ] Robert Muir commented on LUCENE-3738: - after investigating: its difficult to prevent negative offsets, even after fixing term vectors writer (LUCENE-3739) At first i tried a simple assert in BaseTokenStreamTestCase: {code} assertTrue(offsets must not go backwards, offsetAtt.startOffset() = lastStartOffset); lastStartOffset = offsetAtt.startOffset(); {code} Then these analyzers failed: * MockCharFilter itself had a bug, but thats easy to fix (LUCENE-3741) * synonymsfilter failed sometimes (LUCENE-3742) because it wrote zeros for offsets in situations like a - b c * (edge)ngramtokenizers failed, because ngrams(1,2) of ABCD are not A, AB, B, BC, C, CD, D but instead A, B, C, D, AB, BC, CD, ... * (edge)ngramfilters failed for similar reasons. * worddelimiterfilter failed, because it doesnt break AB into A, AB, B but instead A, B, AB * trimfilter failed when 'offsets changing' is enabled, because if you have rob, robert as synonyms then it trims the first, and the second offsets go backwards These are all bugs. In general I think offsets after being set should not be changed, because filters don't have access to any charfilters offset correction (correctOffset()) anyway, so they shouldnt be mucking offsets. So really: only the creator of tokens should make the offsets. And if thats a filter, it should be a standard way, only inherited from existing offsets and not 'offset mathematics' and not A, AB, B in some places and A, B, AB in others. Really i think we need to step it up if we want highlighting to be first-class citizen in lucene, nothing checks the offsets anyhwere at all, even to check/assert if they are negative, and there are little tests... all we have is some newish stuff in basetokenstreamtestcase and a few trivial test cases. On the other hand, for example, position increment's impl actually throws exception if you give it something like a negative number... Be consistent about negative vInt/vLong --- Key: LUCENE-3738 URL: https://issues.apache.org/jira/browse/LUCENE-3738 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 3.6, 4.0 Attachments: LUCENE-3738.patch, LUCENE-3738.patch Today, write/readVInt allows a negative int, in that it will encode and decode correctly, just horribly inefficiently (5 bytes). However, read/writeVLong fails (trips an assert). I'd prefer that both vInt/vLong trip an assert if you ever try to write a negative number... it's badly trappy today. But, unfortunately, we sometimes rely on this... had we had this assert in 'since the beginning' we could have avoided that. So, if we can't add that assert in today, I think we should at least fix readVLong to handle negative longs... but then you quietly spend 9 bytes (even more trappy!). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org