[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-04-02 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244399#comment-13244399
 ] 

Robert Muir commented on LUCENE-3738:
-

Uwe, is this one good to go now? Can we mark it resolved?

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-04-02 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244729#comment-13244729
 ] 

Uwe Schindler commented on LUCENE-3738:
---

Commit patch or not? I have no preference. Perf is sometimes slightly better 
(in microbenchmark on my slow-io system always).

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-04-02 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244738#comment-13244738
 ] 

Robert Muir commented on LUCENE-3738:
-

If there is no real measurable performance regression, can we just commit it to 
trunk?

I really am afraid of last minute .store optimizations.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-04-02 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244743#comment-13244743
 ] 

Uwe Schindler commented on LUCENE-3738:
---

Part of the optimization is already committed to 3.x since 4 days, this is just 
another round of optimization. :-)

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-04-02 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244748#comment-13244748
 ] 

Uwe Schindler commented on LUCENE-3738:
---

I don't care, it's committed to trunk. The new random test was committed 
yesterday. You are the release manager :-)

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-04-02 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244770#comment-13244770
 ] 

Robert Muir commented on LUCENE-3738:
-

Lets resolve the issue.


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-04-01 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243704#comment-13243704
 ] 

Robert Muir commented on LUCENE-3738:
-

since there is no performance regression, let's do trunk only.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-04-01 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243729#comment-13243729
 ] 

Uwe Schindler commented on LUCENE-3738:
---

I added a random test for vints and vlogs in TestIndexInput. I wanted to 
especially test the long case, which looked broken in DataOutput (but it was 
correct - but only because of the way how java handles negative values when 
casting to long - I just made it clear what happens).

Robert: The tests last night showed no change, so I have no preference.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-31 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243125#comment-13243125
 ] 

Michael McCandless commented on LUCENE-3738:


Thanks Uwe, I'll test!

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-31 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243138#comment-13243138
 ] 

Michael McCandless commented on LUCENE-3738:


Alas, the results are now all over the place!  And I went back to the prior 
patch and tried to reproduce the above results... and the results are still all 
over the place.  I think we are chasing Java ghosts at this point...

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-31 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243143#comment-13243143
 ] 

Uwe Schindler commented on LUCENE-3738:
---

What does your comment mean? Good or bad?

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-31 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243265#comment-13243265
 ] 

Uwe Schindler commented on LUCENE-3738:
---

Mike, I was away from home and did not understand your comment, now its clear: 
You cannot reproduce the speedup from last patch neither can you see a 
difference with current patch.

I would suggest that I commit this now to trunk, we test a few nights and then 
commit it to 3.x (Robert needs to backport Ivy to 3.6, so we have some time).

I will commit this later before going to sleep, so we see results tomorrow.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-31 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243293#comment-13243293
 ] 

Michael McCandless commented on LUCENE-3738:


Sorry Uwe, that was exactly it: I don't know what to conclude from the perf 
runs anymore.

But +1 for your new patch: it ought to be better since the code is simpler.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, 
 LUCENE-3738-improvement.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-30 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242550#comment-13242550
 ] 

Michael McCandless commented on LUCENE-3738:


Removing the asserts apparently didn't change the perf...

I can reproduce the slowdown in a separate test (before/after this commit):

{noformat}
TaskQPS base StdDev baseQPS vInt StdDev vInt  Pct 
diff
  IntNRQ7.110.896.730.58  -23% -   
17%
 Prefix3   16.070.96   15.650.72  -12% -
8%
Wildcard   20.140.91   19.670.77  -10% -
6%
PKLookup  154.625.08  151.112.82   -7% -
2%
  Fuzzy1   85.241.53   83.871.18   -4% -
1%
  Fuzzy2   44.111.03   43.960.44   -3% -
3%
SpanNear3.230.113.220.07   -5% -
5%
  TermBGroup1M1P   42.350.49   42.431.43   -4% -
4%
 Respell   65.111.91   65.271.27   -4% -
5%
  AndHighMed   54.184.04   54.502.27  -10% -   
13%
 TermGroup1M   31.270.35   31.460.63   -2% -
3%
TermBGroup1M   45.010.33   45.371.42   -3% -
4%
 AndHighHigh   13.350.71   13.460.50   -7% -   
10%
Term   82.713.12   83.562.33   -5% -
7%
   OrHighMed   10.660.67   10.780.44   -8% -   
12%
  OrHighHigh7.080.427.190.26   -7% -   
11%
SloppyPhrase5.110.245.200.31   -8% -   
13%
  Phrase   11.140.75   11.400.50   -8% -   
14%
{noformat}

But then Uwe made a patch (I'll attach) reducing the byte code for the
unrolled methods:

{noformat}
TaskQPS base StdDev baseQPS vInt StdDev vInt  Pct 
diff
SpanNear3.240.133.180.07   -7% -
4%
  Phrase   11.340.68   11.130.38  -10% -
7%
SloppyPhrase5.170.235.080.18   -9% -
6%
  TermBGroup1M1P   41.920.80   41.570.94   -4% -
3%
 TermGroup1M   30.740.68   30.810.96   -5% -
5%
Term   80.873.52   81.292.05   -6% -
7%
TermBGroup1M   43.940.93   44.171.32   -4% -
5%
  AndHighMed   53.712.62   54.211.97   -7% -
9%
 AndHighHigh   13.200.42   13.410.41   -4% -
8%
 Respell   65.372.70   66.533.29   -7% -   
11%
  Fuzzy1   84.292.11   86.443.36   -3% -
9%
PKLookup  149.814.20  153.879.46   -6% -   
12%
  OrHighHigh7.190.287.400.48   -7% -   
13%
   OrHighMed   10.820.43   11.160.73   -7% -   
14%
  Fuzzy2   43.720.96   45.242.03   -3% -   
10%
Wildcard   18.961.00   20.050.39   -1% -   
13%
 Prefix3   14.960.83   15.890.27   -1% -   
14%
  IntNRQ5.890.586.950.174% -   
34%
{noformat}

So... I think we should commit it!


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please 

[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-30 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242554#comment-13242554
 ] 

Uwe Schindler commented on LUCENE-3738:
---

I an commit that, OK?

We should also do this in 3.x, Robert are you fine? Otherwise this issue is 
only half committed to 3.x :( Its no risk.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-30 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242564#comment-13242564
 ] 

Robert Muir commented on LUCENE-3738:
-

I'm going with your instinct on this one. It would be bad to have a slowdown 
for 3.6,
but I want the negative vlong checks, too.


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-27 Thread Yonik Seeley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239618#comment-13239618
 ] 

Yonik Seeley commented on LUCENE-3738:
--

Regarding unrolling... it hasn't always proved faster in the past, esp wrt vint.

My first try was in 2005: 
http://www.lucidimagination.com/search/document/6d2efedb4dde07d#2a896a9a9adc3f2d
And again in 2006: https://issues.apache.org/jira/browse/LUCENE-639


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-27 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239621#comment-13239621
 ] 

Uwe Schindler commented on LUCENE-3738:
---

Yonik, the unrolling was added because of a recent Java 6 hotspot bug (who 
unrolled the loop itsself - but wrongly). The thing that Mike has seen was a 
strange thing that the already unrolled code (since 3.1) behaves different 
before/after a slight code change done in this issue.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-27 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239628#comment-13239628
 ] 

Robert Muir commented on LUCENE-3738:
-

The original unrolling here was to dodge a JVM bug, possible
in all java versions from ... java6u20 until java6u29 or so?

I don't know if there's another solution other than unrolling
to work around that loop bug.

I don't like the workaround but it does seem realistic at this
point to prevent index corruption since these versions of
java are really recent.


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-27 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239667#comment-13239667
 ] 

Michael McCandless commented on LUCENE-3738:


+1 to remove those asserts... let's see if this fixes the slowdown the nightly 
builds hit on 3/18: http://people.apache.org/~mikemccand/lucenebench/IntNRQ.html



 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-27 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239672#comment-13239672
 ] 

Uwe Schindler commented on LUCENE-3738:
---

Committed the assert removal in revision: trunk 1305909, 3.x 1305911

If this does not help, we can revert again. But the checks are in my opinion 
not really useful and too risky.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: ByteArrayDataInput.java.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-18 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232272#comment-13232272
 ] 

Michael McCandless commented on LUCENE-3738:


bq. In my opinion, we should unroll all readVInt/readVLong loops so all behave 
100% identical! 

+1


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-18 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232336#comment-13232336
 ] 

Michael McCandless commented on LUCENE-3738:


+1

Looks awesome Uwe!


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-18 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232392#comment-13232392
 ] 

Uwe Schindler commented on LUCENE-3738:
---

Committed trunk revision: 1302238

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch, 
 LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-17 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231962#comment-13231962
 ] 

Michael McCandless commented on LUCENE-3738:


bq. The check is only ommitted in the unrolled loop, the for-loop still 
contains the check.

I'm confused... I don't see how/where BufferedIndexInput.readVLong is
checking for negative result now...?  Are you proposing adding an if
into that method?  That's what I don't want to do... eg, readVLong is
called 3 times per term we decode (Lucene40 codec); it's a very low
level API... other codecs may very well call it more often.  I don't
think we should add an if inside BII.readVLong.

Or maybe you are saying you just want the unrolled code to handle
the negative vLong case (ie, unroll the currently missing 10th cycle),
and not add an if to BufferedIndexInput.readVLong?  And then for
free we can add a real if (not assert) if that 10th cycle is hit?
(ie, if we get to that 10th byte, throw an exception).  I think that
makes sense!

bq. there are other asserts in the index readiung code at places completely 
outside any loops, executed only once when index is opened. 

+1 to make those real checks, as long as the cost is vanishingly
small.

bq. which is also a security issue when you e.g. download indexes through 
network connections and a man in the middle modifies the stream.

I don't think it's our job to protect against / detect that.

bq. Disk IO can produce wrong data.

True, but all bets are off if that happens: you're gonna get all sorts
of crazy exceptions out of Lucene.  We are not a filesystem.


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-17 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231978#comment-13231978
 ] 

Uwe Schindler commented on LUCENE-3738:
---

{quote}
bq. The check is only ommitted in the unrolled loop, the for-loop still 
contains the check.

I'm confused... I don't see how/where BufferedIndexInput.readVLong is
checking for negative result now...?
{quote}

I mean that the actual if (b  mask != 0) is also in the original while loop. 
The original while loop then simply proceeds with reading bytes util the 
highest bit is null. The unrolled loop behaves different (and thats a real 
bug), because it will silently not read those remaining bytes, so the file 
pointer is on a different byte after the call. This also affects readVInt!!!

In my opinion, we should unroll *all* readVInt/readVLong loops so all behave 
100% identical! And in the case of the last byte read (where the current assert 
is), throw exception. If we don't unroll all readVInts we have to somehow also 
make the loop exit after too many bytes are read, which would be an costly 
extra check in the loop - thats the reason why I want to unroll all loops to 
fail after 5 or 9 bytes.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-16 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231222#comment-13231222
 ] 

Robert Muir commented on LUCENE-3738:
-

LUCENE-3876/LUCENE-3879 reveal more situations where we must write negatives 
with the current encodings,
because we steal bits from things like positions (payloads) and docids too (at 
least in skip data?)

So sometimes its possible these are encoded as negatives.

I think Uwe should commit his fixes?

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-16 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231295#comment-13231295
 ] 

Michael McCandless commented on LUCENE-3738:


Hmm... I think we should think about it more.

Ie, we apparently never write a negative vLong today... and I'm not sure we 
should start allowing it...?

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-16 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231297#comment-13231297
 ] 

Robert Muir commented on LUCENE-3738:
-

Well that differs with the title of the issue (consistency with vInt).

I don't see how we can avoid negative vints. I think its ok to be inconsistent 
with vLong,
but it should not be something we assert only at read-time. It should be 
asserted on write
so that problems are found immediately.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-16 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231309#comment-13231309
 ] 

Michael McCandless commented on LUCENE-3738:


{quote}
 don't see how we can avoid negative vints. I think its ok to be inconsistent 
with vLong,
 but it should not be something we assert only at read-time. It should be 
asserted on write
 so that problems are found immediately.
{quote}

+1

I think we are stuck with negative vInts, as trappy as they are (5 bytes!!).

Let's not make it worse by allowing negative vLongs.  But let's assert that at 
write time (and read time)...

I think inconsistency here is the lesser evil.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-16 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231401#comment-13231401
 ] 

Uwe Schindler commented on LUCENE-3738:
---

I just repeat myself:
bq. If we disallow, it should be a hard check (no assert), as the data is 
coming from a file (and somebody could used a hex editor). The reader will 
crash later...

Mike: If you fix the unrolled loops, please also add the checks to the other 
implementations in Buffered* and so on. My original patch fixed those. Please 
include that patch.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-16 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231470#comment-13231470
 ] 

Michael McCandless commented on LUCENE-3738:


bq. If we disallow, it should be a hard check (no assert), as the data is 
coming from a file (and somebody could used a hex editor). The reader will 
crash later...

Hmm, I don't think we should do that.

If you go and edit your index with a hex editor... there are no guarantees on 
what may ensue!

bq. Mike: If you fix the unrolled loops, please also add the checks to the 
other implementations in Buffered* and so on. 

I don't think the unrolled loops or other impls of write/readVLong are wrong?  
The javadocs state clearly that negatives are not supported.  All we're doing 
here is added an assert to backup that javadoc statement.



 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-16 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231492#comment-13231492
 ] 

Uwe Schindler commented on LUCENE-3738:
---

bq. Hmm, I don't think we should do that.

It costs nothing, as the standard vInt will be read only some 1 or 2 bytes, if 
you really read until the last byte, you have so big vInts that it might even 
be better not to use vInts at all.- And: The not-unrolled loops do the check 
always.

bq. If you go and edit your index with a hex editor... there are no guarantees 
on what may ensue!

Disk IO can produce wrong data. We must check this if we can and it costs 
nothing, which is the case here (see above).

I was already talking with Robert, there are other asserts in the index 
readiung code at places completely outside any loops, executed only once when 
index is opened. Its first priority to do consistency checks of the read bytes. 
Otherwise you can even produce endless loops at some places. - Of course not 
when you have tight loops, but things like checking that the document count is 
in line with e.g. some other value from liveDocs is essential. I will open an 
issue for that, the older Lucene formats are much besster secured, but trunk is 
horrible, just because some people here seem to want to prevent any check, 
which is also a security issue when you e.g. download indexes through network 
connections and a man in the middle modifies the stream. 

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-16 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231506#comment-13231506
 ] 

Robert Muir commented on LUCENE-3738:
-

{quote}
Otherwise you can even produce endless loops at some places. - Of course not 
when you have tight loops, but things like checking that the document count is 
in line with e.g. some other value from liveDocs is essential.
{quote}

I agree there a ton of places (essentially all metadata) where we should be 
using real checks (not asserts).

I see Mike's point though: readVLong() is very general, so someone could be 
using it where performance is important.
It just so happens its mostly only used today for metadata type things (except 
maybe terms dictionary stats and a few other places).


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-03-16 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231543#comment-13231543
 ] 

Uwe Schindler commented on LUCENE-3738:
---

You misunderstood my comment:

bq. I see Mike's point though: readVLong() is very general, so someone could be 
using it where performance is important.

The check is only ommitted in the unrolled loop, the for-loop still contains 
the check. In that case it also handles maybe too-long vints correctly, which 
the unrolled code will never do. The unrolled code also has a bug, that it 
handles negative longs wrong, but that should be prevented (in my opinion also 
for ints).

The current assert for both long and int is completely harmless, as it will 
only be executed, if the vInt/vLong has the maximum number of bytes, which is 
very unlikely. And as said before the check is done in the loop-based code, 
too. And comparison in perf showed that the speed of the unrolled loop and the 
standard loop are identical, so about what are you talking?

The good thing here is that we can (in the unrolled loops) harden the check for 
negative vInts, as because of the unrolled loop we have a separate cocde branch 
already, so we can modify the check (which was always done before my unrolling) 
to do the better check.

I had the idea at that times to unroll that loop because of Java bugs, so the 
bug is caused by me and I want to fix it the correct way, definitely without 
loosing anything. Is this so hard to understand?

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-01-31 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196915#comment-13196915
 ] 

Uwe Schindler commented on LUCENE-3738:
---

That was my faulkt wen the Schindler VM instead the Hotspot VM unrolled the 
loop. I unrolled maximum of 9 bytes not 10 (which is wasteful, too).

We had no negative VInts until now, so that was no issue :-)

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-01-31 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196916#comment-13196916
 ] 

Uwe Schindler commented on LUCENE-3738:
---

bq. So, if we can't add that assert in today, I think we should at least fix 
readVLong to handle negative longs... but then you quietly spend 9 bytes (even 
more trappy!).

As the code is written by a loop, we would write 10 bytes, of course the last 
one only with 1 bit. If we wont to spare that and optimize the long case to 
interpret the continuation bit in the last byte different (as part of data), 
the writer must also do that. Ideally we would unroll both loops.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-01-31 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196918#comment-13196918
 ] 

Robert Muir commented on LUCENE-3738:
-

{quote}
As the code is written by a loop, we would write 10 bytes, of course the last 
one only with 1 bit.
{quote}

I think this is ok: otherwise its not a variable-length integer but something 
else 
(with a special case where 9th byte high bit means sign bit instead of 
continuation bit).

Either way we should either fix it to assert value =0 in the writer, or make 
it work.
Ideally we would do that for both vint and vlong, but the problem is some 
things like term vectors
sometimes write negative vints (since it does startOffset - lastEndOffset, if 
you have any synonyms
you get tons of huge 5-byte vints in your term vectors)

But currently its inconsistent: negatives don't trip any assert for either 
vint/vlong at write-time,
but at read-time for vlong *only*.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-01-31 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196919#comment-13196919
 ] 

Robert Muir commented on LUCENE-3738:
-

{quote}
We had no negative VLongs until now, so that was no issue
{quote}

I think we should still avoid this!

I think the javadocs should still say: negatives are unsupported.
Maybe we can fix lucene 4's term vectors format to never write negatives, 
and in version 5 when 3.x indexes no longer need to be read, we can assert = 0 
at write-time?


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-01-31 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196941#comment-13196941
 ] 

Robert Muir commented on LUCENE-3738:
-

{quote}

We should try to never write negative numbers for post 4.0 formats, maybe add 
conditional assert (like for utf8 strings), so preflex can write using negative 
vints and dont trip assert.
{quote}


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-01-31 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196951#comment-13196951
 ] 

Uwe Schindler commented on LUCENE-3738:
---

Just to conclude: The bug with reading negative vLongs affects only DataInputs 
that dont override readVLong. So e.g. BufferedIndexInput is not affected. So 
when you read index from MMap or ByteArrayDataInput or InputStreamDataInput you 
will hit the bug.

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-01-31 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196955#comment-13196955
 ] 

Uwe Schindler commented on LUCENE-3738:
---

The BufferedIndexInput one has also this bug, only affecting reads at the 
boundaries (if the 10 bytes of a full int are no longer in buffer), in that 
case it throws AIOOBE:

{code:java}
public long readVLong() throws IOException {
  if (9 = bufferLength-bufferPosition) {
byte b = buffer[bufferPosition++];
long i = b  0x7F;
for (int shift = 7; (b  0x80) != 0; shift += 7) {
  b = buffer[bufferPosition++];
  i |= (b  0x7FL)  shift;
}
return i;
  } else {
return super.readVLong();
  }
}
{code}

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-01-31 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196968#comment-13196968
 ] 

Michael McCandless commented on LUCENE-3738:


I think we should disallow (assert) writing negative vLong, and, ideally 
disallow negative vInt also, by fixing all the places that rely on this (but, 
carefully... preflexrw will need a backdoor)...

Separately: can't we strengthen the last assert in writeVInt to verify the top 
4 bits are 0, not just the top bit?  We have 36 bits at that point right?  So 
top 4 should be unused (assert (b  0xf0) == 0)

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-01-31 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196972#comment-13196972
 ] 

Uwe Schindler commented on LUCENE-3738:
---

If we disallow, it should be a hard check (no assert), as the data is coming 
from a file (and somebody could used a hex editor). The reader will crash 
later...

 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3738) Be consistent about negative vInt/vLong

2012-01-31 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197383#comment-13197383
 ] 

Robert Muir commented on LUCENE-3738:
-

after investigating: its difficult to prevent negative offsets, even after 
fixing term vectors writer (LUCENE-3739)

At first i tried a simple assert in BaseTokenStreamTestCase:
{code}
assertTrue(offsets must not go backwards, offsetAtt.startOffset() = 
lastStartOffset);
lastStartOffset = offsetAtt.startOffset();
{code}

Then these analyzers failed:
* MockCharFilter itself had a bug, but thats easy to fix (LUCENE-3741)
* synonymsfilter failed sometimes (LUCENE-3742) because it wrote zeros for 
offsets in situations like a - b c
* (edge)ngramtokenizers failed, because ngrams(1,2) of ABCD are not A, AB, B, 
BC, C, CD, D but instead A, B, C, D, AB, BC, CD, ...
* (edge)ngramfilters failed for similar reasons.
* worddelimiterfilter failed, because it doesnt break AB into A, AB, B but 
instead A, B, AB
* trimfilter failed when 'offsets changing' is enabled, because if you have  
rob, robert as synonyms then it trims the first, and the second offsets go 
backwards

These are all bugs.

In general I think offsets after being set should not be changed, because 
filters don't have access to any charfilters
offset correction (correctOffset()) anyway, so they shouldnt be mucking offsets.

So really: only the creator of tokens should make the offsets. And if thats a 
filter, it should be a standard way, 
only inherited from existing offsets and not 'offset mathematics' and not A, 
AB, B in some places and A, B, AB in others.

Really i think we need to step it up if we want highlighting to be first-class 
citizen in lucene, nothing checks the offsets anyhwere at all,
even to check/assert if they are negative, and there are little tests... all we 
have is some newish stuff in basetokenstreamtestcase and
a few trivial test cases.

On the other hand, for example, position increment's impl actually throws 
exception if you give it something like a negative number...


 Be consistent about negative vInt/vLong
 ---

 Key: LUCENE-3738
 URL: https://issues.apache.org/jira/browse/LUCENE-3738
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3738.patch, LUCENE-3738.patch


 Today, write/readVInt allows a negative int, in that it will encode and 
 decode correctly, just horribly inefficiently (5 bytes).
 However, read/writeVLong fails (trips an assert).
 I'd prefer that both vInt/vLong trip an assert if you ever try to write a 
 negative number... it's badly trappy today.  But, unfortunately, we sometimes 
 rely on this... had we had this assert in 'since the beginning' we could have 
 avoided that.
 So, if we can't add that assert in today, I think we should at least fix 
 readVLong to handle negative longs... but then you quietly spend 9 bytes 
 (even more trappy!).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org