[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549533#comment-13549533
 ] 

Robert Muir commented on LUCENE-4674:
-

I dont really agree (i dont think this class should be treated like 
stringbuffer).

changing offset to 0 is fine when we make a new array: otherwise it is 
definitely and 100% certainly NOT OK as we may overwrite unrelated data.

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549534#comment-13549534
 ] 

Robert Muir commented on LUCENE-4674:
-

moreover, any proposed changes here should also include the changes to IntsRef, 
LongsRef, CharsRef, and so on before even being considered.

Otherwise the apis just get out of wack.

Maybe we should just seriously consider just switching to java.nio.Buffer.

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549536#comment-13549536
 ] 

Uwe Schindler commented on LUCENE-4674:
---

I agree with Robert. We had BytesRef and CharsRef in the past doing that stuff. 
But as the name of the class is *Ref not *Buffer, it should only hold a 
reference to a byte[] and not change it or grow it. Esspecially it should not 
change offset. This is risky, if you get a BytesRef that points to some slice 
in a larger buffer and you suddenly resize it, invalidating content that might 
be needed by other stuff (e.g. while iterating terms, the previous/next term 
gets corrupted).

I would in any case favour to use ByteBuffer instead of this unsafe and 
inncomplete duplicate. BytesRef is for user-facing APIs a mess.

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549538#comment-13549538
 ] 

Adrien Grand commented on LUCENE-4674:
--

I still find confusing that we are allowed to write past offset + length but 
not before offset.

Switching to the java.nio buffers sounds good.

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549542#comment-13549542
 ] 

Shai Erera commented on LUCENE-4674:


I recently (LUCENE-4620) moved some facets code to use BytesRef and IntsRef and 
found these two classes very convenient. The only thing that I found missing is 
a *Ref.upto. E.g., I first made the mistake {{for (int i = bytes.offset; i  
bytes.length; i++)}}, where the correct form is {{for (int i = bytes.offset; i 
 bytes.length + bytes.offset; i++)}} (but then you need to do that '+' at 
every iteration, or extract it to a variable).

I considered using BytesBuffer instead, but as long as e.g. a Payload is 
represented as a BytesRef, it's a waste to always 
ByteBuffer.wrap(BytesRef.bytes, offset, length). I used BytesRef as it was very 
convenient (and if we add an 'upto' index to them, that'd even be greater :)).

I agree that grow() is currently risky, as it may override some data that is 
used by another thread (as a slice to the buffer).  But that can be solved with 
proper documentation I think.

I also agree that we should not set offset to 0. I did that, and MemoryCodec 
got upset :). For all practical purposes, apps should treat offset and length 
as final (we should not make them final though, just document it). If an app 
messes with them, it should better know what it's doing.

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549543#comment-13549543
 ] 

Robert Muir commented on LUCENE-4674:
-

the whole class is confusing.

but the problem with this proposed change is very simple:
BytesRef a = new BytesRef(bigbyte, 0, 5);
BytesRef b = new BytesRef(bigbyte, 5, 10);

b.copy(someOtherStuff...) should *NOT* muck with a. 

A is unrelated to B. 

I think realistically we should avoid methods like append/copy alltogether as 
they encourage more
stringbuffer-type use like this.

if you want a stringbuffer-type class, it can safely support methods like this, 
but then it should
*own the array* (make a copy). 


 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549544#comment-13549544
 ] 

Adrien Grand commented on LUCENE-4674:
--

bq. b.copy(someOtherStuff...) should NOT muck with a.

Unfortunately a.copy(otherStuff) will modify b if otherStuff.length  5.

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549554#comment-13549554
 ] 

Robert Muir commented on LUCENE-4674:
-

I will open a new issue to remove all write methods from bytesref.

this is a ref class, not a stringbuilder. we have to keep these apis contained.

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549596#comment-13549596
 ] 

Robert Muir commented on LUCENE-4674:
-

{quote}
Unfortunately a.copy(otherStuff) will modify b if otherStuff.length  5.
{quote}

I still like the idea of fixing this myself (maybe Shai's idea?). i don't like 
this kind of dangerous stuff!!

I ultimately think LUCENE-4675 is the next logical step, but can we remove this 
a.copy()-overwrites-b trap as an incremental improvement?

thats a bug in my opinion.

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549625#comment-13549625
 ] 

Adrien Grand commented on LUCENE-4674:
--

bq. I still like the idea of fixing this myself (maybe Shai's idea?). i don't 
like this kind of dangerous stuff!!

The 'upto' idea or allocating a new byte[] if someOtherStuff offset + length  
this.offset + length? ?

bq. I ultimately think LUCENE-4675 is the next logical step, but can we remove 
this a.copy()-overwrites-b trap as an incremental improvement?

Regarding the idea to switch to the java.nio buffers, are there some traps 
besides backward compatibility? Should we start migrating our internal APIs to 
this API (and maybe even the public ones for 5.0?).

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549630#comment-13549630
 ] 

Robert Muir commented on LUCENE-4674:
-

{quote}
allocating a new byte[] if someOtherStuff offset + length  this.offset + 
length? ?
{quote}

This, preventing a.copy(otherStuff) from overflowing onto b.

I dont want any other functionality in this class. it needs less, not more.

{quote}
Regarding the idea to switch to the java.nio buffers, are there some traps 
besides backward compatibility? Should we start migrating our internal APIs to 
this API (and maybe even the public ones for 5.0?).
{quote}

I haven't even thought about it really. I actually am less concerned about our 
internal apis. 

Its the public ones i care about.

I would care a lot less about BytesRef  co if users werent forced to interact 
with them.

 Consistently set offset=0 in BytesRef.copyBytes
 ---

 Key: LUCENE-4674
 URL: https://issues.apache.org/jira/browse/LUCENE-4674
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-4674.patch


 BytesRef.copyBytes(BytesRef other) has two branches:
  - either the destination array is large enough and it will copy bytes after 
 offset,
  - or it needs to resize and in that case it will set offset = 0.
 I think this method should always set offset = 0 for consistency, and to 
 avoid resizing when other.length is larger than this.bytes.length - 
 this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org