[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?

2014-04-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964258#comment-13964258
 ] 

Uwe Schindler commented on LUCENE-5583:
---

Thanks! Static thread locals are horrible for the garbage collector, so I would 
really don't use them.

> Should BufferedChecksumIndexInput have its own buffer?
> --
>
> Key: LUCENE-5583
> URL: https://issues.apache.org/jira/browse/LUCENE-5583
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.8
>Reporter: Adrien Grand
>Priority: Blocker
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5583.patch
>
>
> I was playing with on-the-fly checksum verification and this made me stumble 
> upon an issue with {{BufferedChecksumIndexInput}}.
> I have some code that skips over a {{DataInput}} by reading bytes into 
> /dev/null, eg.
> {code}
>   private static final byte[] SKIP_BUFFER = new byte[1024];
>   private static void skipBytes(DataInput in, long numBytes) throws 
> IOException {
> assert numBytes >= 0;
> for (long skipped = 0; skipped < numBytes; ) {
>   final int toRead = (int) Math.min(numBytes - skipped, 
> SKIP_BUFFER.length);
>   in.readBytes(SKIP_BUFFER, 0, toRead);
>   skipped += toRead;
> }
>   }
> {code}
> It is fine to read into this static buffer, even from multiple threads, since 
> the content that is read doesn't matter here. However, it breaks with 
> {{BufferedChecksumIndexInput}} because of the way that it updates the 
> checksum:
> {code}
>   @Override
>   public void readBytes(byte[] b, int offset, int len)
> throws IOException {
> main.readBytes(b, offset, len);
> digest.update(b, offset, len);
>   }
> {code}
> If you are unlucky enough so that a concurrent call to {{skipBytes}} started 
> modifying the content of {{b}} before the call to {{digest.update(b, offset, 
> len)}} finished, then your checksum will be wrong.
> I think we should make {{BufferedChecksumIndexInput}} read into a private 
> buffer first instead of relying on the user-provided buffer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?

2014-04-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964245#comment-13964245
 ] 

Adrien Grand commented on LUCENE-5583:
--

I didn't think much about the thread-local, I just thought it would be nice to 
minimize the number of buffer instances. I will make it work like copyBytes for 
consistency.

> Should BufferedChecksumIndexInput have its own buffer?
> --
>
> Key: LUCENE-5583
> URL: https://issues.apache.org/jira/browse/LUCENE-5583
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.8
>Reporter: Adrien Grand
>Priority: Blocker
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5583.patch
>
>
> I was playing with on-the-fly checksum verification and this made me stumble 
> upon an issue with {{BufferedChecksumIndexInput}}.
> I have some code that skips over a {{DataInput}} by reading bytes into 
> /dev/null, eg.
> {code}
>   private static final byte[] SKIP_BUFFER = new byte[1024];
>   private static void skipBytes(DataInput in, long numBytes) throws 
> IOException {
> assert numBytes >= 0;
> for (long skipped = 0; skipped < numBytes; ) {
>   final int toRead = (int) Math.min(numBytes - skipped, 
> SKIP_BUFFER.length);
>   in.readBytes(SKIP_BUFFER, 0, toRead);
>   skipped += toRead;
> }
>   }
> {code}
> It is fine to read into this static buffer, even from multiple threads, since 
> the content that is read doesn't matter here. However, it breaks with 
> {{BufferedChecksumIndexInput}} because of the way that it updates the 
> checksum:
> {code}
>   @Override
>   public void readBytes(byte[] b, int offset, int len)
> throws IOException {
> main.readBytes(b, offset, len);
> digest.update(b, offset, len);
>   }
> {code}
> If you are unlucky enough so that a concurrent call to {{skipBytes}} started 
> modifying the content of {{b}} before the call to {{digest.update(b, offset, 
> len)}} finished, then your checksum will be wrong.
> I think we should make {{BufferedChecksumIndexInput}} read into a private 
> buffer first instead of relying on the user-provided buffer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?

2014-04-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964233#comment-13964233
 ] 

Uwe Schindler commented on LUCENE-5583:
---

An alternative would be to use a non-static buffer (like in 
{{IndexOutput#copyBytes()}}). Because a single IndexInput or Indexoutput cannot 
be used by more than one thread without cloning, there is no need for a thread 
local. It is just a problem, if its static for all instances.

> Should BufferedChecksumIndexInput have its own buffer?
> --
>
> Key: LUCENE-5583
> URL: https://issues.apache.org/jira/browse/LUCENE-5583
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.8
>Reporter: Adrien Grand
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5583.patch
>
>
> I was playing with on-the-fly checksum verification and this made me stumble 
> upon an issue with {{BufferedChecksumIndexInput}}.
> I have some code that skips over a {{DataInput}} by reading bytes into 
> /dev/null, eg.
> {code}
>   private static final byte[] SKIP_BUFFER = new byte[1024];
>   private static void skipBytes(DataInput in, long numBytes) throws 
> IOException {
> assert numBytes >= 0;
> for (long skipped = 0; skipped < numBytes; ) {
>   final int toRead = (int) Math.min(numBytes - skipped, 
> SKIP_BUFFER.length);
>   in.readBytes(SKIP_BUFFER, 0, toRead);
>   skipped += toRead;
> }
>   }
> {code}
> It is fine to read into this static buffer, even from multiple threads, since 
> the content that is read doesn't matter here. However, it breaks with 
> {{BufferedChecksumIndexInput}} because of the way that it updates the 
> checksum:
> {code}
>   @Override
>   public void readBytes(byte[] b, int offset, int len)
> throws IOException {
> main.readBytes(b, offset, len);
> digest.update(b, offset, len);
>   }
> {code}
> If you are unlucky enough so that a concurrent call to {{skipBytes}} started 
> modifying the content of {{b}} before the call to {{digest.update(b, offset, 
> len)}} finished, then your checksum will be wrong.
> I think we should make {{BufferedChecksumIndexInput}} read into a private 
> buffer first instead of relying on the user-provided buffer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?

2014-04-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964227#comment-13964227
 ] 

Uwe Schindler commented on LUCENE-5583:
---

Do we really need the skipBuffer thread local? I would remove this and 
instantiate the skipbuffer with correct size, max 1024. We had some similar 
code in copyBytes() of DataOutput/DataInput and removed that because it is 
stupid with newer JVMs.

JVMs are very good for those short-lived objects, no need to cache them. They 
are also quite small. In any case we should only allocate as much bytes we 
really need ({{new byte[Math.min(toSkip, 1024)]}}.

> Should BufferedChecksumIndexInput have its own buffer?
> --
>
> Key: LUCENE-5583
> URL: https://issues.apache.org/jira/browse/LUCENE-5583
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.8
>Reporter: Adrien Grand
> Attachments: LUCENE-5583.patch
>
>
> I was playing with on-the-fly checksum verification and this made me stumble 
> upon an issue with {{BufferedChecksumIndexInput}}.
> I have some code that skips over a {{DataInput}} by reading bytes into 
> /dev/null, eg.
> {code}
>   private static final byte[] SKIP_BUFFER = new byte[1024];
>   private static void skipBytes(DataInput in, long numBytes) throws 
> IOException {
> assert numBytes >= 0;
> for (long skipped = 0; skipped < numBytes; ) {
>   final int toRead = (int) Math.min(numBytes - skipped, 
> SKIP_BUFFER.length);
>   in.readBytes(SKIP_BUFFER, 0, toRead);
>   skipped += toRead;
> }
>   }
> {code}
> It is fine to read into this static buffer, even from multiple threads, since 
> the content that is read doesn't matter here. However, it breaks with 
> {{BufferedChecksumIndexInput}} because of the way that it updates the 
> checksum:
> {code}
>   @Override
>   public void readBytes(byte[] b, int offset, int len)
> throws IOException {
> main.readBytes(b, offset, len);
> digest.update(b, offset, len);
>   }
> {code}
> If you are unlucky enough so that a concurrent call to {{skipBytes}} started 
> modifying the content of {{b}} before the call to {{digest.update(b, offset, 
> len)}} finished, then your checksum will be wrong.
> I think we should make {{BufferedChecksumIndexInput}} read into a private 
> buffer first instead of relying on the user-provided buffer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?

2014-04-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964155#comment-13964155
 ] 

Adrien Grand commented on LUCENE-5583:
--

Thanks for the feedback, I'll work on a patch.

> Should BufferedChecksumIndexInput have its own buffer?
> --
>
> Key: LUCENE-5583
> URL: https://issues.apache.org/jira/browse/LUCENE-5583
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.8
>Reporter: Adrien Grand
>
> I was playing with on-the-fly checksum verification and this made me stumble 
> upon an issue with {{BufferedChecksumIndexInput}}.
> I have some code that skips over a {{DataInput}} by reading bytes into 
> /dev/null, eg.
> {code}
>   private static final byte[] SKIP_BUFFER = new byte[1024];
>   private static void skipBytes(DataInput in, long numBytes) throws 
> IOException {
> assert numBytes >= 0;
> for (long skipped = 0; skipped < numBytes; ) {
>   final int toRead = (int) Math.min(numBytes - skipped, 
> SKIP_BUFFER.length);
>   in.readBytes(SKIP_BUFFER, 0, toRead);
>   skipped += toRead;
> }
>   }
> {code}
> It is fine to read into this static buffer, even from multiple threads, since 
> the content that is read doesn't matter here. However, it breaks with 
> {{BufferedChecksumIndexInput}} because of the way that it updates the 
> checksum:
> {code}
>   @Override
>   public void readBytes(byte[] b, int offset, int len)
> throws IOException {
> main.readBytes(b, offset, len);
> digest.update(b, offset, len);
>   }
> {code}
> If you are unlucky enough so that a concurrent call to {{skipBytes}} started 
> modifying the content of {{b}} before the call to {{digest.update(b, offset, 
> len)}} finished, then your checksum will be wrong.
> I think we should make {{BufferedChecksumIndexInput}} read into a private 
> buffer first instead of relying on the user-provided buffer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?

2014-04-09 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964154#comment-13964154
 ] 

Uwe Schindler commented on LUCENE-5583:
---

+1 on having SkipBytes.

One other thing: Currently ChecksumIndexInput throws UnsupportedEx if you try 
to seek, but in a subclass we suddenly allow it again (only forward). Maybe we 
should move the code up to ChecksumIndexinput and document that seeking only 
works forward and may be costly (because it has to read)? In that case we 
implement that with skipBytes(), too. This would allow to use 
ChecksumIndexinput also for other codec parts while merging.

> Should BufferedChecksumIndexInput have its own buffer?
> --
>
> Key: LUCENE-5583
> URL: https://issues.apache.org/jira/browse/LUCENE-5583
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.8
>Reporter: Adrien Grand
>
> I was playing with on-the-fly checksum verification and this made me stumble 
> upon an issue with {{BufferedChecksumIndexInput}}.
> I have some code that skips over a {{DataInput}} by reading bytes into 
> /dev/null, eg.
> {code}
>   private static final byte[] SKIP_BUFFER = new byte[1024];
>   private static void skipBytes(DataInput in, long numBytes) throws 
> IOException {
> assert numBytes >= 0;
> for (long skipped = 0; skipped < numBytes; ) {
>   final int toRead = (int) Math.min(numBytes - skipped, 
> SKIP_BUFFER.length);
>   in.readBytes(SKIP_BUFFER, 0, toRead);
>   skipped += toRead;
> }
>   }
> {code}
> It is fine to read into this static buffer, even from multiple threads, since 
> the content that is read doesn't matter here. However, it breaks with 
> {{BufferedChecksumIndexInput}} because of the way that it updates the 
> checksum:
> {code}
>   @Override
>   public void readBytes(byte[] b, int offset, int len)
> throws IOException {
> main.readBytes(b, offset, len);
> digest.update(b, offset, len);
>   }
> {code}
> If you are unlucky enough so that a concurrent call to {{skipBytes}} started 
> modifying the content of {{b}} before the call to {{digest.update(b, offset, 
> len)}} finished, then your checksum will be wrong.
> I think we should make {{BufferedChecksumIndexInput}} read into a private 
> buffer first instead of relying on the user-provided buffer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?

2014-04-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964131#comment-13964131
 ] 

Adrien Grand commented on LUCENE-5583:
--

bq. I personally think you shouldn't pass this shared buffer to readBytes() it 
can break all delegates.

We already have code that does it for stored fields (the piece of code I pasted 
in the issue description comes from 
{{CompressingStoredFieldsReader.skipBytes}}`). But indeed, this is the other 
option: either this way of skipping over bytes using a write-only buffer is 
considered wrong, or {{DataInput}} implementations should never read in the 
user-provided buffer.

+1 on having {{skipBytes}} on {{DataInput}} although I would rather like the 
default impl to do bulk reads instead of reading bytes one by one.

> Should BufferedChecksumIndexInput have its own buffer?
> --
>
> Key: LUCENE-5583
> URL: https://issues.apache.org/jira/browse/LUCENE-5583
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.8
>Reporter: Adrien Grand
>
> I was playing with on-the-fly checksum verification and this made me stumble 
> upon an issue with {{BufferedChecksumIndexInput}}.
> I have some code that skips over a {{DataInput}} by reading bytes into 
> /dev/null, eg.
> {code}
>   private static final byte[] SKIP_BUFFER = new byte[1024];
>   private static void skipBytes(DataInput in, long numBytes) throws 
> IOException {
> assert numBytes >= 0;
> for (long skipped = 0; skipped < numBytes; ) {
>   final int toRead = (int) Math.min(numBytes - skipped, 
> SKIP_BUFFER.length);
>   in.readBytes(SKIP_BUFFER, 0, toRead);
>   skipped += toRead;
> }
>   }
> {code}
> It is fine to read into this static buffer, even from multiple threads, since 
> the content that is read doesn't matter here. However, it breaks with 
> {{BufferedChecksumIndexInput}} because of the way that it updates the 
> checksum:
> {code}
>   @Override
>   public void readBytes(byte[] b, int offset, int len)
> throws IOException {
> main.readBytes(b, offset, len);
> digest.update(b, offset, len);
>   }
> {code}
> If you are unlucky enough so that a concurrent call to {{skipBytes}} started 
> modifying the content of {{b}} before the call to {{digest.update(b, offset, 
> len)}} finished, then your checksum will be wrong.
> I think we should make {{BufferedChecksumIndexInput}} read into a private 
> buffer first instead of relying on the user-provided buffer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?

2014-04-09 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964082#comment-13964082
 ] 

Simon Willnauer commented on LUCENE-5583:
-

I personally think you shouldn't pass this shared buffer to readBytes() it can 
break all delegates. I wonder if we want to add a skipBytes method to DataInput 
that we can impl. efficienly on the lower levels and that just calls readByte() 
in a loop as a default impl?

> Should BufferedChecksumIndexInput have its own buffer?
> --
>
> Key: LUCENE-5583
> URL: https://issues.apache.org/jira/browse/LUCENE-5583
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 4.8
>Reporter: Adrien Grand
>
> I was playing with on-the-fly checksum verification and this made me stumble 
> upon an issue with {{BufferedChecksumIndexInput}}.
> I have some code that skips over a {{DataInput}} by reading bytes into 
> /dev/null, eg.
> {code}
>   private static final byte[] SKIP_BUFFER = new byte[1024];
>   private static void skipBytes(DataInput in, long numBytes) throws 
> IOException {
> assert numBytes >= 0;
> for (long skipped = 0; skipped < numBytes; ) {
>   final int toRead = (int) Math.min(numBytes - skipped, 
> SKIP_BUFFER.length);
>   in.readBytes(SKIP_BUFFER, 0, toRead);
>   skipped += toRead;
> }
>   }
> {code}
> It is fine to read into this static buffer, even from multiple threads, since 
> the content that is read doesn't matter here. However, it breaks with 
> {{BufferedChecksumIndexInput}} because of the way that it updates the 
> checksum:
> {code}
>   @Override
>   public void readBytes(byte[] b, int offset, int len)
> throws IOException {
> main.readBytes(b, offset, len);
> digest.update(b, offset, len);
>   }
> {code}
> If you are unlucky enough so that a concurrent call to {{skipBytes}} started 
> modifying the content of {{b}} before the call to {{digest.update(b, offset, 
> len)}} finished, then your checksum will be wrong.
> I think we should make {{BufferedChecksumIndexInput}} read into a private 
> buffer first instead of relying on the user-provided buffer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org