[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?
[ https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964258#comment-13964258 ] Uwe Schindler commented on LUCENE-5583: --- Thanks! Static thread locals are horrible for the garbage collector, so I would really don't use them. > Should BufferedChecksumIndexInput have its own buffer? > -- > > Key: LUCENE-5583 > URL: https://issues.apache.org/jira/browse/LUCENE-5583 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.8 >Reporter: Adrien Grand >Priority: Blocker > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5583.patch > > > I was playing with on-the-fly checksum verification and this made me stumble > upon an issue with {{BufferedChecksumIndexInput}}. > I have some code that skips over a {{DataInput}} by reading bytes into > /dev/null, eg. > {code} > private static final byte[] SKIP_BUFFER = new byte[1024]; > private static void skipBytes(DataInput in, long numBytes) throws > IOException { > assert numBytes >= 0; > for (long skipped = 0; skipped < numBytes; ) { > final int toRead = (int) Math.min(numBytes - skipped, > SKIP_BUFFER.length); > in.readBytes(SKIP_BUFFER, 0, toRead); > skipped += toRead; > } > } > {code} > It is fine to read into this static buffer, even from multiple threads, since > the content that is read doesn't matter here. However, it breaks with > {{BufferedChecksumIndexInput}} because of the way that it updates the > checksum: > {code} > @Override > public void readBytes(byte[] b, int offset, int len) > throws IOException { > main.readBytes(b, offset, len); > digest.update(b, offset, len); > } > {code} > If you are unlucky enough so that a concurrent call to {{skipBytes}} started > modifying the content of {{b}} before the call to {{digest.update(b, offset, > len)}} finished, then your checksum will be wrong. > I think we should make {{BufferedChecksumIndexInput}} read into a private > buffer first instead of relying on the user-provided buffer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?
[ https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964245#comment-13964245 ] Adrien Grand commented on LUCENE-5583: -- I didn't think much about the thread-local, I just thought it would be nice to minimize the number of buffer instances. I will make it work like copyBytes for consistency. > Should BufferedChecksumIndexInput have its own buffer? > -- > > Key: LUCENE-5583 > URL: https://issues.apache.org/jira/browse/LUCENE-5583 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.8 >Reporter: Adrien Grand >Priority: Blocker > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5583.patch > > > I was playing with on-the-fly checksum verification and this made me stumble > upon an issue with {{BufferedChecksumIndexInput}}. > I have some code that skips over a {{DataInput}} by reading bytes into > /dev/null, eg. > {code} > private static final byte[] SKIP_BUFFER = new byte[1024]; > private static void skipBytes(DataInput in, long numBytes) throws > IOException { > assert numBytes >= 0; > for (long skipped = 0; skipped < numBytes; ) { > final int toRead = (int) Math.min(numBytes - skipped, > SKIP_BUFFER.length); > in.readBytes(SKIP_BUFFER, 0, toRead); > skipped += toRead; > } > } > {code} > It is fine to read into this static buffer, even from multiple threads, since > the content that is read doesn't matter here. However, it breaks with > {{BufferedChecksumIndexInput}} because of the way that it updates the > checksum: > {code} > @Override > public void readBytes(byte[] b, int offset, int len) > throws IOException { > main.readBytes(b, offset, len); > digest.update(b, offset, len); > } > {code} > If you are unlucky enough so that a concurrent call to {{skipBytes}} started > modifying the content of {{b}} before the call to {{digest.update(b, offset, > len)}} finished, then your checksum will be wrong. > I think we should make {{BufferedChecksumIndexInput}} read into a private > buffer first instead of relying on the user-provided buffer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?
[ https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964233#comment-13964233 ] Uwe Schindler commented on LUCENE-5583: --- An alternative would be to use a non-static buffer (like in {{IndexOutput#copyBytes()}}). Because a single IndexInput or Indexoutput cannot be used by more than one thread without cloning, there is no need for a thread local. It is just a problem, if its static for all instances. > Should BufferedChecksumIndexInput have its own buffer? > -- > > Key: LUCENE-5583 > URL: https://issues.apache.org/jira/browse/LUCENE-5583 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.8 >Reporter: Adrien Grand > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5583.patch > > > I was playing with on-the-fly checksum verification and this made me stumble > upon an issue with {{BufferedChecksumIndexInput}}. > I have some code that skips over a {{DataInput}} by reading bytes into > /dev/null, eg. > {code} > private static final byte[] SKIP_BUFFER = new byte[1024]; > private static void skipBytes(DataInput in, long numBytes) throws > IOException { > assert numBytes >= 0; > for (long skipped = 0; skipped < numBytes; ) { > final int toRead = (int) Math.min(numBytes - skipped, > SKIP_BUFFER.length); > in.readBytes(SKIP_BUFFER, 0, toRead); > skipped += toRead; > } > } > {code} > It is fine to read into this static buffer, even from multiple threads, since > the content that is read doesn't matter here. However, it breaks with > {{BufferedChecksumIndexInput}} because of the way that it updates the > checksum: > {code} > @Override > public void readBytes(byte[] b, int offset, int len) > throws IOException { > main.readBytes(b, offset, len); > digest.update(b, offset, len); > } > {code} > If you are unlucky enough so that a concurrent call to {{skipBytes}} started > modifying the content of {{b}} before the call to {{digest.update(b, offset, > len)}} finished, then your checksum will be wrong. > I think we should make {{BufferedChecksumIndexInput}} read into a private > buffer first instead of relying on the user-provided buffer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?
[ https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964227#comment-13964227 ] Uwe Schindler commented on LUCENE-5583: --- Do we really need the skipBuffer thread local? I would remove this and instantiate the skipbuffer with correct size, max 1024. We had some similar code in copyBytes() of DataOutput/DataInput and removed that because it is stupid with newer JVMs. JVMs are very good for those short-lived objects, no need to cache them. They are also quite small. In any case we should only allocate as much bytes we really need ({{new byte[Math.min(toSkip, 1024)]}}. > Should BufferedChecksumIndexInput have its own buffer? > -- > > Key: LUCENE-5583 > URL: https://issues.apache.org/jira/browse/LUCENE-5583 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.8 >Reporter: Adrien Grand > Attachments: LUCENE-5583.patch > > > I was playing with on-the-fly checksum verification and this made me stumble > upon an issue with {{BufferedChecksumIndexInput}}. > I have some code that skips over a {{DataInput}} by reading bytes into > /dev/null, eg. > {code} > private static final byte[] SKIP_BUFFER = new byte[1024]; > private static void skipBytes(DataInput in, long numBytes) throws > IOException { > assert numBytes >= 0; > for (long skipped = 0; skipped < numBytes; ) { > final int toRead = (int) Math.min(numBytes - skipped, > SKIP_BUFFER.length); > in.readBytes(SKIP_BUFFER, 0, toRead); > skipped += toRead; > } > } > {code} > It is fine to read into this static buffer, even from multiple threads, since > the content that is read doesn't matter here. However, it breaks with > {{BufferedChecksumIndexInput}} because of the way that it updates the > checksum: > {code} > @Override > public void readBytes(byte[] b, int offset, int len) > throws IOException { > main.readBytes(b, offset, len); > digest.update(b, offset, len); > } > {code} > If you are unlucky enough so that a concurrent call to {{skipBytes}} started > modifying the content of {{b}} before the call to {{digest.update(b, offset, > len)}} finished, then your checksum will be wrong. > I think we should make {{BufferedChecksumIndexInput}} read into a private > buffer first instead of relying on the user-provided buffer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?
[ https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964155#comment-13964155 ] Adrien Grand commented on LUCENE-5583: -- Thanks for the feedback, I'll work on a patch. > Should BufferedChecksumIndexInput have its own buffer? > -- > > Key: LUCENE-5583 > URL: https://issues.apache.org/jira/browse/LUCENE-5583 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.8 >Reporter: Adrien Grand > > I was playing with on-the-fly checksum verification and this made me stumble > upon an issue with {{BufferedChecksumIndexInput}}. > I have some code that skips over a {{DataInput}} by reading bytes into > /dev/null, eg. > {code} > private static final byte[] SKIP_BUFFER = new byte[1024]; > private static void skipBytes(DataInput in, long numBytes) throws > IOException { > assert numBytes >= 0; > for (long skipped = 0; skipped < numBytes; ) { > final int toRead = (int) Math.min(numBytes - skipped, > SKIP_BUFFER.length); > in.readBytes(SKIP_BUFFER, 0, toRead); > skipped += toRead; > } > } > {code} > It is fine to read into this static buffer, even from multiple threads, since > the content that is read doesn't matter here. However, it breaks with > {{BufferedChecksumIndexInput}} because of the way that it updates the > checksum: > {code} > @Override > public void readBytes(byte[] b, int offset, int len) > throws IOException { > main.readBytes(b, offset, len); > digest.update(b, offset, len); > } > {code} > If you are unlucky enough so that a concurrent call to {{skipBytes}} started > modifying the content of {{b}} before the call to {{digest.update(b, offset, > len)}} finished, then your checksum will be wrong. > I think we should make {{BufferedChecksumIndexInput}} read into a private > buffer first instead of relying on the user-provided buffer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?
[ https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964154#comment-13964154 ] Uwe Schindler commented on LUCENE-5583: --- +1 on having SkipBytes. One other thing: Currently ChecksumIndexInput throws UnsupportedEx if you try to seek, but in a subclass we suddenly allow it again (only forward). Maybe we should move the code up to ChecksumIndexinput and document that seeking only works forward and may be costly (because it has to read)? In that case we implement that with skipBytes(), too. This would allow to use ChecksumIndexinput also for other codec parts while merging. > Should BufferedChecksumIndexInput have its own buffer? > -- > > Key: LUCENE-5583 > URL: https://issues.apache.org/jira/browse/LUCENE-5583 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.8 >Reporter: Adrien Grand > > I was playing with on-the-fly checksum verification and this made me stumble > upon an issue with {{BufferedChecksumIndexInput}}. > I have some code that skips over a {{DataInput}} by reading bytes into > /dev/null, eg. > {code} > private static final byte[] SKIP_BUFFER = new byte[1024]; > private static void skipBytes(DataInput in, long numBytes) throws > IOException { > assert numBytes >= 0; > for (long skipped = 0; skipped < numBytes; ) { > final int toRead = (int) Math.min(numBytes - skipped, > SKIP_BUFFER.length); > in.readBytes(SKIP_BUFFER, 0, toRead); > skipped += toRead; > } > } > {code} > It is fine to read into this static buffer, even from multiple threads, since > the content that is read doesn't matter here. However, it breaks with > {{BufferedChecksumIndexInput}} because of the way that it updates the > checksum: > {code} > @Override > public void readBytes(byte[] b, int offset, int len) > throws IOException { > main.readBytes(b, offset, len); > digest.update(b, offset, len); > } > {code} > If you are unlucky enough so that a concurrent call to {{skipBytes}} started > modifying the content of {{b}} before the call to {{digest.update(b, offset, > len)}} finished, then your checksum will be wrong. > I think we should make {{BufferedChecksumIndexInput}} read into a private > buffer first instead of relying on the user-provided buffer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?
[ https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964131#comment-13964131 ] Adrien Grand commented on LUCENE-5583: -- bq. I personally think you shouldn't pass this shared buffer to readBytes() it can break all delegates. We already have code that does it for stored fields (the piece of code I pasted in the issue description comes from {{CompressingStoredFieldsReader.skipBytes}}`). But indeed, this is the other option: either this way of skipping over bytes using a write-only buffer is considered wrong, or {{DataInput}} implementations should never read in the user-provided buffer. +1 on having {{skipBytes}} on {{DataInput}} although I would rather like the default impl to do bulk reads instead of reading bytes one by one. > Should BufferedChecksumIndexInput have its own buffer? > -- > > Key: LUCENE-5583 > URL: https://issues.apache.org/jira/browse/LUCENE-5583 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.8 >Reporter: Adrien Grand > > I was playing with on-the-fly checksum verification and this made me stumble > upon an issue with {{BufferedChecksumIndexInput}}. > I have some code that skips over a {{DataInput}} by reading bytes into > /dev/null, eg. > {code} > private static final byte[] SKIP_BUFFER = new byte[1024]; > private static void skipBytes(DataInput in, long numBytes) throws > IOException { > assert numBytes >= 0; > for (long skipped = 0; skipped < numBytes; ) { > final int toRead = (int) Math.min(numBytes - skipped, > SKIP_BUFFER.length); > in.readBytes(SKIP_BUFFER, 0, toRead); > skipped += toRead; > } > } > {code} > It is fine to read into this static buffer, even from multiple threads, since > the content that is read doesn't matter here. However, it breaks with > {{BufferedChecksumIndexInput}} because of the way that it updates the > checksum: > {code} > @Override > public void readBytes(byte[] b, int offset, int len) > throws IOException { > main.readBytes(b, offset, len); > digest.update(b, offset, len); > } > {code} > If you are unlucky enough so that a concurrent call to {{skipBytes}} started > modifying the content of {{b}} before the call to {{digest.update(b, offset, > len)}} finished, then your checksum will be wrong. > I think we should make {{BufferedChecksumIndexInput}} read into a private > buffer first instead of relying on the user-provided buffer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5583) Should BufferedChecksumIndexInput have its own buffer?
[ https://issues.apache.org/jira/browse/LUCENE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964082#comment-13964082 ] Simon Willnauer commented on LUCENE-5583: - I personally think you shouldn't pass this shared buffer to readBytes() it can break all delegates. I wonder if we want to add a skipBytes method to DataInput that we can impl. efficienly on the lower levels and that just calls readByte() in a loop as a default impl? > Should BufferedChecksumIndexInput have its own buffer? > -- > > Key: LUCENE-5583 > URL: https://issues.apache.org/jira/browse/LUCENE-5583 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.8 >Reporter: Adrien Grand > > I was playing with on-the-fly checksum verification and this made me stumble > upon an issue with {{BufferedChecksumIndexInput}}. > I have some code that skips over a {{DataInput}} by reading bytes into > /dev/null, eg. > {code} > private static final byte[] SKIP_BUFFER = new byte[1024]; > private static void skipBytes(DataInput in, long numBytes) throws > IOException { > assert numBytes >= 0; > for (long skipped = 0; skipped < numBytes; ) { > final int toRead = (int) Math.min(numBytes - skipped, > SKIP_BUFFER.length); > in.readBytes(SKIP_BUFFER, 0, toRead); > skipped += toRead; > } > } > {code} > It is fine to read into this static buffer, even from multiple threads, since > the content that is read doesn't matter here. However, it breaks with > {{BufferedChecksumIndexInput}} because of the way that it updates the > checksum: > {code} > @Override > public void readBytes(byte[] b, int offset, int len) > throws IOException { > main.readBytes(b, offset, len); > digest.update(b, offset, len); > } > {code} > If you are unlucky enough so that a concurrent call to {{skipBytes}} started > modifying the content of {{b}} before the call to {{digest.update(b, offset, > len)}} finished, then your checksum will be wrong. > I think we should make {{BufferedChecksumIndexInput}} read into a private > buffer first instead of relying on the user-provided buffer. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org