[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033211#comment-14033211 ]

Stephan Lachowsky commented on KAFKA-1493:
------------------------------------------

Given the way that the decoder works, I think that storing the uncompressed size would be the appropriate thing to do. The compressed length can be inferred. This allows the reader of the stream to allocate the minimum required memory for a single-shot decode.

I've been looking at how the default blocksize is passed down to the various compression backends; the Java and Scala code paths look like they do different things.

The current Java code passes the blocksize into the decoder from the Compressor constructor (Compressor.java:59 and 214). It appears that MemoryRecords is the only user of the Java code, and it uses the constructor which doesn't explicitly pass a blocksize, resulting in fallback to the (tiny) default of 1024.

The Scala code path in CompressionFactory.scala appears to use just the default constructors for the existing stream wrapper, which means that the compressors will use their own internal default blocksizes. It looks like the Scala code has all the messages on heap already.

Use a well-documented LZ4 compression format and remove redundant LZ4HC option
------------------------------------------------------------------------------

Key: KAFKA-1493
URL: https://issues.apache.org/jira/browse/KAFKA-1493
Project: Kafka
Issue Type: Improvement
Reporter: James Oliver
Fix For: 0.8.2

--
This message was sent by Atlassian JIRA (v6.2#6252)
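To make the size-prefix idea above concrete, here is a minimal sketch of a frame that stores the uncompressed size up front so the reader can allocate exactly once and decode in a single shot. Everything in it is an assumption for illustration: the class name `SizePrefixedFrame`, the 4-byte big-endian prefix, and the use of the JDK's `Deflater`/`Inflater` as a stand-in for an LZ4 codec. It is not the format proposed in the ticket, nor Kafka's wire format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical frame layout: [4-byte uncompressed size][compressed payload].
// Deflater/Inflater stand in for an LZ4 codec here.
final class SizePrefixedFrame {

    // Compress raw and prepend its uncompressed length.
    static byte[] encode(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            compressed.write(chunk, 0, n);
        }
        deflater.end();
        byte[] payload = compressed.toByteArray();
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(raw.length)
                .put(payload)
                .array();
    }

    // Read the size prefix, allocate the output once, then decode into it.
    static byte[] decode(byte[] frame) {
        if (frame.length < 4) {
            throw new IllegalArgumentException("frame too short");
        }
        int rawSize = ByteBuffer.wrap(frame).getInt();
        if (rawSize < 0) {
            throw new IllegalArgumentException("negative size prefix");
        }
        byte[] raw = new byte[rawSize]; // exact allocation, no resizing
        Inflater inflater = new Inflater();
        // The compressed length is inferred from the frame length.
        inflater.setInput(frame, 4, frame.length - 4);
        try {
            int filled = 0;
            while (filled < rawSize && !inflater.finished()) {
                int n = inflater.inflate(raw, filled, rawSize - filled);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // input ran out before the output was filled
                }
                filled += n;
            }
            if (filled != rawSize) {
                throw new IllegalArgumentException("payload shorter than size prefix");
            }
            return raw;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt payload", e);
        } finally {
            inflater.end();
        }
    }
}
```

A caller would round-trip with `SizePrefixedFrame.decode(SizePrefixedFrame.encode(bytes))`. The point of the sketch is only that the decoder never has to guess or grow its output buffer.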
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033215#comment-14033215 ]

Stephan Lachowsky commented on KAFKA-1493:
------------------------------------------

The lack of a checksum in the compressed data is not much of a drawback, IMHO: there is already a CRC32 over the entire message, including the compressed data.
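The argument above can be illustrated with a small sketch: when a CRC32 covers the whole message body, any corruption of the compressed payload is caught at the message level, so a second checksum inside the compressed stream adds little. The layout below (`[4-byte CRC32][1-byte attributes][payload]`) and the class name `CrcCheckedMessage` are simplified assumptions for illustration, not Kafka's actual message format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical, simplified layout: [4-byte CRC32][1-byte attributes][payload].
// The CRC covers everything after the CRC field, compressed payload included.
final class CrcCheckedMessage {

    static byte[] wrap(byte attributes, byte[] compressedPayload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + compressedPayload.length);
        buf.position(4); // leave room for the CRC field
        buf.put(attributes).put(compressedPayload);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 4, buf.capacity() - 4);
        buf.putInt(0, (int) crc.getValue());
        return buf.array();
    }

    // Recompute the CRC over the body and compare it with the stored value.
    static boolean isValid(byte[] message) {
        if (message.length < 5) {
            return false;
        }
        CRC32 crc = new CRC32();
        crc.update(message, 4, message.length - 4);
        return ByteBuffer.wrap(message).getInt(0) == (int) crc.getValue();
    }
}
```

Flipping any single bit of the payload after `wrap` makes `isValid` return false, which is the protection the comment relies on.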
[jira] [Comment Edited] (KAFKA-1456) Add LZ4 and LZ4C as a compression codec
[ https://issues.apache.org/jira/browse/KAFKA-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016113#comment-14016113 ]

Stephan Lachowsky edited comment on KAFKA-1456 at 6/3/14 1:05 AM:
------------------------------------------------------------------

Hello all,

First off, I would really like to see this functionality get added, but I'd like to make sure the wire protocol is done properly before it is picked up by a release and there is no going back.

Here are, in my estimation, the issues with the current implementation:

- LZ4 and LZ4HC generate the same output format, so they shouldn't have different compression codec enums... this is a producer configuration issue only. The same caveat applies to the compressor level parameter; it is basically a producer CPU/compression tradeoff.
- The LZ4Block format used by net.jpountz.lz4.LZ4BlockInputStream and net.jpountz.lz4.LZ4BlockOutputStream is a block-based format that isn't well documented outside of the Java code. I would recommend that something documented be used, like the format defined by the LZ4 author: http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html

Add LZ4 and LZ4C as a compression codec
---------------------------------------

Key: KAFKA-1456
URL: https://issues.apache.org/jira/browse/KAFKA-1456
Project: Kafka
Issue Type: Improvement
Reporter: Joe Stein
Labels: newbie
Fix For: 0.8.2
Attachments: KAFKA-1456.patch, KAFKA-1456_2014-05-19_15:01:10.patch, KAFKA-1456_2014-05-19_16:39:01.patch, KAFKA-1456_2014-05-19_18:19:32.patch, KAFKA-1456_2014-05-19_23:24:27.patch
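The first point in the comment, that compression effort is an encoder-side choice when the output format is shared, can be sketched as follows. This uses the JDK's `Deflater` levels purely as a stand-in for the LZ4 versus LZ4HC distinction (an assumption, since LZ4 is not in the JDK); the class name `LevelIsProducerOnly` is hypothetical. The decoder takes no level parameter and reads the output of either setting.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Stand-in demonstration: the compression level changes CPU cost and ratio
// on the producing side only. One decoder handles every level.
final class LevelIsProducerOnly {

    // Producer side: level is a tuning knob, not part of the format identity.
    static byte[] compress(byte[] raw, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Consumer side: no level argument is needed to decode.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or corrupt input");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Compressing the same bytes with `Deflater.BEST_SPEED` and `Deflater.BEST_COMPRESSION` and passing both results to `decompress` yields the original input each time, which is why a single codec identifier on the wire is enough in this stand-in.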