[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119703#comment-15119703 ] Dana Powers commented on KAFKA-1493: Hi all - it appears that the header checksum (HC) byte is incorrect. The Kafka implementation hashes the magic bytes + header, but the spec says to hash only the header (excluding the magic). We are having some trouble encoding/decoding from non-Java clients because the framing must be munged before reading from / writing to Kafka. Is this known? I don't see another JIRA for it. Should I file separately or should this be reopened? > Use a well-documented LZ4 compression format and remove redundant LZ4HC option > -- > > Key: KAFKA-1493 > URL: https://issues.apache.org/jira/browse/KAFKA-1493 > Project: Kafka > Issue Type: Improvement >Affects Versions: 0.8.2.0 >Reporter: James Oliver >Assignee: James Oliver >Priority: Blocker > Fix For: 0.8.2.0 > > Attachments: KAFKA-1493.patch, KAFKA-1493.patch, > KAFKA-1493_2014-10-16_13:49:34.patch, KAFKA-1493_2014-10-16_21:25:23.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
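The HC discrepancy can be checked with a small sketch. Per the LZ4 frame format, HC is the second byte of XXH32 (seed 0) computed over the frame descriptor only, i.e. `(xxh32(descriptor) >> 8) & 0xFF`, with the 4 magic bytes excluded. The class `Lz4HeaderChecksum` and its hand-written XXH32 are illustrative assumptions, not Kafka's or lz4-java's code, and `magicIncludedChecksum` only models the behaviour reported in the comment, not code copied from Kafka.

```java
/**
 * Sketch: LZ4 frame header checksum (HC) computed per the frame spec
 * versus hashing the magic bytes as well. XXH32 is written out by hand.
 */
class Lz4HeaderChecksum {
    private static final int PRIME1 = 0x9E3779B1;
    private static final int PRIME2 = 0x85EBCA77;
    private static final int PRIME3 = 0xC2B2AE3D;
    private static final int PRIME4 = 0x27D4EB2F;
    private static final int PRIME5 = 0x165667B1;

    // Little-endian 32-bit read.
    private static int readIntLE(byte[] b, int i) {
        return (b[i] & 0xFF) | (b[i + 1] & 0xFF) << 8 | (b[i + 2] & 0xFF) << 16 | (b[i + 3] & 0xFF) << 24;
    }

    private static int round(int acc, int input) {
        acc += input * PRIME2;
        acc = Integer.rotateLeft(acc, 13);
        return acc * PRIME1;
    }

    /** XXH32 over b[off, off + len). */
    static int xxh32(byte[] b, int off, int len, int seed) {
        int end = off + len;
        int p = off;
        int h;
        if (len >= 16) {
            int v1 = seed + PRIME1 + PRIME2, v2 = seed + PRIME2, v3 = seed, v4 = seed - PRIME1;
            while (p <= end - 16) { // process 16-byte stripes
                v1 = round(v1, readIntLE(b, p));
                v2 = round(v2, readIntLE(b, p + 4));
                v3 = round(v3, readIntLE(b, p + 8));
                v4 = round(v4, readIntLE(b, p + 12));
                p += 16;
            }
            h = Integer.rotateLeft(v1, 1) + Integer.rotateLeft(v2, 7)
                    + Integer.rotateLeft(v3, 12) + Integer.rotateLeft(v4, 18);
        } else {
            h = seed + PRIME5;
        }
        h += len;
        while (p + 4 <= end) { // remaining 4-byte words
            h += readIntLE(b, p) * PRIME3;
            h = Integer.rotateLeft(h, 17) * PRIME4;
            p += 4;
        }
        while (p < end) { // remaining single bytes
            h += (b[p] & 0xFF) * PRIME5;
            h = Integer.rotateLeft(h, 11) * PRIME1;
            p++;
        }
        // Final avalanche.
        h ^= h >>> 15;
        h *= PRIME2;
        h ^= h >>> 13;
        h *= PRIME3;
        h ^= h >>> 16;
        return h;
    }

    /** Spec behaviour: hash only the descriptor that follows the 4-byte magic. */
    static int specHeaderChecksum(byte[] frameHeader, int descriptorLen) {
        return (xxh32(frameHeader, 4, descriptorLen, 0) >>> 8) & 0xFF;
    }

    /** Behaviour reported in the comment: hash magic + descriptor. */
    static int magicIncludedChecksum(byte[] frameHeader, int descriptorLen) {
        return (xxh32(frameHeader, 0, 4 + descriptorLen, 0) >>> 8) & 0xFF;
    }

    public static void main(String[] args) {
        // Magic 0x184D2204 (little-endian), FLG = 0x64, BD = 0x40 (64 KB blocks).
        byte[] header = {0x04, 0x22, 0x4D, 0x18, 0x64, 0x40};
        System.out.printf("spec HC           = 0x%02X%n", specHeaderChecksum(header, 2));
        System.out.printf("magic-included HC = 0x%02X%n", magicIncludedChecksum(header, 2));
    }
}
```

With FLG = 0x64 and BD = 0x40 the spec-compliant HC is 0xA7, which is why standard LZ4 frames start with `04 22 4D 18 64 40 A7`; an implementation that also hashes the magic will generally produce a different byte, so each side rejects the other's frames.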
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119920#comment-15119920 ] Magnus Edenhill commented on KAFKA-1493: [~dana.powers] I can confirm this is the case, as you describe it. I suggest creating a new issue for this. I have a patch that adds a new compression.type=lz4f with proper framing.
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120072#comment-15120072 ] Dana Powers commented on KAFKA-1493: filed KAFKA-3160
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174217#comment-14174217 ] James Oliver commented on KAFKA-1493: - Updated reviewboard https://reviews.apache.org/r/26658/diff/ against branch origin/trunk
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174305#comment-14174305 ] Jun Rao commented on KAFKA-1493: James, Thanks for the patch. There are a few things marked as TODO in the patch. Are those required? Do you think you'll have time to finish the patch for 0.8.2?
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174356#comment-14174356 ] James Oliver commented on KAFKA-1493: - Jun, My pleasure. The TODOs are parts of the specification that are unimplemented, but they are not required. I left them in as hints for if/when the implementation is contributed back to lz4-java. The validation routines disallow the use of any unimplemented portion of the spec, but the implementation is fully usable. What it can do: compress/decompress messages using a 64 KB/256 KB/1 MB/4 MB block size (64 KB by default), with optional block checksums (disabled by default). What it cannot do: decompress messages compressed by an implementation that uses some of the missing features. If that occurs, a RuntimeException with detailed information is thrown.
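A sketch of the descriptor handling James describes, under stated assumptions. Bit positions follow the LZ4 frame format (FLG bits 7-6 = version 01, bit 5 = block independence, bit 4 = block checksum, bit 3 = content size, bit 2 = content checksum; BD bits 6-4 = block maximum size code 4-7). The comment does not list which features were left unimplemented, so the two rejections marked hypothetical (dependent blocks, content size field) only illustrate the "disallow unimplemented features and throw a RuntimeException" approach. `Lz4FrameDescriptor` and `bdFor` are hypothetical names.

```java
/**
 * Sketch: decode and validate the LZ4 frame FLG/BD descriptor bytes.
 * The final two rejections are hypothetical examples, not the patch's list.
 */
class Lz4FrameDescriptor {
    final boolean blockIndependence;
    final boolean blockChecksum;
    final boolean contentSize;
    final boolean contentChecksum;
    final int blockMaxSize;

    Lz4FrameDescriptor(int flg, int bd) {
        int version = (flg >>> 6) & 0x3;
        if (version != 1) throw new RuntimeException("Unsupported LZ4 frame version: " + version);
        if ((flg & 0x03) != 0) throw new RuntimeException("Reserved/dictionary FLG bits are not supported");
        if ((bd & 0x8F) != 0) throw new RuntimeException("Reserved BD bits must be zero");
        blockIndependence = (flg & 0x20) != 0;
        blockChecksum = (flg & 0x10) != 0;
        contentSize = (flg & 0x08) != 0;
        contentChecksum = (flg & 0x04) != 0;
        int code = (bd >>> 4) & 0x7;
        if (code < 4) throw new RuntimeException("Invalid block max size code: " + code);
        blockMaxSize = 1 << (8 + 2 * code); // 4 -> 64 KB, 5 -> 256 KB, 6 -> 1 MB, 7 -> 4 MB

        // Hypothetical: features a partial implementation might refuse.
        if (!blockIndependence) throw new RuntimeException("Dependent blocks are not supported");
        if (contentSize) throw new RuntimeException("Content size field is not supported");
    }

    /** BD byte for one of the four allowed block sizes. */
    static int bdFor(int blockMaxSize) {
        for (int code = 4; code <= 7; code++) {
            if ((1 << (8 + 2 * code)) == blockMaxSize) return code << 4;
        }
        throw new IllegalArgumentException("Block size must be 64 KB, 256 KB, 1 MB or 4 MB");
    }
}
```

For example, the 64 KB default corresponds to BD = 0x40, and a reader that sees BD = 0x70 has to size its buffers for 4 MB blocks.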
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174473#comment-14174473 ] Jun Rao commented on KAFKA-1493: James, Thanks for the answer. We can leave the TODOs there. The patch looks good to me. Could you look at the comments in the RB?
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174716#comment-14174716 ] James Oliver commented on KAFKA-1493: - Updated reviewboard https://reviews.apache.org/r/26658/diff/ against branch origin/trunk
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171082#comment-14171082 ] James Oliver commented on KAFKA-1493: - Sorry for not being clearer. I fixed a few spots related to the removal of the LZ4HC option, but left the I/O streams in Ivan's patch alone. Since I didn't have permission to update Ivan's review board entry, I created a new review. 1. This looks like Ivan's interpretation of the lz4-java block stream format. 2. We should use neither: the lz4-java implementation was used previously (KAFKA-1456), and community review of it produced this issue. We need a real implementation of http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171732#comment-14171732 ] Jun Rao commented on KAFKA-1493: James, Thanks, got it now. Not sure how long it will take to get a real implementation of http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html. Should we just take out LZ4 in CompressionType and CompressionCodec in 0.8.2 so that people don't use it until it's fixed?
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171748#comment-14171748 ] James Oliver commented on KAFKA-1493: - I implemented the OutputStream today. If I can't get the InputStream done and tested before I leave for vacation Thursday, IMO we should take it out.
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169616#comment-14169616 ] James Oliver commented on KAFKA-1493: - Sure, I'll take a look at it now.
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170071#comment-14170071 ] Jun Rao commented on KAFKA-1493: James, Thanks for the patch. A couple more questions.
1. The following header frame used in the patch doesn't seem to match exactly what's described in http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html. So, are we inventing our own header? Is that OK?
/*
 * Message format:
 * HEADER which consists of:
 * 1) magic byte sequence (8 bytes)
 * 2) compression method token (1 byte)
 * 3) compressed length (4 bytes)
 * 4) original message length (4 bytes)
 * and compressed message itself
 * Block size: 64 Kb
 */
2. If the I/O stream code in this patch is identical to that in lz4-java, could we just use lz4-java instead? Thanks,
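To make the comparison concrete, here is a sketch of the header Jun quotes: magic (8 bytes), method token (1 byte), compressed length (4 bytes), original length (4 bytes), then the compressed data, 17 header bytes in total. The magic value `EXAMPLE!` is a hypothetical placeholder and little-endian byte order is an assumption; the patch's actual constants may differ. `BlockHeaderSketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Sketch of: magic (8) | method token (1) | compressed length (4) | original length (4) | data.
 * Magic value and byte order are illustrative assumptions.
 */
class BlockHeaderSketch {
    // Hypothetical placeholder magic, not the value used in the patch.
    static final byte[] MAGIC = "EXAMPLE!".getBytes(StandardCharsets.US_ASCII);
    static final int HEADER_LENGTH = 8 + 1 + 4 + 4;

    static byte[] writeBlock(byte token, byte[] compressed, int originalLength) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH + compressed.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(MAGIC).put(token).putInt(compressed.length).putInt(originalLength).put(compressed);
        return buf.array();
    }

    /** Parsed header fields; the compressed payload starts at HEADER_LENGTH. */
    static final class Header {
        final byte token;
        final int compressedLength;
        final int originalLength;

        Header(byte token, int compressedLength, int originalLength) {
            this.token = token;
            this.compressedLength = compressedLength;
            this.originalLength = originalLength;
        }
    }

    static Header readHeader(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
        byte[] magic = new byte[MAGIC.length];
        buf.get(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IllegalStateException("Bad magic");
        byte token = buf.get();
        int compressedLength = buf.getInt();
        int originalLength = buf.getInt();
        if (compressedLength < 0 || compressedLength > buf.remaining()) {
            throw new IllegalStateException("Truncated block");
        }
        return new Header(token, compressedLength, originalLength);
    }
}
```

None of these fields line up with the LZ4 frame format linked in the comment (4-byte magic 0x184D2204, FLG/BD descriptor, HC byte), so a standard LZ4 frame decoder would not accept such a block.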
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167153#comment-14167153 ] Jun Rao commented on KAFKA-1493: James, Could you help review the format in Ivan's patch? Is the format used in KafkaLZ4BlockInputStream standard? I am wondering if there are libraries in other languages that support this format too. Thanks,
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165275#comment-14165275 ] Ivan Lyutov commented on KAFKA-1493: Created reviewboard https://reviews.apache.org/r/26503/diff/ against branch apache/trunk
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158291#comment-14158291 ] Jun Rao commented on KAFKA-1493: The easiest thing is probably to just take out LZ4 in CompressionType and CompressionCodec in 0.8.2.
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158408#comment-14158408 ] Theo Hultberg commented on KAFKA-1493: -- If you're looking for a standard way to handle LZ4, there doesn't seem to be one, but Cassandra uses a 4-byte field for the uncompressed length and no checksum (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/compress/LZ4Compressor.java). I've seen varints used in other projects too, but in my opinion they're a pain to implement compared to just using an int, for very little benefit. The drawbacks of an int are that small messages use one or two more bytes, and that you can't handle compressed chunks larger than a couple of gigabytes. Sorry for jumping into the discussion out of the blue; I just stumbled upon this while looking through the issues for 0.8.2. I have very little experience with the Kafka codebase, but I'm the author of the Ruby driver for Cassandra and recognized the issue. Hope this was helpful and I didn't completely miss the point.
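The size trade-off Theo describes is easy to quantify. The sketch below (hypothetical class `LengthPrefixCost`) counts the bytes a fixed 4-byte length field needs versus an unsigned 7-bits-per-byte varint (the LEB128/protobuf style, assumed here). If the field is read as a signed Java `int`, the largest representable length is 2^31 - 1 bytes, roughly 2 GB.

```java
/**
 * Sketch: bytes needed for a fixed 4-byte length prefix versus an
 * unsigned varint that stores 7 payload bits per byte.
 */
class LengthPrefixCost {
    static final int FIXED_INT_BYTES = 4;

    /** Size of a non-negative value encoded as a 7-bits-per-byte varint. */
    static int varintSize(int value) {
        if (value < 0) throw new IllegalArgumentException("length must be non-negative");
        int size = 1;
        while ((value >>>= 7) != 0) size++;
        return size;
    }

    public static void main(String[] args) {
        int[] lengths = {100, 1_000, 100_000, 10_000_000, Integer.MAX_VALUE};
        for (int len : lengths) {
            System.out.printf("%,d bytes: fixed=%d, varint=%d%n", len, FIXED_INT_BYTES, varintSize(len));
        }
    }
}
```

For lengths below 2^21 the varint saves 1 to 3 bytes per message, which supports Theo's point that the benefit is small relative to typical message sizes.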
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142132#comment-14142132 ] Jun Rao commented on KAFKA-1493: Since this is a blocker for 0.8.2, if we can't get this fixed in the next few days, I suggest that we just remove the documentation about LZ4 in producerConfig and leave LZ4 as an unsupported compression codec for now.
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131758#comment-14131758 ] James Oliver commented on KAFKA-1493: - I have today to work on this; I'll see how far I can get.
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122037#comment-14122037 ] Guozhang Wang commented on KAFKA-1493: -- Could we still have that for 0.8.2?
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032832#comment-14032832 ] James Oliver commented on KAFKA-1493: - Snappy's block compression format (default block size 32 KB) is this:
snappy codec header: 8-byte magic header, version [4-byte integer], min compatible version [4-byte integer]
compressed block 1: compressed data size [4-byte integer], compressed data
compressed block 2: ...
Notable limitations: no checksum.
If I understand the proposed format correctly, this is what you're suggesting:
uncompressed data size [n-byte varint], compressed data
While I would expect compressing an entire message as a single block to give a better compression ratio than compressing smaller chunks, doing so for larger messages is going to cause serious performance problems.
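A sketch that walks the Snappy block framing James lays out, without decompressing anything: skip the 16-byte codec header (8-byte magic plus two 4-byte version fields), then repeatedly read a 4-byte compressed size and skip that many bytes. Big-endian integers are an assumption about the snappy-java stream, the magic is not validated, and `SnappyFramingSketch` is a hypothetical name. With no per-block checksum, a size bounds check is the only integrity check this framing allows.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: list the compressed block sizes in a stream framed as
 * [codec header][4-byte size][data][4-byte size][data]...
 */
class SnappyFramingSketch {
    // 8-byte magic + 4-byte version + 4-byte minimum compatible version.
    static final int HEADER_LENGTH = 8 + 4 + 4;

    /** Returns the compressed size of each block, in stream order. */
    static List<Integer> blockSizes(byte[] stream) {
        if (stream.length < HEADER_LENGTH) throw new IllegalStateException("Truncated header");
        ByteBuffer buf = ByteBuffer.wrap(stream); // big-endian by default
        buf.position(HEADER_LENGTH);
        List<Integer> sizes = new ArrayList<>();
        while (buf.hasRemaining()) {
            if (buf.remaining() < 4) throw new IllegalStateException("Truncated block size");
            int size = buf.getInt();
            if (size < 0 || size > buf.remaining()) throw new IllegalStateException("Truncated block");
            buf.position(buf.position() + size); // skip the compressed data
            sizes.add(size);
        }
        return sizes;
    }
}
```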
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033211#comment-14033211 ] Stephan Lachowsky commented on KAFKA-1493: -- Given the way the decoder works, I think storing the uncompressed size would be the appropriate thing to do. The compressed length can be inferred. This allows the reader of the stream to allocate the minimum required memory for a single-shot decode. I've been looking at how the default blocksize is passed down to the various compression backends, and the Java and Scala code paths look like they do different things. The current Java code passes the blocksize into the compression stream from the Compressor constructor (Compressor.java:59 and 214). It appears that MemoryRecords is the only user of the Java code, and it uses the constructor that doesn't explicitly pass a blocksize, resulting in a fallback to the (tiny) default of 1024. The Scala code path in CompressionFactory.scala appears to use just the default constructors for the existing stream wrappers, which means that the compressors will use their own internal default blocksizes. It looks like the Scala code has all the messages on heap already.
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033215#comment-14033215 ] Stephan Lachowsky commented on KAFKA-1493: -- The lack of a checksum in the compressed data is not much of a drawback IMHO; there is already a CRC32 over the entire message, including the compressed data.
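Stephan's point can be illustrated with `java.util.zip.CRC32`: a checksum computed over the whole message, compressed bytes included, catches a corrupted payload before it reaches the decompressor. The message bytes here are a made-up stand-in, not Kafka's actual message layout, and `MessageCrcSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/**
 * Sketch: a CRC32 over the full message (including the compressed value)
 * detects corruption of the compressed bytes.
 */
class MessageCrcSketch {
    static long crcOf(byte[] message) {
        CRC32 crc = new CRC32();
        crc.update(message, 0, message.length);
        return crc.getValue();
    }

    static boolean verify(byte[] message, long expectedCrc) {
        return crcOf(message) == expectedCrc;
    }

    public static void main(String[] args) {
        byte[] message = "attributes+key+compressed-value".getBytes(StandardCharsets.US_ASCII);
        long stored = crcOf(message);
        message[message.length - 1] ^= 0x01; // flip one bit inside the "compressed" payload
        System.out.println("intact after corruption? " + verify(message, stored));
    }
}
```

One caveat: a CRC over the compressed bytes says nothing about whether decompression itself produced the right output; that is what a content checksum inside the LZ4 frame would cover.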
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033274#comment-14033274 ] James Oliver commented on KAFKA-1493: - I agree that storing the uncompressed length as a varint makes logical sense for allocating the required heap space IFF the entire uncompressed message is destined for the heap. Otherwise, this strategy introduces unnecessary heap requirements. I also agree that the checksum doesn't buy us much. IMO LZ4 is mature enough not to worry about distortion, and as you mentioned, we already checksum the compressed message to verify accurate transmission. It looks like the LZ4 Java path doesn't pass that default blockSize to the underlying stream, which should be changed (if we go with the LZ4Block streams). That said, the ultra-small block size is robbing performance; we should consider bumping it up to something in the 32-64 KB range to improve our compression ratio and reduce block overhead. We could just compress the entire message as [~alb...@stonethree.com] mentioned and document the heap requirements, but it doesn't look like any of the other compression codecs do so, and I'm hesitant to change the way LZ4 would work. Partially implementing https://docs.google.com/document/d/1gZbUoLw5hRzJ5Q71oPRN6TO4cRMTZur60qip-TE7BhQ/edit?pli=1 might still be our best option.
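The block-overhead argument can be put in numbers. This sketch assumes a 21-byte header per block (8-byte magic, 1-byte token, 4-byte compressed length, 4-byte original length, 4-byte checksum), which is an assumption based on the lz4-java block stream layout and should be checked against the library source. It ignores any end-of-stream marker and says nothing about compression ratio. `BlockOverhead` is a hypothetical name.

```java
/**
 * Sketch: total per-block header bytes for a block-stream format,
 * assuming a fixed 21-byte header on every block.
 */
class BlockOverhead {
    static final int HEADER_BYTES = 21; // assumed: 8 + 1 + 4 + 4 + 4

    static long headerBytes(long uncompressedBytes, int blockSize) {
        long blocks = (uncompressedBytes + blockSize - 1) / blockSize; // ceiling division
        return blocks * HEADER_BYTES;
    }

    public static void main(String[] args) {
        long input = 1L << 20; // 1 MiB of uncompressed data
        for (int blockSize : new int[] {1024, 32 * 1024, 64 * 1024}) {
            long overhead = headerBytes(input, blockSize);
            System.out.printf("block=%6d B -> %5d header bytes (%.2f%% of input)%n",
                    blockSize, overhead, 100.0 * overhead / input);
        }
    }
}
```

For 1 MiB of input, 1 KB blocks cost 1,024 headers (21,504 bytes, about 2%), while 64 KB blocks cost 16 headers (336 bytes). Small independent blocks also tend to hurt the ratio, since each one starts without any match history.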
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030908#comment-14030908 ] Albert Strasheim commented on KAFKA-1493: - What does the format look like for a Snappy compressed message? One might simply need a varint-encoded field for the uncompressed length followed by a compressed block. The LZ4 streaming format and the xxhash, etc. in there might be overkill.
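A sketch of the minimal format Albert suggests: an unsigned LEB128-style varint holding the uncompressed length, followed by one compressed block. The class and method names are hypothetical, and the compressed bytes are treated as opaque, so no LZ4 library call is shown. Reading the length first lets a consumer allocate an exact-size destination buffer for a single-shot decode.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/**
 * Sketch: [uncompressed length as unsigned varint][single compressed block].
 * Compressed bytes are opaque here.
 */
class VarintFraming {
    static byte[] frame(int uncompressedLength, byte[] compressedBlock) {
        if (uncompressedLength < 0) throw new IllegalArgumentException("negative length");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int v = uncompressedLength;
        while ((v & ~0x7F) != 0) {        // more than 7 bits left
            out.write((v & 0x7F) | 0x80); // low 7 bits plus continuation flag
            v >>>= 7;
        }
        out.write(v);
        out.write(compressedBlock, 0, compressedBlock.length);
        return out.toByteArray();
    }

    /** Reads the varint; leaves the buffer positioned at the compressed block. */
    static int readUncompressedLength(ByteBuffer buf) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = buf.get() & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IllegalStateException("varint too long");
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(frame(300, new byte[] {9, 9, 9}));
        int len = readUncompressedLength(buf);
        byte[] dest = new byte[len]; // exact-size buffer for a single-shot decode
        System.out.println(dest.length + " bytes to allocate, " + buf.remaining() + " compressed bytes follow");
    }
}
```

As James notes in his reply, this only saves memory when the whole uncompressed message is meant to live on the heap; a large message still forces one large allocation.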