[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2014-06-16 Thread Stephan Lachowsky (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033211#comment-14033211
 ] 

Stephan Lachowsky commented on KAFKA-1493:
--

Given the way the decoder works, I think storing the uncompressed size is the 
appropriate thing to do; the compressed length can be inferred.  Knowing the 
uncompressed size up front lets the reader of the stream allocate the minimum 
required memory for a single-shot decode.
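
A minimal sketch of that idea, under stated assumptions: zlib from the Python 
standard library stands in for LZ4 (which is not in the standard library), and 
the 4-byte big-endian size header and the names encode_block/decode_block are 
hypothetical, not Kafka's wire format.  The compressed length is never stored; 
it is whatever remains of the block after the header.

```python
import struct
import zlib

# Hypothetical header: 4-byte big-endian uncompressed size (not Kafka's format).
SIZE_HEADER = struct.Struct(">I")


def encode_block(data: bytes) -> bytes:
    """Prefix the compressed payload with the uncompressed size only."""
    return SIZE_HEADER.pack(len(data)) + zlib.compress(data)


def decode_block(block: bytes) -> bytes:
    """Decode in one shot, sizing the output buffer from the header."""
    (uncompressed_size,) = SIZE_HEADER.unpack_from(block)
    # The compressed length is inferred: everything after the header.
    payload = block[SIZE_HEADER.size:]
    # bufsize is the initial output buffer size; with the exact size known,
    # the decoder does not need to grow the buffer while decoding.
    out = zlib.decompress(payload, bufsize=max(uncompressed_size, 1))
    if len(out) != uncompressed_size:
        raise ValueError("uncompressed size does not match header")
    return out


original = b"kafka message " * 256
block = encode_block(original)
restored = decode_block(block)
```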

I've been looking at how the default blocksize is passed down to the various 
compression backends; the Java and Scala code paths look like they do different 
things.

The current Java code passes the blocksize into the decoder from the Compressor 
constructor (Compressor.java:59 and 214).  It appears that MemoryRecords is the 
only user of the Java code, and it uses the constructor that doesn't explicitly 
pass a blocksize, resulting in a fallback to the (tiny) default of 1024.

The Scala code path in CompressionFactory.scala appears to use just the default 
constructors for the existing stream wrappers, which means that the compressors 
will use their own internal default blocksizes.  It looks like the Scala code 
has all the messages on heap already.
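
The difference between the two paths can be illustrated with a small model. 
Everything here is hypothetical: the class and function names are invented for 
illustration, and the 64 KiB wrapper default is an assumed value, not the real 
default of any compression library.  Only the 1024 figure comes from the 
comment above.

```python
from dataclasses import dataclass

# Assumed value for illustration only; real backends choose their own default.
WRAPPER_DEFAULT_BLOCK_SIZE = 64 * 1024
# The small fallback default described above.
COMPRESSOR_DEFAULT_BLOCK_SIZE = 1024


@dataclass
class BlockStream:
    """Stand-in for a compression stream wrapper with its own default."""
    block_size: int = WRAPPER_DEFAULT_BLOCK_SIZE


class Compressor:
    """Models the first path: the caller's default is always passed down."""

    def __init__(self, block_size: int = COMPRESSOR_DEFAULT_BLOCK_SIZE):
        self.stream = BlockStream(block_size)


def factory_stream() -> BlockStream:
    """Models the second path: the wrapper's default constructor is used."""
    return BlockStream()


# A caller that omits the blocksize silently gets the 1024 fallback,
# overriding whatever the wrapper would have picked on its own.
compressor_path_block_size = Compressor().stream.block_size
factory_path_block_size = factory_stream().block_size
```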

 Use a well-documented LZ4 compression format and remove redundant LZ4HC option
 --

 Key: KAFKA-1493
 URL: https://issues.apache.org/jira/browse/KAFKA-1493
 Project: Kafka
  Issue Type: Improvement
Reporter: James Oliver
 Fix For: 0.8.2






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2014-06-16 Thread Stephan Lachowsky (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033215#comment-14033215
 ] 

Stephan Lachowsky commented on KAFKA-1493:
--

The lack of a checksum in the compressed data is not much of a drawback, IMHO: 
there is already a CRC32 over the entire message, including the compressed data.
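
As a rough illustration of why an outer CRC32 already covers the compressed 
bytes, here is a sketch with a hypothetical message layout (a 4-byte CRC 
followed by the compressed payload).  This is not Kafka's actual message 
format, and zlib stands in for LZ4.

```python
import struct
import zlib

# Hypothetical layout: 4-byte big-endian CRC32, then the compressed payload.
CRC_HEADER = struct.Struct(">I")


def wrap(payload: bytes) -> bytes:
    """Prefix the payload with a CRC32 computed over the payload bytes."""
    return CRC_HEADER.pack(zlib.crc32(payload)) + payload


def verify(message: bytes) -> bool:
    """Recompute the CRC32 over the payload and compare with the stored one."""
    (stored,) = CRC_HEADER.unpack_from(message)
    return stored == zlib.crc32(message[CRC_HEADER.size:])


compressed = zlib.compress(b"some records " * 100)
message = wrap(compressed)

# Flip one bit inside the compressed payload: the outer CRC catches it,
# so the compressed format itself does not need its own checksum.
corrupted = bytearray(message)
corrupted[CRC_HEADER.size + 5] ^= 0x01
corrupted = bytes(corrupted)
```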



[jira] [Comment Edited] (KAFKA-1456) Add LZ4 and LZ4C as a compression codec

2014-06-02 Thread Stephan Lachowsky (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016113#comment-14016113
 ] 

Stephan Lachowsky edited comment on KAFKA-1456 at 6/3/14 1:05 AM:
--

Hello all,

First off, I would really like to see this functionality get added, but I'd 
like to make sure the wire protocol is done properly before it is picked up by 
a release and there is no going back.

Here are, in my estimation, the issues with the current implementation:
 - LZ4 and LZ4HC generate the same output format, so they shouldn't have 
different compression codec enums; this is a producer configuration issue 
only.  The same caveat applies to the compressor level parameter, which is 
basically a producer CPU/compression tradeoff.
 - The LZ4Block format used by net.jpountz.lz4.LZ4BlockInputStream and 
net.jpountz.lz4.LZ4BlockOutputStream is a block-based format that isn't well 
documented outside of the Java code.  I would recommend that something 
documented be used, like the format defined by the LZ4 author: 
http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html
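
To illustrate the first point, here is a sketch in which the compression level 
is purely a producer-side choice and the consumer needs only one codec id.  
zlib levels stand in for the LZ4 versus LZ4HC choice (an analogy, since LZ4 is 
not in the Python standard library), and CODEC_ID, produce and consume are 
hypothetical names.

```python
import zlib

# Hypothetical codec id carried on the wire; one id covers every level.
CODEC_ID = 3


def produce(data: bytes, level: int) -> tuple[int, bytes]:
    """Producer picks the CPU/ratio tradeoff; the wire format is unchanged."""
    return CODEC_ID, zlib.compress(data, level)


def consume(codec_id: int, payload: bytes) -> bytes:
    """Consumer decodes without knowing which level the producer used."""
    if codec_id != CODEC_ID:
        raise ValueError(f"unknown codec id: {codec_id}")
    return zlib.decompress(payload)


data = b"the same records, compressed two ways " * 200
fast = produce(data, level=1)   # cheaper on producer CPU
small = produce(data, level=9)  # spends more producer CPU on compression
```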


 Add LZ4 and LZ4C as a compression codec
 ---

 Key: KAFKA-1456
 URL: https://issues.apache.org/jira/browse/KAFKA-1456
 Project: Kafka
  Issue Type: Improvement
Reporter: Joe Stein
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1456.patch, KAFKA-1456_2014-05-19_15:01:10.patch, 
 KAFKA-1456_2014-05-19_16:39:01.patch, KAFKA-1456_2014-05-19_18:19:32.patch, 
 KAFKA-1456_2014-05-19_23:24:27.patch








[jira] [Commented] (KAFKA-1456) Add LZ4 and LZ4C as a compression codec

2014-06-02 Thread Stephan Lachowsky (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016113#comment-14016113
 ] 

Stephan Lachowsky commented on KAFKA-1456:
--

Hello all,

First off, I would really like to see this functionality get added, but I'd 
like to make sure the wire protocol is done properly before it is picked up by 
a release and there is no going back.

Here are, in my estimation, the issues with the current implementation:
 - LZ4 and LZ4HC generate the same output format, so they shouldn't have 
different compression codec enums; this is a producer configuration issue 
only.  The same caveat applies to the compressor level parameter, which is 
basically a producer CPU/compression tradeoff.
 - The LZ4Block format used by net.jpountz.lz4.LZ4BlockInputStream and 
net.jpountz.lz4.LZ4BlockOutputStream is a block-based format that isn't well 
documented outside of the Java code.  I would recommend that something 
documented be used, like the format defined by the LZ4 author: 
http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html
