Roger found another possible issue with snappy compression: during broker
bounces, snappy-compressed messages could get corrupted while being
re-sent. I am not sure if it is related, but it would be good to verify
after the upgrade.

Guozhang

On Tue, May 12, 2015 at 3:55 PM, Jun Rao <j...@confluent.io> wrote:

> Hi, Andrew,
>
> Thanks for finding this out. I marked KAFKA-2189 as a blocker for 0.8.3.
> Could you share your experience on snappy 1.1.1.7 in the jira once you have
> tried it out? If the result looks good, we can upgrade the snappy version
> in trunk.
>
> Jun
>
> On Tue, May 12, 2015 at 1:23 PM, Olson,Andrew <aols...@cerner.com> wrote:
>
> > Hi Jun,
> >
> > I figured it out this morning and opened
> > https://issues.apache.org/jira/browse/KAFKA-2189 --
> > it turned out to be a bug in versions 1.1.1.2 through 1.1.1.6 of
> > snappy-java that has recently
> > been fixed (I was very happy to see their new unit test named
> > "batchingOfWritesShouldNotAffectCompressedDataSize"). We will be rolling
> > 1.1.1.7 out to our
> > clusters as soon as we can.
> >
> > Regarding the mirror maker question, we wrote our own custom replication
> > code and are not
> > using the mirror maker to copy the data between clusters. We're still
> > using the old java
> > producer, and confirmed the issue was present with both the 0.8.1.1 and
> > 0.8.2.1 old producer
> > client.
> >
> > thanks,
> > Andrew
> >
> > On 5/12/15, 3:08 PM, "Jun Rao" <j...@confluent.io> wrote:
> >
> > >Andrew,
> > >
> > >The recompression logic didn't change in 0.8.2.1. The broker still takes
> > >all messages in a single request, assigns offsets and recompresses them
> > >into a single compressed message.
> > >
> > >Are you using mirror maker to copy data from the 0.8.1 cluster to the
> > >0.8.2
> > >cluster? If so, this may have to do with the batching in the producer in
> > >mirror maker. Did you enable the new java producer in mirror maker?
> > >
> > >Thanks,
> > >
> > >Jun
> > >
> > >
> > >On Mon, May 11, 2015 at 12:53 PM, Olson,Andrew <aols...@cerner.com>
> > wrote:
> > >
> > >> After a recent 0.8.2.1 upgrade we noticed a significant increase in used
> > >> filesystem space for our Kafka log data. We have another Kafka cluster
> > >> still on 0.8.1.1 whose Kafka data is being copied over to the upgraded
> > >> cluster, and it is clear that the disk consumption is higher on 0.8.2.1
> > >> for the same message data. The log retention config for the two clusters
> > >> is the same also.
> > >>
> > >> We ran some tests to figure out what was happening, and it appears that
> > >> in 0.8.2.1 the Kafka brokers re-compress each message individually (we're
> > >> using Snappy), while in 0.8.1.1 they applied the compression across an
> > >> entire batch of messages written to the log. For producers sending large
> > >> batches of small similar messages, the difference can be quite
> > >> substantial (in our case, it looks like a little over 2x). Is this a bug,
> > >> or the expected new behavior?
> > >>
> > >> thanks,
> > >> Andrew
> > >>
> > >>
> >
> >
>
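The size difference Andrew describes (compressing each message on its own
versus compressing a whole batch together) can be sketched with a short,
illustrative snippet. This uses zlib from the Python standard library as a
stand-in compressor rather than Snappy, and the message payloads are made
up, but the batching effect is the same in kind: small, similar messages
share redundancy that only a whole-batch compressor can exploit.

```python
import zlib

# Hypothetical payloads: many small, similar messages (e.g. JSON events).
messages = [f'{{"user": {i}, "event": "click", "page": "/home"}}'.encode()
            for i in range(1000)]

# Whole-batch compression (0.8.1.1-style): one compressor sees the entire
# batch, so repeated substrings across messages are deduplicated.
batched = len(zlib.compress(b"".join(messages)))

# Per-message compression (the behavior observed on 0.8.2.1): each message
# is compressed independently, so cross-message redundancy is lost and each
# message also pays the per-stream compression overhead.
individual = sum(len(zlib.compress(m)) for m in messages)

print(f"batched:    {batched} bytes")
print(f"individual: {individual} bytes")
print(f"ratio:      {individual / batched:.1f}x")
```

With payloads like these the per-message total comes out several times
larger than the batched total, which is consistent with the >2x disk usage
Andrew observed for small similar messages.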



-- 
-- Guozhang
