Are you using snappy compression? There was a bug with snappy that caused corrupt messages.
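
If you are, the quickest check is the producer config. A minimal sketch with the 0.8.2 Java producer (the broker address is a placeholder); flipping "compression.type" to "none" or "gzip" is an easy way to rule snappy out:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092"); // placeholder address
props.put("key.serializer",
    "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("value.serializer",
    "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("compression.type", "none"); // if this was "snappy", try "none" or "gzip"
KafkaProducer<byte[], byte[]> producer =
    new KafkaProducer<byte[], byte[]>(props);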

Sent from my iPhone

> On Mar 29, 2016, at 8:15 AM, sunil kalva <kalva.ka...@gmail.com> wrote:
> 
> Hi
> Do we also store the message CRC on disk, and does the server verify it
> when we read messages back from disk?
> And how do we handle errors when we use async publish?
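
For the async part of the question: with the new Java producer you pass a Callback to send() and inspect the exception in onCompletion(). A minimal sketch; the topic name "events" and the surrounding helper method are illustrative assumptions:

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Topic name "events" and this helper method are illustrative.
void publishAsync(KafkaProducer<byte[], byte[]> producer, byte[] payload) {
    producer.send(new ProducerRecord<byte[], byte[]>("events", payload),
        new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception e) {
                if (e != null) {
                    // The send failed after client-side retries: log it and
                    // decide whether to re-enqueue or drop the payload.
                    System.err.println("publish failed: " + e);
                }
            }
        });
}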
> 
>> On Fri, Mar 25, 2016 at 4:17 AM, Becket Qin <becket....@gmail.com> wrote:
>> 
>> You mentioned that you saw a few corrupted messages (< 0.1%). If so, are
>> you able to see some corrupted messages if you produce, say, 10M messages?
>> 
>> On Wed, Mar 23, 2016 at 9:40 PM, sunil kalva <kalva.ka...@gmail.com>
>> wrote:
>> 
>>> I am using the Java client and Kafka 0.8.2. Since the events are
>>> corrupted on the Kafka broker, I can't read and replay them again.
>>> 
>>>> On Thu, Mar 24, 2016 at 9:42 AM, Becket Qin <becket....@gmail.com> wrote:
>>> 
>>>> Hi Sunil,
>>>> 
>>>> Each message in Kafka has a CRC stored with it. When a consumer
>>>> receives a message, it computes the CRC from the message bytes and
>>>> compares it to the stored CRC. If the computed CRC and the stored CRC
>>>> do not match, that indicates the message has been corrupted. I am not
>>>> sure why the messages are corrupted in your case. Corrupted messages
>>>> should be pretty rare, because the broker actually validates the CRC
>>>> before it stores the messages on disk.
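
Schematically the check is just a CRC32 recompute-and-compare; a simplified sketch (Kafka's real CRC covers the record bytes that follow the CRC field, so the helper below glosses over the exact byte range):

import java.util.zip.CRC32;

// Simplified: recompute CRC32 over the message bytes and compare it
// with the CRC stored alongside the message.
static boolean crcMatches(byte[] recordBytes, long storedCrc) {
    CRC32 crc = new CRC32();
    crc.update(recordBytes, 0, recordBytes.length);
    return crc.getValue() == storedCrc;
}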
>>>> 
>>>> Is this problem reproducible? If so, can you find out which messages
>>>> are corrupted? Also, are you using the Java clients or some other
>>>> clients?
>>>> 
>>>> Jiangjie (Becket) Qin
>>>> 
>>>> On Wed, Mar 23, 2016 at 8:28 PM, sunil kalva <kalva.ka...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Can someone help me out here?
>>>>> 
>>>>> On Wed, Mar 23, 2016 at 7:36 PM, sunil kalva <kalva.ka...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi
>>>>>> I am seeing a few messages getting corrupted in Kafka. It is not
>>>>>> happening frequently, and the percentage is also very low (less than
>>>>>> 0.1%).
>>>>>> 
>>>>>> Basically, I am publishing Thrift events in byte-array format to Kafka
>>>>>> topics (without any encoding like base64), and I also see more events
>>>>>> than I publish (I confirm this by looking at the offset for that
>>>>>> topic). For example, if I publish 100 events, I see 110 as the offset
>>>>>> for that topic. (Since it is in production, I could not get the exact
>>>>>> messages which are causing this problem, and we only realize the
>>>>>> problem when we consume, because our Thrift deserialization fails.)
>>>>>> 
>>>>>> So my question is: is there any magic byte which actually determines
>>>>>> the boundary of a message (and which could collide with a byte I am
>>>>>> sending)? Or could a message, due to some network issue, get chopped
>>>>>> up and stored as multiple messages on the server side?
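
For what it's worth, the 0.8 on-disk format is length-prefixed rather than delimiter-based, so no payload byte can be mistaken for a boundary; the magic byte is a format version, not a marker. A rough sketch of walking the v0 layout:

import java.nio.ByteBuffer;

// Rough sketch: each v0 log entry is length-prefixed, so boundaries
// come from the size field, never from a sentinel byte in the payload.
static void walkEntries(ByteBuffer log) {
    while (log.remaining() >= 12) {      // 8-byte offset + 4-byte size
        long offset = log.getLong();     // logical offset of this message
        int size = log.getInt();         // length of the message that follows
        int storedCrc = log.getInt();    // CRC32 over the rest of the message
        byte magic = log.get();          // format version (0 in 0.8)
        byte attributes = log.get();     // compression codec bits, etc.
        log.position(log.position() + size - 6); // skip key/value fields
        System.out.printf("offset=%d size=%d magic=%d crc=%d%n",
            offset, size, magic, storedCrc);
    }
}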
>>>>>> 
>>>>>> tx
>>>>>> SunilKalva
