Yes, in a sunny-day scenario there is no data loss. But we are trying to list the cases where data loss can occur, or at least to consider different scenarios in which one or more components fail, and to see how the Kafka-Storm setup reacts and whether any data is lost.
We had some scenarios like the one you mentioned, where the maxOffsetBehind setting caused problems because of slow downstream operations. We are not worried about the Kafka retention period either; that is a configuration issue. What we are looking at is, say, the kafka-spout thread accidentally dying, or a Kafka host containing all partitions for a topic going down, etc.

On Sat, Jan 16, 2016 at 5:32 AM, Abhishek Agarwal <[email protected]> wrote:

> The Kafka spout doesn't have a data loss scenario unless you have modified
> the maxOffsetBehind setting (Long.MAX_VALUE by default) and acks/fails are
> being done properly. Data could still be lost due to retention kicking in
> on the Kafka side: the topology will keep retrying a timed-out message,
> but Kafka is not going to keep it forever.
>
> On Fri, Jan 15, 2016 at 12:21 AM, Milind Vaidya <[email protected]> wrote:
>
>> Hi,
>>
>> I have been using a Kafka-Storm setup for more than a year, running
>> almost 10 different topologies.
>>
>> The flow is something like this:
>>
>> Producer --> Kafka Cluster --> Storm cluster --> MongoDB.
>>
>> ZooKeeper keeps the metadata.
>>
>> So far the approach has been a little ad hoc, and we want it to be more
>> disciplined. We are trying to achieve no data loss and automated failure
>> handling.
>>
>> What are the failure scenarios in the case of a Storm cluster? Failure
>> as in data loss. We will try to cover them once we know what they are.
>
> --
> Regards,
> Abhishek Agarwal
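For reference, the maxOffsetBehind setting under discussion lives on the old storm-kafka SpoutConfig. A minimal sketch of where it is set, assuming the pre-1.0 storm.kafka API; the ZooKeeper hosts, topic, ZK root, and spout id below are placeholders, not values from this thread:

```java
// Config fragment only: requires the storm-kafka dependency on the classpath.
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class SpoutConfigSketch {
    public static KafkaSpout buildSpout() {
        // Placeholder ZooKeeper quorum holding Kafka broker metadata.
        BrokerHosts hosts = new ZkHosts("zk1:2181,zk2:2181");

        // Topic, ZK root for offset storage, and consumer id are hypothetical.
        SpoutConfig cfg = new SpoutConfig(hosts, "my-topic", "/kafka-spout", "my-spout-id");

        // Default is Long.MAX_VALUE: the spout resumes from its last committed
        // offset no matter how far behind it is. Lowering this makes a spout
        // that has fallen too far behind jump ahead to a recent offset,
        // silently skipping (i.e. losing) the messages in between -- which is
        // the failure mode mentioned above.
        cfg.maxOffsetBehind = Long.MAX_VALUE;

        return new KafkaSpout(cfg);
    }
}
```

The other scenarios raised (spout thread death, a broker holding all partitions going down) are operational rather than configurable, so no single setting covers them.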
