Correct me if I am wrong: do you mean that the Kafka log was not processed due to a slow topology (in turn, a slow bolt), and the log was then deleted because the retention period ran out? That is still configurable and can be fine-tuned. I mean, we can adjust the retention interval and/or look into why the bolt is taking so long to process such a log.
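For reference, per-topic retention can be adjusted without touching the broker-wide default. A sketch using the stock Kafka CLI (the topic name `events` and the ZooKeeper address are illustrative, not from this thread):

```shell
# Give one topic a 7-day retention (retention.ms is in milliseconds);
# point --zookeeper at your own ensemble. Topic name "events" is an example.
bin/kafka-configs.sh --zookeeper localhost:2181 \
  --alter --entity-type topics --entity-name events \
  --add-config retention.ms=604800000

# Verify the per-topic override took effect
bin/kafka-configs.sh --zookeeper localhost:2181 \
  --describe --entity-type topics --entity-name events
```

Note that retention is a lower bound on how long a slow topology has to catch up; raising it buys time but does not remove the failure mode.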
I am more interested in what happens if some process (say the KafkaSpout) dies due to a random error. What will happen to any messages on the wire? How will it be handled on the Kafka cluster side and on the topology bolt side that reads from the Kafka spout?

On Wed, Jan 20, 2016 at 5:22 AM, John Yost <[email protected]> wrote:

> The only data loss I've seen is where a topology with KafkaSpout gets so
> far behind that the Kafka log segment for a given partition is rotated. In
> such a scenario, you'll see an OffsetOutOfRangeException.
>
> --John
>
> On Tue, Jan 19, 2016 at 5:21 PM, Milind Vaidya <[email protected]> wrote:
>
>> Yes. In a sunny-day scenario there is no data loss. But we are trying to
>> list some cases where there will be data loss, or at least we want to
>> consider different scenarios in which one or more components fail, and see
>> how the Kafka-Storm setup reacts and whether there is any data loss.
>>
>> We had some scenarios like you mentioned, where the maxOffsetBehind
>> setting led to problems due to slow downstream operations. But we are not
>> worried about the Kafka retention period either; that is a configuration
>> issue. What we are looking at is some thread accidentally dying, say the
>> kafka-spout, or some Kafka host containing all partitions for a topic
>> going down, etc.
>>
>> On Sat, Jan 16, 2016 at 5:32 AM, Abhishek Agarwal <[email protected]>
>> wrote:
>>
>>> The kafka spout doesn't have a data loss scenario unless you have
>>> modified the maxOffsetBehind setting (Long.MAX_VALUE by default),
>>> provided acks/fails are being done properly. Data could still be lost
>>> due to retention kicking in on the Kafka side: the topology will keep
>>> retrying a timed-out message, but Kafka is not going to keep it forever.
>>>
>>> On Fri, Jan 15, 2016 at 12:21 AM, Milind Vaidya <[email protected]>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I have been using a Kafka-Storm setup for more than a year, running
>>>> almost 10 different topologies.
>>>>
>>>> The flow is something like this:
>>>>
>>>> Producer --> Kafka Cluster --> Storm cluster --> MongoDB.
>>>>
>>>> ZooKeeper keeps the metadata.
>>>>
>>>> So far the approach has been a little ad hoc, and we want it to be more
>>>> disciplined. We are trying to achieve no data loss and automated
>>>> failure handling.
>>>>
>>>> What are the failure scenarios in the case of a Storm cluster? Failure
>>>> as in data loss. We will try to cover them once we know them.
>>>
>>> --
>>> Regards,
>>> Abhishek Agarwal
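On the "spout dies with messages in flight" question: the usual answer is that the spout only commits an offset to ZooKeeper after the downstream bolts ack, so a restarted spout resumes from the last committed offset and replays anything that was in flight (at-least-once, with possible duplicates). A minimal sketch of that behavior; this models the idea, it is not the actual storm-kafka code, and all names here are hypothetical:

```python
# Hypothetical model of KafkaSpout-style at-least-once delivery.
# Offsets are committed (here, to a plain dict standing in for ZooKeeper)
# only after an ack; in-flight tuples at crash time are replayed.

class SpoutSim:
    def __init__(self, log, committed_store, partition="topic-0"):
        self.log = log                      # the Kafka partition log (a list)
        self.store = committed_store        # stands in for ZooKeeper offsets
        self.partition = partition
        self.next = self.store.get(partition, 0)  # resume from committed offset
        self.pending = set()                # emitted but not yet acked

    def next_tuple(self):
        """Emit the next message, tracking its offset as in-flight."""
        if self.next < len(self.log):
            offset = self.next
            self.next += 1
            self.pending.add(offset)
            return offset, self.log[offset]
        return None

    def ack(self, offset):
        """A bolt finished the tuple; advance the committed offset past
        the contiguous acked prefix."""
        self.pending.discard(offset)
        committed = self.store.get(self.partition, 0)
        while committed < self.next and committed not in self.pending:
            committed += 1
        self.store[self.partition] = committed


zk = {}
spout = SpoutSim(["m0", "m1", "m2"], zk)
spout.next_tuple()          # emits (0, "m0")
spout.next_tuple()          # emits (1, "m1"), still in flight
spout.ack(0)                # only offset 0 gets committed

# The spout process now dies with offset 1 un-acked. A fresh instance
# resumes from the committed offset, so "m1" is replayed, not lost.
spout2 = SpoutSim(["m0", "m1", "m2"], zk)
print(spout2.next_tuple())  # (1, 'm1')
```

The takeaway is that a crashed spout by itself does not lose data; duplicates are the cost, which is why bolts in this setup should be idempotent.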

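And on the scenario John describes, the genuine loss case is retention deleting segments the consumer has not read yet: the broker's earliest available offset moves past the committed offset, and the fetch fails. A toy model of that interaction (illustrative names, not the real Kafka client API):

```python
# Hypothetical model of retention outrunning a slow consumer, producing
# the OffsetOutOfRangeException John mentions. Not the real Kafka API.

class OffsetOutOfRangeError(Exception):
    pass

class PartitionLog:
    def __init__(self):
        self.messages = {}      # offset -> message
        self.earliest = 0       # first offset still retained
        self.next_offset = 0    # where the next append lands

    def append(self, msg):
        self.messages[self.next_offset] = msg
        self.next_offset += 1

    def expire_before(self, offset):
        """Retention kicking in: segments before `offset` are deleted."""
        for o in range(self.earliest, offset):
            self.messages.pop(o, None)
        self.earliest = offset

    def fetch(self, offset):
        if offset < self.earliest or offset >= self.next_offset:
            raise OffsetOutOfRangeError(
                "requested %d, valid range is [%d, %d)"
                % (offset, self.earliest, self.next_offset))
        return self.messages[offset]


log = PartitionLog()
for i in range(5):
    log.append("m%d" % i)

consumer_offset = 1        # a slow topology is still back at offset 1
log.expire_before(3)       # retention rotates away offsets 0-2

try:
    log.fetch(consumer_offset)
except OffsetOutOfRangeError as e:
    print("data loss:", e)  # offsets 1-2 are gone for good
```

Once this happens the only choices are to reset to the earliest or latest retained offset; the skipped messages are unrecoverable, which is exactly the case worth alerting on.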