Re: Kafka Failures and Recovery

Jun Rao Tue, 11 Jun 2013 09:30:38 -0700

At LinkedIn, the most common type of failure is a controlled shutdown for
code/config pushes, which happens once or twice a month. For that, we have a
tool for reducing the unavailability window (
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools). The
next most common type of failure is disk/RAID failure, which seems to happen
once every month or two. The remaining types of failure include Linux
crashes, JVM bugs, and other types of hardware failures. Those happen a few
times a year.
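
For the availability estimate asked about in the original question, here is a
rough sketch of how these rates could be plugged in. The yearly counts are
midpoints of the ranges above; the recovery times are placeholder assumptions
(nothing in this thread gives them), as is treating the rates as per-broker,
so substitute your own measurements.

```python
# Rough availability arithmetic. Failure counts are midpoints of the ranges
# quoted above; recovery hours are ASSUMED placeholders, not measured values.
HOURS_PER_YEAR = 365 * 24


def broker_availability(events):
    """Fraction of the year a broker is up.

    events: iterable of (failures_per_year, hours_to_recover) pairs.
    Equivalent to MTTF / (MTTF + MTTR) when the outages do not overlap.
    """
    downtime_hours = sum(count * hours for count, hours in events)
    return 1.0 - downtime_hours / HOURS_PER_YEAR


# (failures per year, assumed hours to recover)
ASSUMED_EVENTS = [
    (18, 0.05),  # controlled shutdown: once or twice a month; minutes assumed
    (9, 4.0),    # disk/RAID failure: every month or two; 4 h assumed
    (3, 2.0),    # Linux crash, JVM bug, other hardware: a few a year; 2 h assumed
]

estimate = broker_availability(ASSUMED_EVENTS)
print(f"estimated single-broker availability: {estimate:.4%}")
```

With replication, one broker being down does not by itself make a partition
unavailable, so this figure is a per-broker input to capacity planning rather
than a cluster-level availability number.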


Thanks,

Jun


On Tue, Jun 11, 2013 at 1:22 AM, Pankaj Misra <[email protected]> wrote:

> Hi,
>
> We are using the 0.8 version of Kafka and are planning high-availability
> testing with replication. While the overall scheme for making the cluster
> highly available is clear, I wanted to get some idea of the Kafka service
> lifetime in terms of mean time to failure and time to recovery in case of
> failure. Any historical evidence would also help, as it will be vital for
> us to calculate the actual availability of the system across a year.
>
> I understand that Kafka provides more of an active/active mode of
> seamless high availability, but any failure will impact performance to
> some extent. This calculation will help in deriving the actual number of
> nodes that we should provision so that performance is not compromised
> while the system remains available.
>
> Any ideas/facts would be very helpful.
>
> Thanks & Regards
> Pankaj Misra
