Hi everyone,
We are deploying a Kafka cluster for ingesting streaming data. Sometimes
nodes in the cluster run into trouble (a node dies, the Kafka daemon is
killed, ...), and recovering data in Kafka can be very slow: it takes
several hours to recover from a disaster. I saw a slide deck here suggesting
using multiple data centers (
https://www.slideshare.net/HadoopSummit/building-largescale-stream-infrastructures-across-multiple-data-centers-with-apache-kafka).
But I wonder, how can we detect the problem and switch between data centers
in Spark Streaming? Since Kafka 0.10.1 supports a timestamp index, how can
we seek to the right offsets after switching?
Is there any open-source library out there that supports handling this
problem on the fly?
Thanks.
