Hi Cody,
Thanks for your reply.
Is there a way, with the Spark-Kafka direct API, to stop updating the checkpoint
when an exception occurs while writing to Cassandra?
That way no messages would be lost; once Cassandra comes back up, we could
resume reading from the point we left off.
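Something along these lines is what I'm after: track the offsets ourselves and
only commit them once the Cassandra write succeeds. A rough sketch (loadOffsets
and saveOffsets are hypothetical helpers backed by some external store; ssc and
kafkaParams are assumed to be defined as usual):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}
    import com.datastax.spark.connector._

    // Hypothetical helpers: persist/restore offsets in an external store
    // (e.g. ZooKeeper, or a Cassandra table of our own).
    def loadOffsets(): Map[TopicAndPartition, Long] = ???
    def saveOffsets(ranges: Array[OffsetRange]): Unit = ???

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder,
        StringDecoder, (String, String)](
      ssc, kafkaParams, loadOffsets(),
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))

    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.saveToCassandra("my_ks", "my_table", SomeColumns("key", "value"))
      // Only reached if the write above did not throw, so after a restart
      // we resume from the last offsets that actually reached Cassandra.
      saveOffsets(ranges)
    }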
Regards,
Sam
From: Cody Koeninger [mailto:c...@koeninger.org]
Sent: Thursday, September 10, 2015 1:13 AM
To: Samya MAITI
Cc: user@spark.apache.org
Subject: Re: Spark streaming -> cassandra : Fault Tolerance
It's been a while since I've looked at the Cassandra connector, so I can't give
you specific advice on it.
But in general, if a Spark task fails (uncaught exception), it will be retried
automatically. In the case of the Kafka direct stream RDD, the retry will have
exactly the same messages as the first attempt (as long as they're still in the
Kafka log).
If you or the Cassandra connector are catching the exception, the task won't be
retried automatically, and it's up to you to deal with it.
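To make the distinction concrete, here's a generic sketch (not connector-specific;
'stream' is any DStream from KafkaUtils.createDirectStream, and writeToStore is a
placeholder for whatever performs the actual write):

    // Placeholder for the actual write logic.
    def writeToStore(rows: Iterator[(String, String)]): Unit = ???

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // Letting a failure propagate means the task is retried automatically
        // (up to spark.task.maxFailures), and the direct stream recomputes
        // exactly the same offset range from Kafka on each attempt.
        writeToStore(partition)

        // Catching it yourself means Spark treats the task as successful
        // and moves on; recovery is then up to you:
        // try { writeToStore(partition) }
        // catch { case e: Exception => println(s"write failed: $e") }
      }
    }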
On Wed, Sep 9, 2015 at 2:09 PM, Samya <samya.ma...@amadeus.com> wrote:
Hi Team,
I have a sample Spark application which reads from Kafka using the direct API,
does some transformation, and stores the result in Cassandra (using
saveToCassandra()).
If Cassandra goes down, the application logs a NoHostAvailableException (as
expected). But in the meantime the new incoming messages are lost, as the
direct API creates new checkpoints and deletes the previous ones.
Does that mean I should handle the exception on the application side?
Or is there some other hook for handling this?
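For reference, the write path is essentially the following (simplified sketch;
the keyspace, table, and transformation are placeholders, and ssc, kafkaParams,
and topics are defined as usual):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils
    import com.datastax.spark.connector._
    import com.datastax.spark.connector.streaming._

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder,
        StringDecoder](ssc, kafkaParams, topics)

    stream
      .map { case (key, value) => (key, value.toUpperCase) }  // placeholder transformation
      .saveToCassandra("my_ks", "my_table", SomeColumns("key", "value"))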
Thanks in advance.
Regards,
Sam