I will try to catch the exception in my Spark driver. Last time I tried a Spark standalone cluster there were some problems with Akka and DNS; I will try again. I am sure there are tasks failing four times. I didn't find a way to throttle the Cassandra load, and I don't think I can know the load on the Cassandra cluster at any given time. Catching the exception and retrying is the best solution I can think of.
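A driver-side catch-and-retry could be sketched roughly as below. This is a minimal illustration in plain Scala, not connector API: the helper name `retryWithBackoff` and the backoff values are made up, and the closure passed to it stands in for the Spark action (e.g. a `count()` or `collect()` on the Cassandra RDD) that may fail under load.

```scala
// Hypothetical driver-side retry helper: retries the given operation
// with exponential backoff, rethrowing once attempts are exhausted.
object RetryExample {
  def retryWithBackoff[T](attemptsLeft: Int, delayMs: Long)(op: () => T): T =
    try op() catch {
      case e: Exception if attemptsLeft > 1 =>
        // Back off before retrying, giving an overloaded Cassandra
        // cluster a chance to recover. Unmatched exceptions (or the
        // last failure) propagate, so the driver exits instead of hanging.
        Thread.sleep(delayMs)
        retryWithBackoff(attemptsLeft - 1, delayMs * 2)(op)
    }

  def main(args: Array[String]): Unit = {
    var calls = 0
    // Simulate an operation that fails twice (as if Cassandra were
    // overloaded) and then succeeds on the third attempt.
    val result = retryWithBackoff(4, 10L) { () =>
      calls += 1
      if (calls < 3) throw new RuntimeException("Cassandra overloaded")
      else 42
    }
    println(s"result=$result after $calls attempts")
  }
}
```

In a real driver, the whole job (or each stage-level action) would be wrapped this way, and a final uncaught exception would let the driver process exit with a non-zero status so the failure is visible.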
-- Sent from my NetEase Mail (tablet edition)

On 2014-08-08 23:07:06, "Martin Weindel" <[email protected]> wrote:

Normally the Spark driver stops by throwing an exception if the same task fails four times. Can you please check in the log whether you have these four failures for the same task?

I recommend first taking a closer look at what's going on in Spark and the Cassandra driver before you dive into the scheduling with Mesos. Or you can run your job on a Spark standalone cluster to verify whether this problem is really related to Mesos.

BTW, your real problem seems to be overloading the Cassandra cluster, so I would think about how to throttle the read load your job is producing.

On Fri, Aug 8, 2014 at 9:19 AM, Xu Zhongxing <[email protected]> wrote:

I am using Spark + Mesos to read data from Cassandra. After some time, the Cassandra driver throws an exception due to high load on the cluster, and the Mesos UI shows many task failures, but the Spark driver just hangs there. I would like the Spark driver to exit so that I can know that the job has failed. Why does the Spark driver program not exit when Mesos stops it after many failures?

The detailed log is at https://github.com/datastax/spark-cassandra-connector/issues/134
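For reference, the four-failure limit mentioned above is governed by Spark's `spark.task.maxFailures` property (default 4), and one crude way to throttle the read load is simply to cap the total cores the job may use, which limits how many Cassandra reads run concurrently. A sketch of the relevant settings (the values are examples only, and `spark.cores.max` applies to standalone and Mesos coarse-grained mode):

```
# spark-defaults.conf (example values)
# A task is retried this many times before the stage is aborted (default: 4).
spark.task.maxFailures   4
# Cap total executor cores to limit concurrent Cassandra reads.
spark.cores.max          8
```

Fewer concurrent tasks means fewer simultaneous reads hitting Cassandra, at the cost of a longer job runtime.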

