Can anyone help me answer this question?

Yuanyuan
From: Yuanyuan Tian/Almaden/IBM@IBMUS
To: [email protected]
Date: 03/15/2013 02:05 PM
Subject: about fault tolerance in Giraph

Hi,

I was testing Giraph's fault tolerance on a long-running job. When one of the workers threw an exception, the whole job failed without retrying the task. This happened even though I had turned on checkpointing and there were free map slots in my cluster. Why didn't the fault-tolerance mechanism kick in?

I was running a version of Giraph downloaded sometime in June 2012, with Netty as the communication layer.

Thanks,
Yuanyuan
