> Wow. I get tons of them in the logs. And there aren't that many clients that
> got killed as reported by the MR job. Is that the only case when these
> errors are reported?
What about speculative execution? Or RPC timeouts (do you log that)?

> OK good, so one of the two happened then. I will try figuring out what
> happened. ZK servers are not collocated in my setup. They are a set of 5
> dedicated nodes (nothing else running).

Then I'm betting the farm it's the nodes that have a resource problem.

> No, I was referring to using TOF (TableOutputFormat) on a regular Java-API
> MR job. But I guess using TOF would be similar to what I am currently doing.

Unless you have something weird to do with the HTable, using the TOF is good practice on the map output.

> I will use Ganglia to monitor the stats.

Please! On a cluster of that size it's almost mandatory :)

J-D
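The advice to use TableOutputFormat on the map output can be illustrated with a minimal map-only job. This sketch assumes the Hadoop 2.x `org.apache.hadoop.mapreduce` API and HBase 1.x or later, where `Put.addColumn` exists. It is not the setup used in this thread. The class name `LoadIntoHBase`, the tab-separated input, the helper `parseLine`, and the column family `f` / qualifier `q` are all hypothetical.

Two calls do the wiring. `TableMapReduceUtil.initTableReducerJob(table, null, job)` configures `TableOutputFormat` for the target table. `job.setNumReduceTasks(0)` then makes the job map-only, so each `Put` the mapper emits is written straight to HBase.

Speculative map execution is also switched off. This is a common recommendation for jobs that write to HBase, because a duplicate attempt repeats the same writes and is then killed.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

/** Hypothetical map-only job: loads "rowkey<TAB>value" lines into an HBase table. */
public class LoadIntoHBase {

  // Hypothetical column family and qualifier; adjust to the real schema.
  static final byte[] FAMILY = Bytes.toBytes("f");
  static final byte[] QUALIFIER = Bytes.toBytes("q");

  /** Hypothetical helper: splits a line into {rowkey, value}, or returns null if malformed. */
  static String[] parseLine(String line) {
    String[] parts = line.split("\t", 2);
    return (parts.length == 2 && !parts[0].isEmpty()) ? parts : null;
  }

  static class LineToPutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] kv = parseLine(line.toString());
      if (kv == null) {
        return; // skip malformed input instead of failing the task
      }
      byte[] row = Bytes.toBytes(kv[0]);
      Put put = new Put(row);
      put.addColumn(FAMILY, QUALIFIER, Bytes.toBytes(kv[1]));
      // With zero reducers, this record goes directly to TableOutputFormat.
      context.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    // args[0] = HDFS input path, args[1] = existing HBase table name (both hypothetical).
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "load-into-hbase");
    job.setJarByClass(LoadIntoHBase.class);

    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setMapperClass(LineToPutMapper.class);

    // A speculative duplicate attempt would repeat the same writes and then be killed.
    job.setMapSpeculativeExecution(false);

    // Sets TableOutputFormat and the target table; a null reducer class is accepted.
    TableMapReduceUtil.initTableReducerJob(args[1], null, job);
    job.setNumReduceTasks(0); // map-only: the map output is the job output

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

It would be launched with something like `hadoop jar load.jar LoadIntoHBase /input/path mytable`, with the HBase client jars and configuration on the classpath. The jar and path names in that command are also hypothetical.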