> Wow. I get tons of them in the logs. And there aren't that many clients that
> got killed, as reported by the MR job. Is that the only case when these
> errors are reported?

What about speculative execution? Or RPC timeouts (do you log that)?

> OK, good, so one of the two happened then. I will try figuring out what
> happened. ZK servers are not collocated in my setup. They are a set of 5
> dedicated nodes (nothing else running).

Then I'm betting the farm it's the nodes that have a resource problem.

> No, I was referring to using TOF on a regular Java API MR job. But I guessed
> using TOF will be similar to what I am currently doing.

Unless you need to do something unusual with the HTable, using TOF
(TableOutputFormat) for the map output is good practice.

>
> I will use Ganglia to monitor the stats.

Please! On a cluster of that size it's almost mandatory :)

J-D
