Hi,
I've got a strange problem with 0.8.1 (we're going to make the jump
to 0.9.0 in a few days, but for now I'm working with a 0.8.1 cluster).
After a few iterations of my method, one random node of my local
cluster throws an exception like this:
This sounds bad, and is probably related to shuffle file consolidation.
Turning off consolidation would probably get you working again, but I'd
really love to track down the bug. Do you know if any tasks fail before
those errors start occurring? It's very possible that another exception is
occurring
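
For reference, a minimal sketch of turning consolidation off on 0.8.x. It
assumes consolidation was enabled through the spark.shuffle.consolidateFiles
system property and that the property is set in the driver program; the
commented-out context creation uses placeholder names and is not taken from
the original code.

```scala
// Assumption: consolidation was switched on via this system property.
// In 0.8.x Spark reads its settings from Java system properties,
// so this has to run before the SparkContext is constructed.
System.setProperty("spark.shuffle.consolidateFiles", "false")

// Then create the context as usual, for example:
// val sc = new SparkContext(master, appName)  // master, appName: placeholders
```

If the property is set somewhere else (for example in SPARK_JAVA_OPTS), it
would need to be changed there instead.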
Hi,
There are indeed errors preceding this. I missed them at first
because the FileNotFound was the last one before the hang/crash, and
the earlier errors did not seem to be blocking:
ERROR SendingConnection: Exception while reading SendingConnection