Shuffle file not found Exception

2014-02-09 Thread Guillaume Pitel
Hi, I've got a strange problem with 0.8.1 (we're going to make the jump to 0.9.0 in a few days, but for now I'm working with a 0.8.1 cluster): after a few iterations of my method, one random node of my local cluster throws an exception like this:

Re: Shuffle file not found Exception

2014-02-09 Thread Aaron Davidson
This sounds bad, and is probably related to shuffle file consolidation. Turning off consolidation would probably get you working again, but I'd really love to track down the bug. Do you know if any tasks fail before those errors start occurring? It's very possible that another exception is occurring
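
For readers following along, the workaround mentioned above (turning off shuffle file consolidation) could look roughly like the sketch below. It assumes the setting is the `spark.shuffle.consolidateFiles` property and that, as in the 0.8.x line, Spark reads its configuration from Java system properties set before the SparkContext is created. The class and method names are hypothetical, and the SparkContext creation itself is left out so the snippet stays self-contained.

```java
// Minimal sketch, not taken from the thread: the class and method names are
// hypothetical. Assumes Spark 0.8.x picks up its settings from Java system
// properties, so the property must be set before the SparkContext exists.
final class ShuffleConsolidationWorkaround {

    // Assumed property name for shuffle file consolidation.
    static final String CONSOLIDATE_KEY = "spark.shuffle.consolidateFiles";

    // Turns consolidation off and returns the value now in effect.
    static String disableConsolidation() {
        System.setProperty(CONSOLIDATE_KEY, "false");
        return System.getProperty(CONSOLIDATE_KEY);
    }

    public static void main(String[] args) {
        String value = disableConsolidation();
        System.out.println(CONSOLIDATE_KEY + "=" + value);
        // The SparkContext would be created after this point.
    }
}
```

The same effect could presumably be had by passing `-Dspark.shuffle.consolidateFiles=false` to the JVM, which avoids touching application code while the underlying bug is being tracked down.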

Re: Shuffle file not found Exception

2014-02-09 Thread Guillaume Pitel
Hi, there are indeed errors preceding this. I missed them at first because the FileNotFound was the last one before the hang/crash, and the previous errors did not seem to be blocking: ERROR SendingConnection: Exception while reading SendingConnection