Re: Failed to run runJob at ReceiverTracker.scala

2014-08-29 Thread Tim Smith
I upped the ulimit to 128k files on all nodes. The job crashed again with "DAGScheduler: Failed to run runJob at ReceiverTracker.scala:275". I couldn't get the logs because I killed the job, and it looks like YARN wiped the container logs (not sure why it wipes the logs under /var/log/hadoop-yarn/container).
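
(Not part of the original exchange: if YARN log aggregation is turned on, container logs are normally copied to HDFS when the application finishes and can be pulled back with the yarn logs CLI even after the local copies under /var/log/hadoop-yarn/container are cleaned up. A rough sketch follows; the application id is a placeholder.)

    # Fetch aggregated container logs after the application has ended.
    # Assumes yarn.log-aggregation-enable=true in yarn-site.xml.
    yarn logs -applicationId application_1409300000000_0001 > app_logs.txt

    # To keep local container log dirs around longer while debugging, the
    # NodeManager cleanup can be delayed (yarn-site.xml, value in seconds):
    #   yarn.nodemanager.delete.debug-delay-sec = 600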

Re: Failed to run runJob at ReceiverTracker.scala

2014-08-28 Thread Tathagata Das
Do you see this error right at the beginning or after running for some time? The root cause seems to be that somehow your Spark executors got killed, which killed the receivers and caused further errors. Please take a look at the logs of the lost executor to find the root cause.

Re: Failed to run runJob at ReceiverTracker.scala

2014-08-28 Thread Tim Smith
It appeared after running for a while. I re-ran the job and this time it crashed with:

14/08/29 00:18:50 WARN ReceiverTracker: Error reported by receiver for stream 0: Error in block pushing thread - java.net.SocketException: Too many open files

Shouldn't the failed receiver get re-spawned on a
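
(An aside, not from the original thread: one way to check whether a receiver's executor really is running out of file descriptors is to count the descriptors held by the executor JVM on the worker node. The PID below is a placeholder.)

    # Find the executor JVM on the worker node and count its open descriptors (Linux).
    jps -lm | grep -i executor                  # note the PID of CoarseGrainedExecutorBackend
    ls /proc/12345/fd | wc -l                   # 12345 is a placeholder for that PID
    grep 'Max open files' /proc/12345/limits    # the limit the process actually sees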

Re: Failed to run runJob at ReceiverTracker.scala

2014-08-28 Thread Tathagata Das
It did. It failed and was respawned 4 times. In this case, "too many open files" is a sign that you need to increase the system-wide limit on open files. Try adding ulimit -n 16000 to your conf/spark-env.sh. TD
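
(A sketch of the suggestion above; only the ulimit line comes from the thread, and the limits.conf entries assume the executors run as the 'yarn' user.)

    # conf/spark-env.sh on every worker node:
    ulimit -n 16000

    # Alternatively, raise the per-user limit system-wide in /etc/security/limits.conf:
    #   yarn  soft  nofile  16000
    #   yarn  hard  nofile  16000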