We could definitely improve PySpark's failure reporting mechanisms. Right now, the worker has a try/except block that forwards Python exceptions to Java, but a few failures can still occur after the worker starts up and before we enter that block, and those may go unreported in the worker's own logs (see https://github.com/apache/incubator-spark/blob/master/python/pyspark/worker.py). For example, I think you might see problems if the UDF or broadcast variables can't be deserialized properly.
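To make that concrete, here's a very rough sketch of what I mean (this is not the actual worker.py code; the pickle calls, the framing, and the "PYTHON_EXCEPTION" marker are all made-up stand-ins for whatever protocol the real worker uses to talk to the JVM):

import pickle
import sys
import traceback

def main(infile, outfile):
    try:
        # Deserializing the broadcast variables and the UDF can itself fail
        # (e.g. something that doesn't unpickle on the worker), so it should
        # happen inside the try block rather than before it. pickle.load is
        # just a stand-in for the real worker's framing/serialization.
        broadcast_vars = pickle.load(infile)
        func = pickle.load(infile)
        records = pickle.load(infile)               # a batch of input records
        pickle.dump([func(r) for r in records], outfile)
    except Exception:
        # Forward the full Python traceback to the JVM so it ends up in the
        # Java-side exception / executor logs instead of vanishing.
        # "PYTHON_EXCEPTION" is an invented marker, not the real protocol.
        outfile.write(b"PYTHON_EXCEPTION\n")
        outfile.write(traceback.format_exc().encode("utf-8"))
        outfile.flush()
        sys.exit(-1)

if __name__ == "__main__":
    main(sys.stdin.buffer, sys.stdout.buffer)

The point is just that everything that can blow up after startup sits inside the same try block that already knows how to report back to Java.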
We should move more of the worker's code into the try block. It would also be helpful to redirect the Python subprocesses' stderr and stdout to a log file (a rough sketch of that is at the bottom of this message, below the quoted thread).

On Fri, Dec 20, 2013 at 11:50 AM, Sandy Ryza <[email protected]> wrote:

> Yeah, only using numpy. Strange, it must be an issue with my setup. Will
> let you know if I figure it out.
>
> -Sandy
>
>
> On Fri, Dec 20, 2013 at 6:03 AM, Michael Ronquest <[email protected]> wrote:
>
>> Sandy,
>> Are you just using numpy? numexpr (fast math for numpy arrays)
>> has issues on workers.
>> Cheers,
>> Mike
>>
>> On 12/19/2013 06:04 PM, Sandy Ryza wrote:
>>
>>> Verified that python is installed on the worker. When I simplify my job
>>> I'm able to get more stuff in stderr, but it's just the Java log4j
>>> messages.
>>>
>>> I narrowed it down and I'm pretty sure the error is coming from my use
>>> of numpy - I'm trying to pass around records that hold numpy arrays. I've
>>> verified that numpy is installed on the workers and that the job works
>>> locally on the master. Is there anything else I need to do for accessing
>>> numpy from workers?
>>>
>>> thanks,
>>> Sandy
>>>
>>>
>>> On Thu, Dec 19, 2013 at 2:23 PM, Matei Zaharia <[email protected]> wrote:
>>>
>>> It might also mean you don’t have Python installed on the worker.
>>>
>>> On Dec 19, 2013, at 1:17 PM, Jey Kottalam <[email protected]> wrote:
>>>
>>> > That's pretty unusual; normally the executor's stderr output would
>>> > contain a stacktrace and any other error messages from your Python
>>> > code. Is it possible that the PySpark worker crashed in C code or was
>>> > OOM killed?
>>> >
>>> > On Thu, Dec 19, 2013 at 11:10 AM, Sandy Ryza <[email protected]> wrote:
>>> >> Hey All,
>>> >>
>>> >> Where are python logs in PySpark supposed to go? My job is getting a
>>> >> org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
>>> >> but when I look at the stdout/stderr logs in the web UI, nothing interesting
>>> >> shows up (stdout is empty and stderr just has the spark executor command).
>>> >>
>>> >> Is this the expected behavior?
>>> >>
>>> >> thanks in advance for any guidance,
>>> >> Sandy
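And for the stdout/stderr redirection I mentioned above, something roughly like this at worker startup would do it (again just a sketch; the log directory and file naming are arbitrary, and it assumes the worker talks to the JVM over a socket rather than over its own stdout):

import os
import sys

def redirect_worker_output(log_dir="/tmp/pyspark-worker-logs"):
    # Duplicate a log file's descriptor over fds 1 and 2. Doing this at the
    # file-descriptor level (instead of just reassigning sys.stdout/stderr)
    # also captures output from C extensions and the interpreter itself,
    # e.g. a crash inside numpy.
    if not os.path.isdir(log_dir):
        os.makedirs(log_dir)
    log_path = os.path.join(log_dir, "worker-%d.log" % os.getpid())
    log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    os.dup2(log_fd, sys.stdout.fileno())
    os.dup2(log_fd, sys.stderr.fileno())
    os.close(log_fd)

That way a crash in a C extension (like the numpy problem in the quoted thread) at least leaves its error output somewhere predictable instead of disappearing.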
