Here's a self-contained testcase. It works when I take out the line with
the np.asarray.
---
import sys
import numpy as np
from pyspark import SparkContext
def parse_sotw(line):
    # Each record is a trial number followed by a vector of floats.
    tokens = line.split('\t')
    return {'trial': int(tokens[0]),
            'F': np.asarray(map(float, tokens[1:]))}
sc = SparkContext(sys.argv[2], "Risk MonteCarlo")
# Load data
sotw_raw = sc.textFile(sys.argv[1])
sotw = sotw_raw.map(parse_sotw)
print sotw.collect()
---
Here's a sample input line:
1,-0.360479,0.151966,-0.924236,0.716188,-0.359643,1.214568,1.816294,1.205595,0.71207
---
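Since it works when the record holds a plain list, here's a sketch of the workaround I'm considering: keep 'F' as a list in the record and only call np.asarray inside the task that actually does the math. The row_norm step below is just a made-up example of such a task (not something from my real job), and the snippet reuses sc, np, and sys from the testcase above:
---
def parse_sotw_plain(line):
    tokens = line.split('\t')
    # Keep only plain Python types in the record, so pickling between the
    # JVM, the Python workers, and the driver never sees an ndarray.
    return {'trial': int(tokens[0]),
            'F': [float(t) for t in tokens[1:]]}

def row_norm(record):
    # Convert to an ndarray only inside the task that needs numpy.
    return (record['trial'], float(np.linalg.norm(np.asarray(record['F']))))

sotw_plain = sc.textFile(sys.argv[1]).map(parse_sotw_plain)
print sotw_plain.map(row_norm).collect()
---
I haven't verified whether the deferred np.asarray call inside row_norm hits the same crash, so treat that part as untested.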
thanks a ton for your help,
Sandy
On Thu, Dec 19, 2013 at 3:09 PM, Jey Kottalam <[email protected]> wrote:
> Hi Sandy,
>
> Do you have a self-contained testcase that we could use to reproduce the
> error?
>
> -Jey
>
> On Thu, Dec 19, 2013 at 3:04 PM, Sandy Ryza <[email protected]>
> wrote:
> > Verified that python is installed on the worker. When I simplify my job
> > I'm able to get more stuff in stderr, but it's just the Java log4j
> > messages.
> >
> > I narrowed it down and I'm pretty sure the error is coming from my use of
> > numpy - I'm trying to pass around records that hold numpy arrays. I've
> > verified that numpy is installed on the workers and that the job works
> > locally on the master. Is there anything else I need to do for accessing
> > numpy from workers?
> >
> > thanks,
> > Sandy
> >
> >
> >
> > On Thu, Dec 19, 2013 at 2:23 PM, Matei Zaharia <[email protected]>
> > wrote:
> >>
> >> It might also mean you don’t have Python installed on the worker.
> >>
> >> On Dec 19, 2013, at 1:17 PM, Jey Kottalam <[email protected]> wrote:
> >>
> >> > That's pretty unusual; normally the executor's stderr output would
> >> > contain a stacktrace and any other error messages from your Python
> >> > code. Is it possible that the PySpark worker crashed in C code or was
> >> > OOM killed?
> >> >
> >> > On Thu, Dec 19, 2013 at 11:10 AM, Sandy Ryza <[email protected]>
> >> > wrote:
> >> >> Hey All,
> >> >>
> >> >> Where are python logs in PySpark supposed to go? My job is getting a
> >> >> org.apache.spark.SparkException: Python worker exited unexpectedly
> >> >> (crashed)
> >> >> but when I look at the stdout/stderr logs in the web UI, nothing
> >> >> interesting
> >> >> shows up (stdout is empty and stderr just has the spark executor
> >> >> command).
> >> >>
> >> >> Is this the expected behavior?
> >> >>
> >> >> thanks in advance for any guidance,
> >> >> Sandy
> >>
> >
>
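
P.S. One more data point I can gather on the "anything else I need to do for accessing numpy from workers" question: a smoke test that exercises numpy inside a task without touching my input data or putting an ndarray into any record. This is just a sketch (it takes the master URL as its only argument, unlike the testcase above):
---
import sys
import numpy as np
from pyspark import SparkContext

sc = SparkContext(sys.argv[1], "numpy smoke test")

def check_numpy(i):
    # Build a small array on the worker, but hand back only plain Python
    # types so nothing numpy-specific crosses the worker/driver boundary.
    arr = np.asarray([float(i), float(i) * 2.0])
    return (np.__version__, float(arr.sum()))

print sc.parallelize(range(4)).map(check_numpy).collect()
---
If that collect comes back with version strings for every element, the crash is presumably about shipping ndarrays around rather than about numpy itself being unusable on the workers.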