Re: Using third party libraries in pyspark

2015-01-22 Thread Felix C
Python couldn't find your module. Do you have it on each worker node? You will need to have it on each one.

--- Original Message ---
From: "Davies Liu"
Sent: January 22, 2015 9:12 PM
To: "Mohit Singh"
Cc: user@spark.apache.org
Subject: Re: Using third party libraries in pyspark

Re: Using third party libraries in pyspark

2015-01-22 Thread Davies Liu
You need to install these libraries on all the slaves, or submit them via spark-submit:

    spark-submit --py-files xxx

On Thu, Jan 22, 2015 at 11:23 AM, Mohit Singh wrote:
> Hi,
> I might be asking something very trivial, but what's the recommended way of
> using third party libraries?
> I am using tables to read hdf5 format files.
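A minimal sketch of the --py-files route, assuming a pure-Python dependency (the package name `mylib`, the archive name `deps.zip`, and the script name `my_job.py` are all illustrative, not from the thread). Note that this only works for pure-Python code; a C-extension package like PyTables still has to be installed on every worker node, as the replies above say.

```python
import os
import zipfile

# Create a tiny pure-Python package to ship (contents are illustrative).
os.makedirs("mylib", exist_ok=True)
with open("mylib/__init__.py", "w") as f:
    f.write("VALUE = 42\n")

# Bundle it into a zip archive; spark-submit --py-files adds this archive
# to sys.path on the driver and on every executor.
with zipfile.ZipFile("deps.zip", "w") as zf:
    zf.write("mylib/__init__.py")

shipped = zipfile.ZipFile("deps.zip").namelist()

# Then submit with:
#   spark-submit --py-files deps.zip my_job.py
```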

Using third party libraries in pyspark

2015-01-22 Thread Mohit Singh
Hi,
I might be asking something very trivial, but what's the recommended way of using third party libraries? I am using tables to read hdf5 format files. Here is the error trace:

    print rdd.take(2)
      File "/tmp/spark/python/pyspark/rdd.py", line , in take
        res = self.context.runJob(s
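One common pattern for this situation, sketched below under assumptions not stated in the thread: put the third-party import inside the function that Spark ships to executors, so the import resolves on each worker against its locally installed packages. The function name `process_partition` is hypothetical, and `math` stands in for `tables` here so the sketch runs anywhere; PyTables itself would still need to be installed on every worker node.

```python
# Hedged sketch: defer the third-party import into the executor-side function.
def process_partition(rows):
    import math  # in the real job this would be: import tables
    for r in rows:
        yield math.sqrt(r)

# With a SparkContext `sc` (assumed), this would run as:
#   sc.parallelize([1, 4, 9]).mapPartitions(process_partition).collect()
# Locally, the same function can be exercised on a plain iterable:
result = list(process_partition([1, 4, 9]))
```

This keeps the driver from pickling a module-level reference that the workers cannot resolve, which is the ImportError the traceback above points at.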