Re: PySpark + executor lost

2014-08-08 Thread Avishek Saha
output On 8 August 2014 13:28, Avishek Saha wrote: > You mean YARN cluster, right? > > Also, my jobs run through all their stages just fine. But the entire > code crashes when I do a "saveAsTextFile". > > On 8 August 2014 13:24, Sandy Ryza wrote: >> Hi Avishek, >

Re: PySpark + executor lost

2014-08-08 Thread Avishek Saha
> > On Fri, Aug 8, 2014 at 12:47 PM, Avishek Saha > wrote: >> >> So I think I have a better idea of the problem now. >> >> The environment is YARN client and IIRC PySpark doesn't run on YARN >> cluster. >> >> So my client is heavily loa

Re: Lost executors

2014-08-08 Thread Avishek Saha
Same here Ravi. See my post on a similar thread. Are you running on YARN client? On Aug 7, 2014 2:56 PM, "rpandya" wrote: > I'm running into a problem with executors failing, and it's not clear > what's > causing it. Any suggestions on how to diagnose & fix it would be > appreciated. > > There a

Re: PySpark + executor lost

2014-08-08 Thread Avishek Saha
sters mode? On Aug 7, 2014 3:04 PM, "Davies Liu" wrote: > What is the environment ? YARN or Mesos or Standalone? > > It will be more helpful if you could show more loggings. > > On Wed, Aug 6, 2014 at 7:25 PM, Avishek Saha > wrote: > > Hi, > > > > I g

PySpark + executor lost

2014-08-06 Thread Avishek Saha
Hi, I get a lot of "executor lost" errors for "saveAsTextFile" with PySpark and Hadoop 2.4. For small datasets the error occurs, but since the dataset is small the output eventually gets written to the file. For large datasets, it takes forever to write the final output. Any help is appreciated. Avishek -

Re: numpy + pyspark

2014-06-27 Thread Avishek Saha
e cluster be viable? >> The dependencies would get tricky but I think this is the sort of situation >> it's built for. >> >> >> On 6/27/14, 11:06 AM, Avishek Saha wrote: >> >> I too felt the same Nick but I don't have root privileges on the cluste

Re: numpy + pyspark

2014-06-27 Thread Avishek Saha
etc. I'd say by the time you figure out > correctly deploying numpy in this manner, you may as well have just built > it into your cluster bootstrap process, or PSSH install it on each node... > > > On Fri, Jun 27, 2014 at 4:58 PM, Avishek Saha > wrote: > >> To cla

Re: numpy + pyspark

2014-06-27 Thread Avishek Saha
To clarify I tried it and it almost worked -- but I am getting some problems from the Random module in numpy. If anyone has successfully passed a numpy module (via the --py-files option) to spark-submit then please let me know. Thanks !! Avishek On 26 June 2014 17:45, Avishek Saha wrote: >

numpy + pyspark

2014-06-26 Thread Avishek Saha
Hi all, Instead of installing numpy on each worker node, is it possible to ship numpy (via the --py-files option maybe) when invoking spark-submit? Thanks, Avishek
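
For anyone trying the --py-files route asked about above, here is a minimal sketch of how one might check whether a package is a candidate for shipping as a zip. It assumes the package is already unpacked in a local directory. The helper names (find_compiled_modules, zip_package, spark_submit_command) and the file names deps.zip and job.py are hypothetical and only for illustration. The check exists because Python cannot import compiled extension modules from inside a zip archive, and numpy contains such modules, which may be related to the Random module problem reported in this thread.

```python
import os
import zipfile

# Assumption: these suffixes cover the usual compiled extension/library files.
COMPILED_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def find_compiled_modules(package_dir):
    """Return sorted relative paths of compiled files found under package_dir."""
    hits = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(COMPILED_SUFFIXES):
                full = os.path.join(root, name)
                hits.append(os.path.relpath(full, package_dir))
    return sorted(hits)


def zip_package(package_dir, zip_path):
    """Zip a pure-Python package, keeping its top-level folder name in the archive."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the parent so "import <package>" resolves.
                zf.write(full, os.path.relpath(full, parent))
    return zip_path


def spark_submit_command(script, zip_path):
    """Build the argument list for spark-submit with the zip passed via --py-files."""
    return ["spark-submit", "--py-files", zip_path, script]
```

If find_compiled_modules returns a non-empty list for a package, zipping it for --py-files is unlikely to work as-is, and installing the package on each worker node is the safer option. For a pure-Python package, spark_submit_command("job.py", zip_package("mypkg", "deps.zip")) gives the command to run.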