Re: Spark SQL driver memory keeps rising

2016-06-16 Thread Khaled Hammouda
esce(200) final_cleaned_data.write.parquet("s3://cleaned_data/today.parquet") On Wed, Jun 15, 2016 at 10:23 PM, Mohammed Guller wrote: > It would be hard to guess what could be going on without looking at the > code. It looks like the driver program goes into a long stop-the-worl

Spark SQL driver memory keeps rising

2016-06-14 Thread Khaled Hammouda
I'm having trouble with a Spark SQL job in which I run a series of SQL transformations on data loaded from HDFS. The first two stages load data from the HDFS input without issues, but later stages that require shuffles cause the driver memory to keep rising until it is exhausted, and then the driver s

Re: Is there a limit on the number of tasks in one job?

2016-06-14 Thread Khaled Hammouda
bzadehmich.wordpress.com/> > > > On 13 June 2016 at 20:45, Khaled Hammouda wrote: > Hi Michael, > > Thanks for the suggestion to use Spark 2.0 preview. I just downloaded the > preview and tried using it, but I’m running into t

Re: Is there a limit on the number of tasks in one job?

2016-06-13 Thread Khaled Hammouda
Hi Michael, Thanks for the suggestion to use Spark 2.0 preview. I just downloaded the preview and tried using it, but I’m running into the exact same issue. Khaled > On Jun 13, 2016, at 2:58 PM, Michael Armbrust wrote: > > You might try with the Spark 2.0 preview. We spent a bunch of time im

Re: Are Spark Streaming RDDs always processed in order?

2015-07-06 Thread Khaled Hammouda
Great! That's what I gathered from the thread titled "Serial batching with Spark Streaming", but thanks for confirming this again. On 6 July 2015 at 15:31, Tathagata Das wrote: > Yes, RDD of batch t+1 will be processed only after RDD of batch t has been > processed. Unless there are errors where