…coalesce(200)
final_cleaned_data.write.parquet("s3://cleaned_data/today.parquet")
On Wed, Jun 15, 2016 at 10:23 PM, Mohammed Guller wrote:
> It would be hard to guess what could be going on without looking at the
> code. It looks like the driver program goes into a long stop-the-world…
I'm having trouble with a Spark SQL job in which I run a series of SQL
transformations on data loaded from HDFS.
The first two stages load data from HDFS input without issues, but later
stages that require shuffles cause the driver memory to keep rising until
it is exhausted, and then the driver s…
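If the driver is indeed stalling in long stop-the-world collections, one common first step (a sketch only, not verified against this particular job; the memory size and script name below are placeholders) is to give the driver a larger heap and switch its collector to G1GC when submitting:

```shell
# Hypothetical sizes and file name -- tune for the actual job.
spark-submit \
  --driver-memory 8g \
  --conf spark.driver.extraJavaOptions="-XX:+UseG1GC" \
  --conf spark.sql.shuffle.partitions=200 \
  your_job.py
```

`spark.driver.extraJavaOptions` and `--driver-memory` are standard Spark submission options; whether they help depends on whether the growth is GC pressure or genuine accumulation of state on the driver.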
> On 13 June 2016 at 20:45, Khaled Hammouda <khaled.hammo...@kik.com> wrote:
> Hi Michael,
>
> Thanks for the suggestion to use Spark 2.0 preview. I just downloaded the
> preview and tried using it, but I’m running into the exact same issue.
Hi Michael,
Thanks for the suggestion to use Spark 2.0 preview. I just downloaded the
preview and tried using it, but I’m running into the exact same issue.
Khaled
> On Jun 13, 2016, at 2:58 PM, Michael Armbrust wrote:
>
> You might try with the Spark 2.0 preview. We spent a bunch of time improving…
Great! That's what I gathered from the thread titled "Serial batching with
Spark Streaming", but thanks for confirming this again.
On 6 July 2015 at 15:31, Tathagata Das wrote:
> Yes, RDD of batch t+1 will be processed only after RDD of batch t has been
> processed. Unless there are errors where…
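The guarantee TD describes — batch t+1 starts only after batch t has finished — can be illustrated with a plain-Python sketch. This is not Spark Streaming code; the single-worker queue below is just a stand-in for the scheduler's serial batch processing:

```python
from queue import Queue
from threading import Thread

processed = []

def process_batches(batch_queue: Queue) -> None:
    # A single worker dequeues one batch at a time, so batch t+1
    # is never touched until batch t has been fully processed.
    while True:
        batch = batch_queue.get()
        if batch is None:        # sentinel: no more batches
            break
        processed.append(batch)  # "process" the batch

q = Queue()
for t in range(5):               # batches 0..4 arrive in order
    q.put(t)
q.put(None)

worker = Thread(target=process_batches, args=(q,))
worker.start()
worker.join()

print(processed)                 # batches complete strictly in arrival order
```

Running this prints `[0, 1, 2, 3, 4]`: because there is only one consumer, completion order matches arrival order, which is the serial-batching behavior confirmed in the thread.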