Re: I can't save DataFrame from running Spark locally

2018-01-23 Thread Toy
h the versions against which the Hadoop binaries were built.* > https://hortonworks.github.io/hdp-aws/s3-s3aclient/index.html#using-the-s3a-filesystem-client > *From:* Toy > *Date:* Tuesday, January 23, 2018 at 11:33 AM > *To

I can't save DataFrame from running Spark locally

2018-01-23 Thread Toy
Hi, First of all, my Spark application runs fine in AWS EMR. However, I'm trying to run it locally to debug an issue. The application simply parses log files, converts them to a DataFrame, and saves the result to S3 as ORC. When I run it locally, I get this error: java.io.IOException: /orc/dt=

Re: [EXT] How do I extract a value in foreachRDD operation

2018-01-22 Thread Toy
Thanks Michael, Can you give me an example? I'm new to Spark. On Mon, 22 Jan 2018 at 12:25 Michael Mansour wrote: > Toy, > > I suggest you partition your data according to date, and use the foreachPartition function, using the partition as the bucket location. >
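A minimal sketch of the idea discussed in this thread, in plain Python rather than Spark so it runs anywhere: take the date from each record's own field (instead of the current date) and use it to build the partition location. The `timestamp` field name, its format, and the `s3a://example-bucket/orc` base path are assumptions made for illustration; only the `dt=` naming comes from the thread.

```python
from collections import defaultdict
from datetime import datetime


def partition_path(base, record_date):
    """Build a date-partitioned location of the form <base>/dt=YYYY-MM-DD."""
    return f"{base}/dt={record_date.isoformat()}"


def group_by_date(records):
    """Group parsed log records by the date found in each record."""
    groups = defaultdict(list)
    for record in records:
        # Hypothetical field name and timestamp format; adapt to the real log schema.
        day = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%S").date()
        groups[day].append(record)
    return dict(groups)


# Made-up sample records spanning two days.
records = [
    {"timestamp": "2018-01-21T23:59:10", "msg": "a"},
    {"timestamp": "2018-01-22T00:00:05", "msg": "b"},
    {"timestamp": "2018-01-22T12:25:00", "msg": "c"},
]

for day, rows in sorted(group_by_date(records).items()):
    # Each group would be written to its own dt=... location.
    print(partition_path("s3a://example-bucket/orc", day), len(rows))
```

In Spark itself, a similar effect can usually be had by adding the date as a column and writing with `DataFrameWriter.partitionBy`, which creates one `dt=...` directory per distinct value; whether that fits depends on how the streaming job builds its DataFrames.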

How do I extract a value in foreachRDD operation

2018-01-22 Thread Toy
Hi, We have a Spark application that parses log files and saves them to S3 in ORC format. However, during the foreachRDD operation we need to extract a date field to determine the bucket location; we partition the data by date. Currently, we just hardcode it to the current date, but we have a requirement

How to do stop streaming before the application got killed

2017-12-22 Thread Toy
I'm trying to write a deployment job for a Spark application. Basically, the job will send yarn application --kill app_id to the cluster, but after the application receives the signal it dies without finishing whatever it is processing or stopping the stream. I'm using Spark Streaming. What's the best wa

Re: Why do I see five attempts on my Spark application

2017-12-13 Thread Toy
Hi, Can you point me to the config for that please? On Wed, 13 Dec 2017 at 14:23 Marcelo Vanzin wrote: > On Wed, Dec 13, 2017 at 11:21 AM, Toy wrote: > > I'm wondering why am I seeing 5 attempts for my Spark application? Does Spark application restart itself? > > I

Why do I see five attempts on my Spark application

2017-12-13 Thread Toy
Hi, I'm wondering why I am seeing 5 attempts for my Spark application. Does a Spark application restart itself? [image: Screen Shot 2017-12-13 at 2.18.03 PM.png]