> *…the versions against which the Hadoop binaries were built.*
> https://hortonworks.github.io/hdp-aws/s3-s3aclient/index.html#using-the-s3a-filesystem-client
> *From:* Toy
> *Date:* Tuesday, January 23, 2018 at 11:33 AM
> *To:*
Hi,
First of all, my Spark application runs fine in AWS EMR. However, I'm
trying to run it locally to debug an issue. The application just parses
log files, converts them to a DataFrame, then to ORC, and saves the
result to S3. However, when I run it locally I get this error:

java.io.IOException: /orc/dt=
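For anyone hitting the same thing: running against S3 outside EMR usually means the S3A client has to be wired up by hand. A minimal sketch, assuming PySpark, with placeholder bucket and credential values; the hadoop-aws version must match the Hadoop version your Spark binaries were built against:

```python
# Sketch only: a local SparkSession configured for S3A.
# The hadoop-aws version and all credentials below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("local-s3a-debug")
    .master("local[*]")
    # Must match the Hadoop version of your Spark build.
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:2.7.3")
    .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")
    .getOrCreate()
)

# With that in place, s3a:// paths should resolve locally too, e.g.:
# spark.read.text("logs/").write.orc("s3a://your-bucket/orc/")
```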
Thanks Michael,
Can you give me an example? I'm new to Spark.
On Mon, 22 Jan 2018 at 12:25 Michael Mansour wrote:
> Toy,
> I suggest you partition your data by date and use the
> foreachPartition function, using the partition as the bucket location.
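Not from the thread, but a minimal sketch of the idea Michael describes: derive a date key per record and group records under it, so each group maps to one bucket location. The log-line format here is hypothetical.

```python
from datetime import datetime

def date_key(log_line):
    """Extract a dt=yyyy-MM-dd partition key from a log line.
    Assumes the line starts with an ISO-8601 timestamp (hypothetical format)."""
    ts = log_line.split(" ", 1)[0]
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").strftime("dt=%Y-%m-%d")

# Group records by date key; each key would become one output path.
lines = [
    "2018-01-22T10:15:00 GET /index.html 200",
    "2018-01-23T09:00:00 GET /about.html 200",
]
buckets = {}
for line in lines:
    buckets.setdefault(date_key(line), []).append(line)
# buckets maps "dt=2018-01-22" etc. to that day's records
```

With DataFrames you can often skip the manual grouping entirely: add a date column and let `df.write.partitionBy("dt").orc(...)` create the dt= directories for you.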
Hi,
We have a Spark application that parses log files and saves them to S3 in
ORC format. During the foreachRDD operation we need to extract a date
field to determine the bucket location; we partition by date. Currently we
hardcode the current date, but we have a requirement
I'm trying to write a deployment job for a Spark application. Basically,
the job sends yarn application -kill app_id to the cluster, but after the
application receives the signal it dies without finishing whatever it is
processing or stopping the stream.
I'm using Spark Streaming. What's the best way to stop it gracefully?
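One option (not from the thread): Spark Streaming has a graceful-shutdown path, so a kill does not have to drop in-flight batches. A sketch, assuming PySpark:

```python
# Sketch: ask Spark Streaming to drain in-flight batches on shutdown
# instead of dying immediately when YARN delivers the kill signal.
from pyspark import SparkConf

conf = SparkConf()
conf.set("spark.streaming.stopGracefullyOnShutdown", "true")

# Or stop explicitly from your own control logic:
# ssc.stop(stopSparkContext=True, stopGraceFully=True)
```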
Hi,
Can you point me to the config for that please?
On Wed, 13 Dec 2017 at 14:23 Marcelo Vanzin wrote:
> On Wed, Dec 13, 2017 at 11:21 AM, Toy wrote:
> > I'm wondering why I'm seeing 5 attempts for my Spark application. Does
> > the Spark application restart itself?
>
> I
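For reference, the setting Marcelo is likely pointing at: on YARN, the number of application attempts comes from spark.yarn.maxAppAttempts, capped by the cluster's yarn.resourcemanager.am.max-attempts. A sketch, assuming YARN mode:

```python
# Sketch: cap YARN application attempts so a failed driver
# is not automatically resubmitted.
from pyspark import SparkConf

conf = SparkConf()
# The effective value is also capped by YARN's
# yarn.resourcemanager.am.max-attempts setting.
conf.set("spark.yarn.maxAppAttempts", "1")
```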
Hi,
I'm wondering why I'm seeing 5 attempts for my Spark application. Does the
Spark application restart itself?
[image: Screen Shot 2017-12-13 at 2.18.03 PM.png]