So based on many more runs of this job I've come to the conclusion that a
workaround to this error is to
- decrease the amount of data written in each partition, or
- increase the amount of memory available to each executor
I still don't know what the root cause of the issue is.
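For reference, a sketch of how those two workarounds might look when submitting the job (the memory values and partition count below are illustrative examples, not the values from this job):

```shell
# Workaround 1: increase the memory available to each executor
# (example values; tune for your instance types)
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.executor.memoryOverhead=1g \
  my_job.py

# Workaround 2: decrease the data written per partition by
# raising the partition count before the write, e.g. in the job:
#   df.repartition(400).write.parquet("hdfs:///output/path")
```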
Yeah, probably increasing the memory or increasing the number of output
partitions would help. However, increasing the memory available to each
executor would add expense. I want to keep the number of partitions low so
that each parquet file turns out to be around 128 MB, which is a common
best practice.
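As a rough sketch of that sizing logic (the 50 GB dataset size here is a made-up example, and compressed parquet output is usually smaller than the in-memory size, so treat this as an estimate):

```python
import math

TARGET_FILE_MB = 128  # desired size of each output parquet file

def target_partitions(estimated_output_mb: int) -> int:
    """Return a partition count so each parquet file is ~128 MB."""
    return max(1, math.ceil(estimated_output_mb / TARGET_FILE_MB))

# e.g. roughly 50 GB of output -> 400 partitions
print(target_partitions(50 * 1024))  # -> 400
```

In the Spark job, that count would then feed into something like `df.repartition(n)` before the write.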
If not, try running a coalesce. Your data may have grown and is defaulting
to a number of partitions that is causing unnecessary overhead.
On Thu, Nov 29, 2018 at 3:02 AM Conrad Lee wrote:
Thanks, I'll try using 5.17.0.
For anyone trying to debug this problem in the future: In other jobs that
hang in the same manner, the thread dump didn't have any blocked threads,
so that might be a red herring.
On Wed, Nov 28, 2018 at 4:34 PM Christopher Petrino <
christopher.petr...@gmail.com> wrote:
I ran into problems using 5.19 so I referred to 5.17 and it resolved my
issues.
On Wed, Nov 28, 2018 at 2:48 AM Conrad Lee wrote:
Hello Vadim,
Interesting. I've only been running this job at scale for a couple weeks
so I can't say whether this is related to recent EMR changes.
Much of the EMR-specific code for Spark has to do with writing files to
S3. In this case I'm writing files to the cluster's HDFS, though, so my
sense
Hey Conrad,
has it started happening recently?
We recently started having some sporadic problems with drivers on EMR
getting stuck; up until two weeks ago everything was fine.
We're trying to figure out with the EMR team where the issue is coming from.
On Tue, Nov 27, 2018 at 6:29 AM Conrad Lee wrote:
Dear Spark community,
I'm running Spark 2.3.2 on EMR 5.19.0. I've got a job that hangs in
the final stage--the job usually works, but I see this hanging behavior in
about one out of 50 runs.
The second-to-last stage sorts the dataframe, and the final stage writes
the dataframe to HDFS.
Here