So based on many more runs of this job, I've come to the conclusion that a workaround for this error is to

- decrease the amount of data written in each partition, or
- increase the amount of memory available to each executor

I still don't know what the root cause of the issue is.
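For concreteness, here's a minimal PySpark sketch of both workarounds. All names, paths, and numbers are illustrative, not taken from the actual job:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("sort-and-write")
        # Workaround 2: more memory per executor. Must be set before the
        # session starts (or via spark-submit --executor-memory).
        .config("spark.executor.memory", "8g")
        # Workaround 1: more reduce-side partitions for the sort, so each
        # task writes less data (at the cost of parquet files smaller than
        # the ~128 MB target discussed below).
        .config("spark.sql.shuffle.partitions", "192")
        .getOrCreate()
    )

    df = spark.read.parquet("hdfs:///input/path")

    # The failing stage: the sort's output partitioning (192 above)
    # determines how much data each writer task handles.
    df.sort("sort_column").write.parquet("hdfs:///output/path")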
On Tue, Dec 4, 2018 at 9:45 AM Conrad Lee <con...@parsely.com> wrote:
> Yeah, probably increasing the memory or increasing the number of output partitions would help. However, increasing the memory available to each executor would add expense. I want to keep the number of partitions low so that each parquet file turns out to be around 128 MB, which is best practice for long-term storage and use with other systems like Presto.
>
> This feels like a bug due to the flaky nature of the failure -- also, usually when the memory gets too low the executor is killed or errors out and I get one of the typical Spark OOM error codes. When I run the same job with the same resources, sometimes this job succeeds and sometimes it fails.
> On Mon, Dec 3, 2018 at 5:19 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>> Depending on the size of your data set and how many resources you have (num-executors, executor instances, number of nodes), I'm inclined to suspect that the issue is related to the reduction of partitions from thousands to 96; I could be misguided, but given the details I have, I would consider testing whether the behavior changes if the final stage operates at a different number of partitions.
>>
>> On Mon, Dec 3, 2018 at 2:48 AM Conrad Lee <con...@parsely.com> wrote:
>>> Thanks for the thoughts. While the beginning of the job deals with lots of files in the first stage, they're first coalesced down into just a few thousand partitions. The part of the job that's failing is the reduce side of a dataframe.sort() that writes output to HDFS. This last stage has only 96 tasks and the partitions are well balanced. I'm not using a `partitionBy` option on the dataframe writer.
>>>
>>> On Fri, Nov 30, 2018 at 8:14 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>>> The reason I ask is because I've had some unreliability caused by over-stressing the HDFS. Do you know the number of partitions when these actions are being performed? E.g., if you have 1,000,000 files being read, you may have 1,000,000 partitions, which may cause HDFS stress. Alternatively, if you have 1 large file, say 100 GB, you may have 1 partition, which would not fit in memory and may cause writes to disk. I imagine it may be flaky because you are doing some action like a groupBy somewhere, and depending on how the data was read, certain groups will be in certain partitions; I'm not sure if reads on files are deterministic -- I suspect they are not.
>>>>
>>>> On Fri, Nov 30, 2018 at 2:08 PM Conrad Lee <con...@parsely.com> wrote:
>>>>> I'm loading the data using the dataframe reader from parquet files stored on local HDFS. The stage of the job that fails is not the stage that does this. The stage of the job that fails is one that reads a sorted dataframe from the last shuffle and performs the final write to parquet on local HDFS.
>>>>>
>>>>> On Fri, Nov 30, 2018 at 4:02 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>>>>> How are you loading the data?
>>>>>>
>>>>>> On Fri, Nov 30, 2018 at 2:26 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>>> Thanks for the suggestions. Here's an update that responds to some of the suggestions/ideas in-line:
>>>>>>>
>>>>>>>> I ran into problems using 5.19 so I referred to 5.17 and it resolved my issues.
>>>>>>>
>>>>>>> I tried EMR 5.17.0 and the problem still sometimes occurs.
>>>>>>>
>>>>>>>> try running a coalesce. Your data may have grown and is defaulting to a number of partitions that's causing unnecessary overhead
>>>>>>>
>>>>>>> Well, I don't think it's that, because this problem occurs flakily. That is, if the job hangs I can kill it and re-run it and it works fine (on the same hardware and with the same memory settings). I'm not getting any OOM errors.
>>>>>>>
>>>>>>> On a related note: the job is spilling to disk. I see messages like this:
>>>>>>>
>>>>>>>> 18/11/29 21:40:06 INFO UnsafeExternalSorter: Thread 156 spilling sort data of 912.0 MB to disk (3 times so far)
>>>>>>>
>>>>>>> This occurs in both successful and unsuccessful runs, though. I've checked the disks of an executor that's running a hanging job and its disks have plenty of space, so it doesn't seem to be an out-of-disk-space issue. This also doesn't seem to be where it hangs -- the logs move on and describe the parquet commit.
>>>>>>>
>>>>>>> On Thu, Nov 29, 2018 at 4:06 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>>>>>>> If not, try running a coalesce. Your data may have grown and is defaulting to a number of partitions that's causing unnecessary overhead.
>>>>>>>>
>>>>>>>> On Thu, Nov 29, 2018 at 3:02 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>>>>> Thanks, I'll try using 5.17.0.
>>>>>>>>>
>>>>>>>>> For anyone trying to debug this problem in the future: in other jobs that hang in the same manner, the thread dump didn't have any blocked threads, so that might be a red herring.
>>>>>>>>>
>>>>>>>>> On Wed, Nov 28, 2018 at 4:34 PM Christopher Petrino <christopher.petr...@gmail.com> wrote:
>>>>>>>>>> I ran into problems using 5.19 so I referred to 5.17 and it resolved my issues.
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 28, 2018 at 2:48 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>>>>>>> Hello Vadim,
>>>>>>>>>>>
>>>>>>>>>>> Interesting. I've only been running this job at scale for a couple of weeks, so I can't say whether this is related to recent EMR changes.
>>>>>>>>>>>
>>>>>>>>>>> Much of the EMR-specific code for Spark has to do with writing files to S3. In this case I'm writing files to the cluster's HDFS, though, so my sense is that this is a Spark issue, not an EMR issue (but I'm not sure).
>>>>>>>>>>>
>>>>>>>>>>> Conrad
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 27, 2018 at 5:21 PM Vadim Semenov <va...@datadoghq.com> wrote:
>>>>>>>>>>>> Hey Conrad,
>>>>>>>>>>>>
>>>>>>>>>>>> Has it started happening recently?
>>>>>>>>>>>>
>>>>>>>>>>>> We recently started having some sporadic problems with drivers on EMR getting stuck; up until two weeks ago everything was fine. We're trying to figure out with the EMR team where the issue is coming from.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Nov 27, 2018 at 6:29 AM Conrad Lee <con...@parsely.com> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Dear Spark community,
>>>>>>>>>>>> >
>>>>>>>>>>>> > I'm running Spark 2.3.2 on EMR 5.19.0. I've got a job that's hanging in the final stage -- the job usually works, but I see this hanging behavior in about one out of 50 runs.
>>>>>>>>>>>> >
>>>>>>>>>>>> > The second-to-last stage sorts the dataframe, and the final stage writes the dataframe to HDFS.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Here you can see the executor logs, which indicate that it has finished processing the task.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Here you can see the thread dump from the executor that's hanging. Here's the text of the blocked thread.
>>>>>>>>>>>> >
>>>>>>>>>>>> > I tried to work around this problem by enabling speculation, but speculative execution never takes place. I don't know why.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Can anyone here help me?
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>> > Conrad
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Sent from my iPhone
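(One last note for anyone debugging this in the future: the original message at the bottom mentions enabling speculation without it ever taking effect. These are the standard Spark speculation settings, with the defaults written out explicitly; why speculation never triggered in this job remains unclear from the thread.)

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("sort-and-write-speculative")
        .config("spark.speculation", "true")
        # A stage needs this fraction of its tasks finished before any
        # task is considered for speculation (default 0.75).
        .config("spark.speculation.quantile", "0.75")
        # A running task must be this many times slower than the median of
        # finished tasks to get a speculative copy (default 1.5).
        .config("spark.speculation.multiplier", "1.5")
        .getOrCreate()
    )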