It says you have 3811 tasks in the earlier stages and you're going down to
2001 partitions; that would make each task more memory intensive. I'm guessing
the default Spark shuffle partition count was 200, so that run would have
failed. Go for a higher number, maybe even higher than 3811. What were your
shuffle write from stage 7 and shuffle read from stage 8?
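
For example, bumping the shuffle partition count would look roughly like this
(a minimal sketch, assuming a SparkSession named spark; 4000 is only an
illustrative value above 3811, not a recommendation):

  // More shuffle partitions means less data per shuffle task, so less memory
  // pressure per task. Tune the number against your actual shuffle write size.
  spark.conf.set("spark.sql.shuffle.partitions", "4000")

The same value can also be passed as a --conf flag at submit time.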

On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry, <justankit2...@gmail.com> wrote:

> Still unable to overcome the error. Attaching some screenshots for
> reference.
> Following are the configs used:
> spark.yarn.max.executor.failures 1000
> spark.yarn.driver.memoryOverhead 6g
> spark.executor.cores 6
> spark.executor.memory 36g
> spark.sql.shuffle.partitions 2001
> spark.memory.offHeap.size 8g
> spark.memory.offHeap.enabled true
> spark.executor.instances 10
> spark.driver.memory 14g
> spark.yarn.executor.memoryOverhead 10g
>
> Best Regards
> Ankit Khettry
>
> On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh <chris.t...@gmail.com> wrote:
>
>> You can try that. Also consider processing each partition separately if your
>> data is heavily skewed when you partition it.
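>>
>> As a rough illustration of what repartitioning before the heavy stages could
>> look like (a sketch only, assuming a DataFrame named df, with keyA and keyB
>> as placeholders for the actual group-by/window columns):
>>
>>   import org.apache.spark.sql.functions.col
>>   // Co-locate rows with the same key so the group by / window work stays
>>   // within a partition; this helps less if a single key dominates the data.
>>   val repartitioned = df.repartition(2001, col("keyA"), col("keyB"))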
>>
>> On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, <justankit2...@gmail.com>
>> wrote:
>>
>>> Thanks Chris
>>>
>>> Going to try it soon, maybe by setting spark.sql.shuffle.partitions to
>>> 2001. Also, I was wondering whether it would help if I repartitioned the
>>> data by the fields I am using in the group by and window operations.
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh, <chris.t...@gmail.com> wrote:
>>>
>>>> Hi Ankit,
>>>>
>>>> Without looking at the Spark UI and the stages/DAG, I'm guessing you're
>>>> running on the default number of Spark shuffle partitions.
>>>>
>>>> If you're seeing a lot of shuffle spill, you likely have to increase
>>>> the number of shuffle partitions to accommodate the huge shuffle size.
>>>>
>>>> I hope that helps
>>>> Chris
>>>>
>>>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, <justankit2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Nope, it's a batch job.
>>>>>
>>>>> Best Regards
>>>>> Ankit Khettry
>>>>>
>>>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Is it a streaming job?
>>>>>>
>>>>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry <justankit2...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I have a Spark job that consists of a large number of window
>>>>>>> operations and hence involves large shuffles. I have roughly 900 GiB of
>>>>>>> data, though the cluster should be large enough (10 * m5.4xlarge
>>>>>>> instances). I am using the following configurations for the job,
>>>>>>> although I have tried various other combinations without any success.
>>>>>>>
>>>>>>> spark.yarn.driver.memoryOverhead 6g
>>>>>>> spark.storage.memoryFraction 0.1
>>>>>>> spark.executor.cores 6
>>>>>>> spark.executor.memory 36g
>>>>>>> spark.memory.offHeap.size 8g
>>>>>>> spark.memory.offHeap.enabled true
>>>>>>> spark.executor.instances 10
>>>>>>> spark.driver.memory 14g
>>>>>>> spark.yarn.executor.memoryOverhead 10g
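>>>>>>>
>>>>>>> On a plain Spark setup the equivalent would look roughly like the
>>>>>>> builder sketch below (illustrative only; on Databricks these are set
>>>>>>> in the cluster's Spark config rather than in code):
>>>>>>>
>>>>>>>   import org.apache.spark.sql.SparkSession
>>>>>>>   // Same settings as above, applied before the session is created.
>>>>>>>   val spark = SparkSession.builder()
>>>>>>>     .config("spark.executor.instances", "10")
>>>>>>>     .config("spark.executor.cores", "6")
>>>>>>>     .config("spark.executor.memory", "36g")
>>>>>>>     .config("spark.yarn.executor.memoryOverhead", "10g")
>>>>>>>     .config("spark.driver.memory", "14g")
>>>>>>>     .config("spark.memory.offHeap.enabled", "true")
>>>>>>>     .config("spark.memory.offHeap.size", "8g")
>>>>>>>     .getOrCreate()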
>>>>>>>
>>>>>>> I keep running into the following OOM error:
>>>>>>>
>>>>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire
>>>>>>> 16384 bytes of memory, got 0
>>>>>>> at
>>>>>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>>>>> at
>>>>>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>>>>> at
>>>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>>>>>>> at
>>>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>>>>>>
>>>>>>> I see there are a large number of JIRAs filed for similar issues, and
>>>>>>> many of them are even marked resolved.
>>>>>>> Can someone guide me on how to approach this problem? I am using
>>>>>>> Databricks Spark 2.4.1.
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Ankit Khettry
>>>>>>>
>>>>>>
