You can try that. If the data is heavily skewed on the partitioning key, also consider processing each partition separately; a rough sketch follows.
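To make that concrete, here is a rough, untested sketch (Scala, spark-shell style). The input path and the column names (event_date as the skewed partitioning column, user_id/event_ts as the window keys) are placeholders, not taken from your job:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("per-partition-windows").getOrCreate()
import spark.implicits._

// Placeholder input; substitute your real source.
val df = spark.read.parquet("s3://your-bucket/input/")

// Collect the distinct partition values, then run the windowed logic on one
// slice at a time, so a heavily skewed partition only competes with itself
// for executor memory instead of sitting inside one giant shuffle.
val partitionValues = df.select("event_date").distinct().as[String].collect()

partitionValues.foreach { p =>
  val w = Window.partitionBy("user_id").orderBy("event_ts")
  df.filter(col("event_date") === p)
    .withColumn("rn", row_number().over(w))
    .write.mode("append").parquet("s3://your-bucket/output/")
}

Whether this wins depends on how many distinct partition values you have; with very many of them the per-slice scheduling overhead can outweigh the memory relief.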
On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, <[email protected]> wrote:

> Thanks Chris
>
> Going to try it soon by setting maybe spark.sql.shuffle.partitions to
> 2001. Also, I was wondering if it would help if I repartition the data
> by the fields I am using in the group by and window operations?
>
> Best Regards
> Ankit Khettry
>
> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh, <[email protected]> wrote:
>
>> Hi Ankit,
>>
>> Without looking at the Spark UI and the stages/DAG, I'm guessing
>> you're running on the default number of Spark shuffle partitions.
>>
>> If you're seeing a lot of shuffle spill, you likely have to increase
>> the number of shuffle partitions to accommodate the huge shuffle size.
>>
>> I hope that helps
>> Chris
>>
>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, <[email protected]>
>> wrote:
>>
>>> Nope, it's a batch job.
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <[email protected]>
>>> wrote:
>>>
>>>> Is it a streaming job?
>>>>
>>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry <[email protected]>
>>>> wrote:
>>>>
>>>>> I have a Spark job that consists of a large number of Window
>>>>> operations and hence involves large shuffles. I have roughly 900 GiB
>>>>> of data, although I am using a large enough cluster (10 x m5.4xlarge
>>>>> instances). I am using the following configurations for the job,
>>>>> although I have tried various other combinations without any success.
>>>>>
>>>>> spark.yarn.driver.memoryOverhead 6g
>>>>> spark.storage.memoryFraction 0.1
>>>>> spark.executor.cores 6
>>>>> spark.executor.memory 36g
>>>>> spark.memory.offHeap.size 8g
>>>>> spark.memory.offHeap.enabled true
>>>>> spark.executor.instances 10
>>>>> spark.driver.memory 14g
>>>>> spark.yarn.executor.memoryOverhead 10g
>>>>>
>>>>> I keep running into the following OOM error:
>>>>>
>>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire
>>>>> 16384 bytes of memory, got 0
>>>>>   at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>>>   at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>>>>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>>>>
>>>>> I see there are a large number of JIRAs in place for similar issues,
>>>>> and a great many of them are even marked resolved.
>>>>> Can someone guide me as to how to approach this problem? I am using
>>>>> Databricks Spark 2.4.1.
>>>>>
>>>>> Best Regards
>>>>> Ankit Khettry
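For the suggestions quoted above (raising spark.sql.shuffle.partitions and repartitioning on the group-by/window keys), a minimal sketch would look something like the following; again the path and column names are placeholders, not from the original job:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("shuffle-tuning").getOrCreate()

// Well above the default of 200; crossing 2000 also lets Spark switch to the
// more compact HighlyCompressedMapStatus for tracking shuffle map output.
spark.conf.set("spark.sql.shuffle.partitions", "2001")

// Placeholder input; substitute your real source.
val df = spark.read.parquet("s3://your-bucket/input/")

// Repartition on the same key the window/group-by uses, with the same
// partition count, so the planner can reuse this distribution rather than
// adding yet another exchange for the window.
val byKey = df.repartition(2001, col("user_id"))

val w = Window.partitionBy("user_id").orderBy("event_ts")
byKey
  .withColumn("rn", row_number().over(w))
  .write.mode("overwrite").parquet("s3://your-bucket/output/")

Note that this only reduces the amount of shuffle data per task; it won't help if a single key is skewed, which is where the per-partition approach sketched earlier comes in.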
