Hi unk1102,

Try adding more memory to your nodes.  Are you running Spark in the cloud?
If so, increase the memory on your servers.
Do you have default parallelism set (spark.default.parallelism)?  If so,
unset it, and let Spark decide how many partitions to allocate.
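For example, here is a minimal sketch of the kind of setup I mean (the app
name and memory value are just placeholders; pick values that fit your
nodes):

    import org.apache.spark.{SparkConf, SparkContext}

    // Give executors more memory, and leave spark.default.parallelism
    // unset so Spark picks the number of partitions itself.
    val conf = new SparkConf()
      .setAppName("long-running-job")        // placeholder name
      .set("spark.executor.memory", "8g")    // example value only
      // note: no .set("spark.default.parallelism", ...) here
    val sc = new SparkContext(conf)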
You can also try refactoring your code to make it use less memory.
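If it helps, one common refactor of that kind is replacing groupByKey with
reduceByKey (or aggregateByKey), so values get combined on the map side and
far less data hits the shuffle. A rough sketch, assuming your data can be
modelled as (key, count) pairs (the tiny RDD below is just a stand-in for
your real input):

    // Stand-in for your real data
    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

    // Heavier: buffers every value for a key during the shuffle
    val slowTotals = pairs.groupByKey().mapValues(_.sum)

    // Lighter: pre-aggregates map-side, so much less is spilled/shuffled
    val fastTotals = pairs.reduceByKey(_ + _)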

David

On Tue, Oct 6, 2015 at 3:19 PM, unk1102 <umesh.ka...@gmail.com> wrote:

> Hi, I have a Spark job which runs for around 4 hours; it shares a
> SparkContext and runs many child jobs. When I look at each job in the UI I
> see a shuffle spill of around 30 to 40 GB, and because of that executors
> often get lost for using physical memory beyond limits. How do I avoid
> shuffle spill? I have tried almost all optimisations and nothing is
> helping. I don't cache anything. I am using Spark 1.4.1, with Tungsten,
> codegen, etc. I am using spark.shuffle.storage as 0.2 and
> spark.storage.memory as 0.2. I tried to increase the shuffle memory to
> 0.6, but then it halts in GC pauses, causing my executors to time out and
> eventually get lost.
>
> Please guide. Thanks in advance.
>
>
>


