Re: Spark not releasing shuffle files in time (with very large heap)

2018-02-23 Thread Holden Karau
You can also look at the shuffle file cleanup tricks we do inside of the ALS algorithm in Spark.
On Fri, Feb 23, 2018 at 6:20 PM, vijay.bvp wrote:
> have you looked at
> http://apache-spark-user-list.1001560.n3.nabble.com/Limit-Spark-Shuffle-Disk-Usage-td23279.html
>
> and the post mentioned

Re: Spark not releasing shuffle files in time (with very large heap)

2018-02-23 Thread vijay.bvp
Have you looked at http://apache-spark-user-list.1001560.n3.nabble.com/Limit-Spark-Shuffle-Disk-Usage-td23279.html and the post mentioned there, https://forums.databricks.com/questions/277/how-do-i-avoid-the-no-space-left-on-device-error.html? Also try compressing the output: https://spark.apache.o

Re: Spark not releasing shuffle files in time (with very large heap)

2018-02-22 Thread naresh Goud
Got it. I understood the issue in a different way.
On Thu, Feb 22, 2018 at 9:19 PM Keith Chapman wrote:
> My issue is that there is not enough pressure on GC, hence GC is not
> kicking in fast enough to delete the shuffle files of previous iterations.
>
> Regards,
> Keith.
>
> http://keith-chapman.c

Re: Spark not releasing shuffle files in time (with very large heap)

2018-02-22 Thread Keith Chapman
My issue is that there is not enough pressure on GC, hence GC is not kicking in fast enough to delete the shuffle files of previous iterations.
Regards,
Keith.
http://keith-chapman.com
On Thu, Feb 22, 2018 at 6:58 PM, naresh Goud wrote:
> It would be very difficult to tell without knowing what
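For readers hitting the same symptom: Spark's context cleaner only removes shuffle files once the driver garbage-collects the objects that reference them, and the documented setting `spark.cleaner.periodicGC.interval` (default 30min) controls how often the driver forces a GC for that purpose. The sketch below is only an illustration, not something proposed in this thread. The helper name `spark_submit_conf_args` and the 5min value are assumptions, and whether a shorter interval helps depends on the job.

```python
# Hypothetical helper (not part of Spark): renders Spark settings as
# spark-submit "--conf key=value" arguments.
def spark_submit_conf_args(settings):
    args = []
    for key, value in sorted(settings.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


# spark.cleaner.periodicGC.interval is a real Spark setting (default 30min).
# The 5min value is an assumed, illustrative choice; tune it per workload.
cleanup_settings = {
    "spark.cleaner.periodicGC.interval": "5min",
}

print(" ".join(spark_submit_conf_args(cleanup_settings)))
```

The printed flags can be appended to a spark-submit command line; the same key can also go into spark-defaults.conf.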

Re: Spark not releasing shuffle files in time (with very large heap)

2018-02-22 Thread naresh Goud
It would be very difficult to tell without knowing what your application code is doing and what kinds of transformations/actions it performs. In my experience, tuning application code to avoid creating unnecessary objects reduces pressure on GC.
On Thu, Feb 22, 2018 at 2:13 AM, Keith Chapman wrote