You can also look at the shuffle file cleanup tricks we do inside of the
ALS algorithm in Spark.
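For context, ALS does this by periodically checkpointing its intermediate factor RDDs, which truncates their lineage so the shuffle files behind earlier iterations become unreferenced and can be removed by the ContextCleaner. Below is a rough sketch of the same idea in a generic iterative job; the step() function, checkpoint directory, and intervals are placeholders, not the actual ALS code.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object IterativeCleanupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("iterative-cleanup-sketch").getOrCreate()
    val sc = spark.sparkContext
    sc.setCheckpointDir("/tmp/checkpoints")     // must be set before checkpoint() is called

    var current: RDD[(Int, Double)] =
      sc.parallelize(0 until 1000000).map(i => (i % 1024, i.toDouble))

    val numIterations = 50
    val checkpointInterval = 10                 // ALS exposes a similar knob (default 10)

    for (iter <- 1 to numIterations) {
      current = step(current)                   // placeholder for the per-iteration shuffle work
      if (iter % checkpointInterval == 0) {
        current.checkpoint()                    // truncate the lineage ...
        current.count()                         // ... and materialize it, so earlier shuffle
      }                                         //     dependencies become unreferenced
    }
    spark.stop()
  }

  // Placeholder for whatever shuffle-producing transformation each iteration performs.
  private def step(rdd: RDD[(Int, Double)]): RDD[(Int, Double)] =
    rdd.reduceByKey(_ + _).map { case (k, v) => (k, v / 2.0) }
}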
On Fri, Feb 23, 2018 at 6:20 PM, vijay.bvp wrote:
Have you looked at
http://apache-spark-user-list.1001560.n3.nabble.com/Limit-Spark-Shuffle-Disk-Usage-td23279.html
and the post mentioned there:
https://forums.databricks.com/questions/277/how-do-i-avoid-the-no-space-left-on-device-error.html
Also try compressing the output:
https://spark.apache.o
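The last link is cut off above, but it presumably points at the shuffle-compression settings in the Spark configuration docs. For reference, a minimal sketch of those settings (the values are illustrative, and most of them already default to true):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-compression-sketch")
  .config("spark.shuffle.compress", "true")         // compress map output files
  .config("spark.shuffle.spill.compress", "true")   // compress data spilled during shuffles
  .config("spark.rdd.compress", "true")             // compress serialized cached partitions
  .config("spark.io.compression.codec", "lz4")      // codec used for the settings above
  .getOrCreate()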
Got it. I understood the issue in a different way.
On Thu, Feb 22, 2018 at 9:19 PM Keith Chapman wrote:
My issue is that there is not enough pressure on GC, hence GC is not
kicking in fast enough to delete the shuffle files of previous iterations.
Regards,
Keith.
http://keith-chapman.com
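One workaround for exactly this situation, sketched here under the assumption that the driver heap simply sees too little pressure for GC to run: lower spark.cleaner.periodicGC.interval (default 30min) so the ContextCleaner forces a driver GC more often and notices the unreferenced shuffle dependencies sooner.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-cleanup-sketch")
  .config("spark.cleaner.periodicGC.interval", "5min")         // force a driver GC every 5 minutes
  .config("spark.cleaner.referenceTracking.blocking", "true")  // default: cleanup tasks block
  .getOrCreate()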
On Thu, Feb 22, 2018 at 6:58 PM, naresh Goud wrote:
It would be very difficult to tell without knowing what your application
code is doing and what kind of transformations/actions it is performing. From my
previous experience, tuning application code to avoid creating unnecessary
objects reduces pressure on GC.
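As one hedged illustration of that advice (not from the thread): doing per-partition work through iterators instead of building per-record temporary collections keeps allocation, and therefore GC pressure, down. The parsing logic here is only a placeholder.

import org.apache.spark.rdd.RDD

def parseKeyValueLines(lines: RDD[String]): RDD[(String, Int)] =
  lines.mapPartitions { iter =>
    // Single pass over the partition; no intermediate collections per record.
    iter.flatMap { line =>
      val sep = line.indexOf(',')
      if (sep < 0) Iterator.empty
      else Iterator.single((line.substring(0, sep), line.substring(sep + 1).trim.toInt))
    }
  }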
On Thu, Feb 22, 2018 at 2:13 AM, Keith Chapman wrote: