Re: How to clear spark Shuffle files

2020-09-14 Thread Edward Mitchell
We've also had some similar disk fill issues. For Java/Scala RDDs, shuffle file cleanup is done as part of JVM garbage collection. I've noticed that if the code keeps references to RDDs, so that they cannot be garbage collected, then the intermediate shuffle files hang around. Best way to handle this is

Re: Async RDD saves

2020-08-07 Thread Edward Mitchell
I will agree that the side effects of using Futures in driver code tend to be tricky to track down. If you forget to clear the job description and job group information, then the LocalProperties on the SparkContext remain intact. SparkContext#submitJob makes sure to pass down the
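
A minimal, Spark-free sketch of the kind of leak described above, assuming the Futures run on a pool thread that gets reused. The helpers `set_job_group`, `get_job_group` and `clear_job_group` are hypothetical stand-ins, not Spark APIs, and `threading.local` only imitates thread-scoped properties.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for thread-scoped job properties (assumption: NOT the Spark API).
_local = threading.local()


def set_job_group(group_id):
    _local.job_group = group_id


def get_job_group():
    return getattr(_local, "job_group", None)


def clear_job_group():
    _local.job_group = None


def leaky_task(group_id):
    # Sets the property and never clears it.
    set_job_group(group_id)
    return get_job_group()


def clean_task(group_id):
    # Clears the property in `finally`, even if the work raises.
    set_job_group(group_id)
    try:
        return get_job_group()
    finally:
        clear_job_group()


def run_then_observe(task):
    # One worker, so the second submission reuses the first task's thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(task, "nightly-export").result()
        return pool.submit(get_job_group).result()


leaked = run_then_observe(leaky_task)    # the later task still sees "nightly-export"
cleaned = run_then_observe(clean_task)   # the later task sees None
```

Wrapping the work in `try`/`finally` is the design choice to note here: the property is cleared whether or not the task fails.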

Re: Spark 3 pod template for the driver

2020-07-06 Thread Edward Mitchell
> Except if there have been reasons for not doing it like that from the beginning?
>
> thanks,
> Michel
>
> On Thursday, 2 July 2020 at 00:43:25 UTC+1, Edward Mitchell <edee...@gmail.com> wrote:
>
> > Okay, I see what's going on here.
> >
> > Looks like the w

Re: Spark 3 pod template for the driver

2020-07-01 Thread Edward Mitchell
Okay, I see what's going on here. It looks like, the way Spark is coded, both the driver container image (specified by --conf spark.kubernetes.driver.container.image) and the executor container image (specified by --conf spark.kubernetes.executor.container.image) are required. If they're not specified
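
As a rough illustration of passing the two settings named above explicitly, this sketch assembles a spark-submit argument list. The helper `build_submit_args`, the master URL, the image names and the application path are all placeholders invented for the example; only the two configuration keys come from the message.

```python
def build_submit_args(master, driver_image, executor_image, app):
    """Return a spark-submit argv list with both image settings set explicitly."""
    conf = {
        "spark.kubernetes.driver.container.image": driver_image,
        "spark.kubernetes.executor.container.image": executor_image,
    }
    args = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    for key, value in conf.items():
        # Each setting becomes its own `--conf key=value` pair.
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# All values below are hypothetical placeholders.
args = build_submit_args(
    master="k8s://https://example.invalid:6443",
    driver_image="example/spark-driver:3.0.0",
    executor_image="example/spark-executor:3.0.0",
    app="local:///opt/spark/examples/jars/app.jar",
)
print(" ".join(args))
```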