Hi TD,

Sorry for the late reply.
I implemented your suggestion, but unfortunately it didn't help; I can still see very old shuffle files, and because of them my long-running Spark job ultimately gets terminated. Below is what I did.

import java.util.Calendar;

import org.apache.log4j.Logger;

// This is the spark-submit job
public class HourlyAggregatorV2 {

    private static Logger logger = Logger.getLogger(HourlyAggregatorV2.class);

    public static void main(String[] args) throws Exception {
        // Fix for the disk-full issue in long-running jobs caused by
        // shuffle files not getting cleaned up from disk
        new Thread(new GCThread()).start();
        // ... rest of the streaming job ...
    }
}

// Polls the wall clock and triggers a System.gc() once in every
// 10-minute window (minutes 0, 10, 20, ...)
public class GCThread implements Runnable {

    @Override
    public void run() {
        boolean isGCedOnce = false;
        while (true) {
            if (Calendar.getInstance().get(Calendar.MINUTE) % 10 == 0) {
                if (!isGCedOnce) {
                    System.out.println("Triggered System GC");
                    System.gc();
                    isGCedOnce = true;
                }
            } else {
                isGCedOnce = false;
            }
        }
    }
}

On Sat, Aug 22, 2015 at 9:16 PM, Ashish Rangole <arang...@gmail.com> wrote:

> Interesting. TD, can you please throw some light on why this is and point
> to the relevant code in the Spark repo? It will help in a better
> understanding of the things that can affect a long-running streaming job.
>
> On Aug 21, 2015 1:44 PM, "Tathagata Das" <t...@databricks.com> wrote:
>
>> Could you periodically (say every 10 mins) run System.gc() on the driver?
>> The cleaning up of shuffles is tied to garbage collection.
>>
>> On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma <sharmagaura...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I have a 24x7 running streaming process, which runs on 2-hour windowed
>>> data.
>>>
>>> The issue I am facing is that my worker machines are running OUT OF DISK
>>> space.
>>>
>>> I checked, and the SHUFFLE FILES are not getting cleaned up, e.g.:
>>>
>>> /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data
>>>
>>> Ultimately the machine runs out of disk space.
>>>
>>> I read about the *spark.cleaner.ttl* config param which, from what I can
>>> understand of the documentation, cleans up all the metadata beyond the
>>> time limit.
>>>
>>> I went through https://issues.apache.org/jira/browse/SPARK-5836
>>> it says resolved, but there is no code commit.
>>>
>>> Can anyone please throw some light on this issue?
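
P.S. One thing I am considering is replacing the polling loop above with a scheduled task, since the while(true) loop keeps one core busy. A minimal sketch of the same periodic-GC idea using a daemon ScheduledExecutorService (the PeriodicGC class name and the 10-minute interval are just placeholders I picked):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicGC {

    // Starts a background daemon thread that asks the JVM for a full GC
    // every 10 minutes. As TD mentioned, Spark's shuffle cleanup is tied
    // to driver-side garbage collection of the corresponding references.
    public static void start() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "periodic-gc");
                    t.setDaemon(true);
                    return t;
                });
        scheduler.scheduleAtFixedRate(System::gc, 10, 10, TimeUnit.MINUTES);
    }
}

Calling PeriodicGC.start() once from main() would then replace the
new Thread(new GCThread()).start() line in HourlyAggregatorV2.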