Re: Worker Machine running out of disk for Long running Streaming process
Hi TD,

Sorry for the late reply. I implemented your suggestion, but unfortunately it did not help: I can still see very old shuffle files on disk, and because of them my long-running Spark job ultimately gets terminated. Below is what I did.

//This is the spark-submit job
public class HourlyAggregatorV2 {

    private static Logger logger = Logger.getLogger(HourlyAggregatorV2.class);

    public static void main(String[] args) throws Exception {
        //Fix for preventing the disk-full issue in long-running jobs,
        //caused by shuffle files not getting cleaned up from disk
        new Thread(new GCThread()).start();
    }
}

public class GCThread implements Runnable {

    @Override
    public void run() {
        boolean isGCedOnce = false;
        while (true) {
            if (Calendar.getInstance().get(Calendar.MINUTE) % 10 == 0) {
                if (!isGCedOnce) {
                    System.out.println("Triggered System GC");
                    System.gc();
                    isGCedOnce = true;
                }
            } else {
                isGCedOnce = false;
            }
        }
    }
}

On Sat, Aug 22, 2015 at 9:16 PM, Ashish Rangole wrote:

> Interesting. TD, can you please throw some light on why this is and point
> to the relevant code in the Spark repo? It will help in a better
> understanding of things that can affect a long-running streaming job.
>
> On Aug 21, 2015 1:44 PM, "Tathagata Das" wrote:
>
>> Could you periodically (say every 10 mins) run System.gc() on the driver?
>> The cleaning up of shuffles is tied to garbage collection.
>>
>> On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma wrote:
>>
>>> Hi All,
>>>
>>> I have a 24x7 running streaming process, which runs on 2-hour windowed data.
>>>
>>> The issue I am facing is that my worker machines are running OUT OF DISK space.
>>>
>>> I checked that the SHUFFLE FILES are not getting cleaned up:
>>>
>>> /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data
>>>
>>> Ultimately the machines run out of disk space.
>>>
>>> I read about the *spark.cleaner.ttl* config param, which, from what I can
>>> understand of the documentation, cleans up all metadata beyond the time limit.
>>>
>>> I went through https://issues.apache.org/jira/browse/SPARK-5836;
>>> it says resolved, but there is no code commit.
>>>
>>> Can anyone please throw some light on the issue?
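[Editor's note: the GCThread loop above polls Calendar.getInstance() in a tight while(true) loop, which keeps one CPU core spinning between GC triggers. A ScheduledExecutorService achieves the same periodic System.gc() without busy-waiting. The sketch below is illustrative only; the class and method names are not part of Spark or of the original code.]

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative alternative to the busy-wait GCThread: trigger System.gc()
// on the driver every `intervalMinutes` minutes without spinning the CPU.
public class PeriodicGc {

    public static ScheduledExecutorService start(long intervalMinutes) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "periodic-gc");
                    t.setDaemon(true); // do not keep the JVM alive on shutdown
                    return t;
                });
        scheduler.scheduleAtFixedRate(() -> {
            System.out.println("Triggered System GC");
            System.gc();
        }, intervalMinutes, intervalMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

In main(), a single call such as PeriodicGc.start(10) would replace new Thread(new GCThread()).start().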
Re: Worker Machine running out of disk for Long running Streaming process
Interesting. TD, can you please throw some light on why this is and point to the relevant code in the Spark repo? It will help in a better understanding of things that can affect a long-running streaming job.

On Aug 21, 2015 1:44 PM, Tathagata Das t...@databricks.com wrote:

> Could you periodically (say every 10 mins) run System.gc() on the driver?
> The cleaning up of shuffles is tied to garbage collection.
>
> On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma sharmagaura...@gmail.com wrote:
>
>> Hi All,
>>
>> I have a 24x7 running streaming process, which runs on 2-hour windowed data.
>>
>> The issue I am facing is that my worker machines are running OUT OF DISK space.
>>
>> I checked that the SHUFFLE FILES are not getting cleaned up:
>>
>> /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data
>>
>> Ultimately the machines run out of disk space.
>>
>> I read about the *spark.cleaner.ttl* config param, which, from what I can
>> understand of the documentation, cleans up all metadata beyond the time limit.
>>
>> I went through https://issues.apache.org/jira/browse/SPARK-5836;
>> it says resolved, but there is no code commit.
>>
>> Can anyone please throw some light on the issue?
Re: Worker Machine running out of disk for Long running Streaming process
Could you periodically (say every 10 mins) run System.gc() on the driver? The cleaning up of shuffles is tied to garbage collection.

On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma sharmagaura...@gmail.com wrote:

> Hi All,
>
> I have a 24x7 running streaming process, which runs on 2-hour windowed data.
>
> The issue I am facing is that my worker machines are running OUT OF DISK space.
>
> I checked that the SHUFFLE FILES are not getting cleaned up:
>
> /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data
>
> Ultimately the machines run out of disk space.
>
> I read about the *spark.cleaner.ttl* config param, which, from what I can
> understand of the documentation, cleans up all metadata beyond the time limit.
>
> I went through https://issues.apache.org/jira/browse/SPARK-5836;
> it says resolved, but there is no code commit.
>
> Can anyone please throw some light on the issue?
Worker Machine running out of disk for Long running Streaming process
Hi All,

I have a 24x7 running streaming process, which runs on 2-hour windowed data.

The issue I am facing is that my worker machines are running OUT OF DISK space.

I checked that the SHUFFLE FILES are not getting cleaned up:

/log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data

Ultimately the machines run out of disk space.

I read about the *spark.cleaner.ttl* config param, which, from what I can understand of the documentation, cleans up all metadata beyond the time limit.

I went through https://issues.apache.org/jira/browse/SPARK-5836; it says resolved, but there is no code commit.

Can anyone please throw some light on the issue?
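[Editor's note: for reference, spark.cleaner.ttl was a Spark 1.x configuration (removed in Spark 2.0) that forcefully deleted metadata and persisted data older than the given duration, in seconds. A sketch of how it could be set at submit time; the class and jar names are placeholders, and the TTL must comfortably exceed the longest window (here 2 hours), or data still needed by windowed operations can be deleted out from under the job.]

```
# Hypothetical submit command; jar/class names are placeholders.
# spark.cleaner.ttl is in seconds; anything older may be deleted,
# so it must exceed the longest window (2 h here -> 4 h for margin).
spark-submit \
  --class HourlyAggregatorV2 \
  --conf spark.cleaner.ttl=14400 \
  hourly-aggregator.jar
```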