Hi TD,

Sorry for the late reply.
I implemented your suggestion, but unfortunately it didn't help; I can still see very old shuffle files, and because of them my long-running Spark job ultimately gets terminated. Below is what I did.

import java.util.Calendar;

import org.apache.log4j.Logger;

// This is the spark-submit job
public class HourlyAggregatorV2 {

    private static Logger logger = Logger.getLogger(HourlyAggregatorV2.class);

    public static void main(String[] args) throws Exception {
        // Fix for the disk-full issue in long-running jobs caused by
        // shuffle files not getting cleaned up from disk
        new Thread(new GCThread()).start();
        // ... rest of the streaming job ...
    }
}

// Polls the wall clock and triggers a System.gc() once in every
// 10-minute window (minutes 0, 10, 20, ...)
public class GCThread implements Runnable {

    @Override
    public void run() {
        boolean isGCedOnce = false;
        while (true) {
            if (Calendar.getInstance().get(Calendar.MINUTE) % 10 == 0) {
                if (!isGCedOnce) {
                    System.out.println("Triggered System GC");
                    System.gc();
                    isGCedOnce = true;
                }
            } else {
                isGCedOnce = false;
            }
        }
    }
}

On Sat, Aug 22, 2015 at 9:16 PM, Ashish Rangole <arang...@gmail.com> wrote:

> Interesting. TD, can you please throw some light on why this is and point
> to the relevant code in the Spark repo? It will help in a better
> understanding of the things that can affect a long-running streaming job.
>
> On Aug 21, 2015 1:44 PM, "Tathagata Das" <t...@databricks.com> wrote:
>
>> Could you periodically (say every 10 mins) run System.gc() on the driver?
>> The cleaning up of shuffles is tied to garbage collection.
>>
>> On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma <sharmagaura...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I have a 24x7 running streaming process, which runs on 2-hour windowed
>>> data.
>>>
>>> The issue I am facing is that my worker machines are running OUT OF DISK
>>> space.
>>>
>>> I checked, and the SHUFFLE FILES are not getting cleaned up, e.g.:
>>>
>>> /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data
>>>
>>> Ultimately the machine runs out of disk space.
>>>
>>> I read about the *spark.cleaner.ttl* config param which, from what I can
>>> understand of the documentation, cleans up all the metadata beyond the
>>> time limit.
>>>
>>> I went through https://issues.apache.org/jira/browse/SPARK-5836
>>> it says resolved, but there is no code commit.
>>>
>>> Can anyone please throw some light on this issue?
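
P.S. One thing I am considering is replacing the polling loop above with a scheduled task, since the while(true) loop keeps one core busy. A minimal sketch of the same periodic-GC idea using a daemon ScheduledExecutorService (the PeriodicGC class name and the 10-minute interval are just placeholders I picked):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicGC {

    // Starts a background daemon thread that asks the JVM for a full GC
    // every 10 minutes. As TD mentioned, Spark's shuffle cleanup is tied
    // to driver-side garbage collection of the corresponding references.
    public static void start() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "periodic-gc");
                    t.setDaemon(true);
                    return t;
                });
        scheduler.scheduleAtFixedRate(System::gc, 10, 10, TimeUnit.MINUTES);
    }
}

Calling PeriodicGC.start() once from main() would then replace the
new Thread(new GCThread()).start() line in HourlyAggregatorV2.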