Re: Worker Machine running out of disk for Long running Streaming process

2015-09-15 Thread gaurav sharma
Hi TD,

Sorry for the late reply.


I implemented your suggestion, but unfortunately it didn't help: I can still
see very old shuffle files on disk, and because of them my long-running Spark
job ultimately gets terminated.


Below is what I did.


// This is the spark-submit job (the rest of the streaming setup is omitted here)
import org.apache.log4j.Logger; // log4j assumed, based on Logger.getLogger(Class)

public class HourlyAggregatorV2 {

    private static Logger logger = Logger.getLogger(HourlyAggregatorV2.class);

    public static void main(String[] args) throws Exception {

        // Fix for preventing the disk-full issue in long-running jobs,
        // caused by shuffle files not getting cleaned up from disk
        new Thread(new GCThread()).start();
    }
}



import java.util.Calendar;

public class GCThread implements Runnable {

    @Override
    public void run() {
        boolean isGCedOnce = false;
        while (true) {
            // Trigger a GC once at the start of every 10th minute
            if (Calendar.getInstance().get(Calendar.MINUTE) % 10 == 0) {
                if (!isGCedOnce) {
                    System.out.println("Triggered System GC");
                    System.gc();
                    isGCedOnce = true;
                }
            } else {
                isGCedOnce = false;
            }
            // Sleep between checks so the loop does not spin at 100% CPU
            try {
                Thread.sleep(30 * 1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
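
A less CPU-hungry way to get the same periodic GC on the driver is a
ScheduledExecutorService instead of the spinning loop above; a minimal sketch
(the class name DriverGcScheduler and the 10-minute interval are only
illustrative, not part of the original job):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DriverGcScheduler {

    // Schedules a System.gc() on the driver every 10 minutes, per TD's suggestion
    public static void start() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                System.out.println("Triggered System GC");
                System.gc();
            }
        }, 10, 10, TimeUnit.MINUTES);
    }
}

Calling DriverGcScheduler.start() once from main() would replace the GCThread
above; the executor handles the waiting instead of a while(true) loop.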


On Sat, Aug 22, 2015 at 9:16 PM, Ashish Rangole  wrote:

> Interesting. TD, can you please throw some light on why this is and point
> to the relevant code in the Spark repo? It will help in a better
> understanding of things that can affect a long-running streaming job.
> On Aug 21, 2015 1:44 PM, "Tathagata Das"  wrote:
>
>> Could you periodically (say every 10 mins) run System.gc() on the driver?
>> The cleaning up of shuffle files is tied to garbage collection.
>>
>>
>> On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma 
>> wrote:
>>
>>> Hi All,
>>>
>>>
>>> I have a 24x7 running Streaming Process, which runs on 2-hour windowed
>>> data.
>>>
>>> The issue I am facing is that my worker machines are running OUT OF DISK space.
>>>
>>> I checked that the SHUFFLE FILES are not getting cleaned up.
>>>
>>>
>>> /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data
>>>
>>> Ultimately, the machines run out of disk space.
>>>
>>>
>>> I read about the *spark.cleaner.ttl* config param, which, as far as I can
>>> understand from the documentation, cleans up all metadata beyond the time
>>> limit.
>>>
>>> I went through SPARK-5836 (https://issues.apache.org/jira/browse/SPARK-5836).
>>> It is marked as resolved, but there is no code commit.
>>>
>>> Can anyone please throw some light on this issue?
>>>
>>>
>>>
>>


Re: Worker Machine running out of disk for Long running Streaming process

2015-08-22 Thread Ashish Rangole
Interesting. TD, can you please throw some light on why this is and point
to the relevant code in the Spark repo? It will help in a better
understanding of things that can affect a long-running streaming job.
On Aug 21, 2015 1:44 PM, Tathagata Das t...@databricks.com wrote:

 Could you periodically (say every 10 mins) run System.gc() on the driver?
 The cleaning up of shuffle files is tied to garbage collection.


 On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma sharmagaura...@gmail.com
 wrote:

 Hi All,


 I have a 24x7 running Streaming Process, which runs on 2-hour windowed
 data.

 The issue I am facing is that my worker machines are running OUT OF DISK space.

 I checked that the SHUFFLE FILES are not getting cleaned up.


 /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data

 Ultimately, the machines run out of disk space.


 I read about the *spark.cleaner.ttl* config param, which, as far as I can
 understand from the documentation, cleans up all metadata beyond the time
 limit.

 I went through SPARK-5836 (https://issues.apache.org/jira/browse/SPARK-5836).
 It is marked as resolved, but there is no code commit.

 Can anyone please throw some light on this issue?






Re: Worker Machine running out of disk for Long running Streaming process

2015-08-21 Thread Tathagata Das
Could you periodically (say every 10 mins) run System.gc() on the driver?
The cleaning up of shuffle files is tied to garbage collection.


On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma sharmagaura...@gmail.com
wrote:

 Hi All,


 I have a 24x7 running Streaming Process, which runs on 2-hour windowed data.

 The issue I am facing is that my worker machines are running OUT OF DISK space.

 I checked that the SHUFFLE FILES are not getting cleaned up.


 /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data

 Ultimately, the machines run out of disk space.


 I read about the *spark.cleaner.ttl* config param, which, as far as I can
 understand from the documentation, cleans up all metadata beyond the time
 limit.

 I went through SPARK-5836 (https://issues.apache.org/jira/browse/SPARK-5836).
 It is marked as resolved, but there is no code commit.

 Can anyone please throw some light on this issue?





Worker Machine running out of disk for Long running Streaming process

2015-08-21 Thread gaurav sharma
Hi All,


I have a 24x7 running Streaming Process, which runs on 2-hour windowed data.

The issue I am facing is that my worker machines are running OUT OF DISK space.

I checked that the SHUFFLE FILES are not getting cleaned up.

/log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data

Ultimately, the machines run out of disk space.


I read about the *spark.cleaner.ttl* config param, which, as far as I can
understand from the documentation, cleans up all metadata beyond the time
limit.
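
For reference, spark.cleaner.ttl takes a duration in seconds and is set on the
SparkConf before the context is created; a minimal sketch (the 7200-second
value is only illustrative, and it would need to be at least as long as the
2-hour window used here):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// spark.cleaner.ttl is given in seconds; 7200 = 2 hours (illustrative value)
SparkConf conf = new SparkConf()
        .setAppName("HourlyAggregatorV2")
        .set("spark.cleaner.ttl", "7200");
JavaSparkContext sc = new JavaSparkContext(conf);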

I went through SPARK-5836 (https://issues.apache.org/jira/browse/SPARK-5836).
It is marked as resolved, but there is no code commit.

Can anyone please throw some light on this issue?