Hi TD,
Sorry for the late reply.
I implemented your suggestion, but unfortunately it didn't help; I am still
able to see very old shuffle files, because of which my long-running Spark
job ultimately gets terminated.
Below is what I did.
//This is the spark-submit job
public class HourlyAggregat
Interesting. TD, can you please throw some light on why this is, and point
to the relevant code in the Spark repo? It would help in better understanding
the things that can affect a long-running streaming job.
On Aug 21, 2015 1:44 PM, "Tathagata Das" wrote:
Could you periodically (say every 10 mins) run System.gc() on the driver?
The cleaning up of shuffles is tied to garbage collection.
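TD's suggestion above can be sketched roughly as follows. This is a minimal, hypothetical helper (class and method names are my own, not from the thread or from Spark): it schedules a periodic `System.gc()` on the driver JVM, on the assumption that Spark's cleanup of old shuffle files is triggered when the driver garbage-collects the RDD/shuffle references that point at them. The 10-minute interval comes from TD's message.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: periodically force a full GC on the driver so that
// Spark's shuffle cleanup (tied to garbage collection of old references)
// gets a chance to run and remove stale shuffle files.
public class PeriodicGc {
    public static ScheduledFuture<?> schedule(ScheduledExecutorService pool,
                                              long intervalMinutes) {
        return pool.scheduleAtFixedRate(
                System::gc,        // request a full GC on the driver JVM
                intervalMinutes,   // initial delay
                intervalMinutes,   // repeat period
                TimeUnit.MINUTES);
    }
}
```

Usage would be to create a single-threaded `ScheduledExecutorService` at driver startup, call `PeriodicGc.schedule(pool, 10)` before starting the StreamingContext, and keep the pool alive for the lifetime of the application.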
On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma wrote:
Hi All,
I have a 24x7 streaming process that runs on 2-hour windowed data.
The issue I am facing is that my worker machines are running out of disk
space. I checked, and the shuffle files are not getting cleaned up.
/log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438
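To confirm where the disk is going, a small monitoring helper like the following could be run against an executor's local directory. This is a hypothetical sketch (class name, method name, and the `shuffle_` filename prefix are my assumptions, not from the thread): it walks a directory tree and sums the size of files whose names look like shuffle files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical monitoring helper: sums the on-disk size of leftover shuffle
// files under an executor's local directory, so disk growth can be tracked
// over time while the streaming job runs.
public class ShuffleDiskUsage {
    public static long bytesUnder(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        // assumed naming convention for shuffle data files
                        .filter(p -> p.getFileName().toString().startsWith("shuffle_"))
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }
}
```

Pointing this at the executor directory shown above (under the configured `spark.local.dir`) and logging the result periodically would show whether the periodic-GC workaround is actually reclaiming space.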