When does Spark remove them?
Regards,
Mihai Iacob
DSX Local - Security, IBM Analytics
----- Original message -----
From: Vadim Semenov <vadim.seme...@datadoghq.com>
To: Mihai Iacob <mia...@ca.ibm.com>
Cc: user <user@spark.apache.org>
Subject: Re: /tmp fills up to 100GB when using a window function
Date: Tue, Dec 19, 2017 9:46 AM
Spark doesn't remove intermediate shuffle files if they're part of the same job.

On Mon, Dec 18, 2017 at 3:10 PM, Mihai Iacob <mia...@ca.ibm.com> wrote:

This code generates files under /tmp...blockmgr... which do not get cleaned up after the job finishes. Is anything wrong with the code below, or are there any known issues with Spark not cleaning up /tmp files?

from pyspark.sql import Window
from pyspark.sql.functions import rank

window = Window.\
    partitionBy('***', 'date_str').\
    orderBy(sqlDf['***'])
sqlDf = sqlDf.withColumn("***", rank().over(window))
df_w_least = sqlDf.filter("***=1")
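For context on why the files linger: Spark's driver-side ContextCleaner deletes shuffle output only once the object that owns it is garbage-collected, so references held within a running job keep the blockmgr files alive. As a loose analogy in pure Python (not Spark's actual API), the lifecycle looks like this:

```python
import gc
import os
import tempfile
import weakref


class Dataset:
    """Hypothetical stand-in for an RDD whose shuffle output lives on disk."""

    def __init__(self, name):
        self.path = os.path.join(tempfile.mkdtemp(), f"shuffle_{name}")
        with open(self.path, "w") as f:
            f.write("intermediate shuffle data")
        # Mimic Spark's ContextCleaner: remove the file only once the
        # driver-side object is garbage-collected.
        weakref.finalize(self, os.remove, self.path)


ds = Dataset("0")
path = ds.path
assert os.path.exists(path)   # file persists while a reference is held
del ds
gc.collect()                  # reference dropped -> cleaner removes the file
assert not os.path.exists(path)
```

This is only a sketch of the reference-tracking idea; in a real deployment the directories themselves are controlled by `spark.local.dir` rather than /tmp if you point that setting elsewhere.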
Regards,
Mihai Iacob
DSX Local - Security, IBM Analytics
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org