When does Spark remove them?
 
Regards,
 
Mihai Iacob
DSX Local - Security, IBM Analytics
 
 
----- Original message -----
From: Vadim Semenov <vadim.seme...@datadoghq.com>
To: Mihai Iacob <mia...@ca.ibm.com>
Cc: user <user@spark.apache.org>
Subject: Re: /tmp fills up to 100GB when using a window function
Date: Tue, Dec 19, 2017 9:46 AM
 
Spark doesn't remove intermediate shuffle files if they're part of the same job.
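 
(A hedged workaround, not from the original thread: since the blockmgr shuffle directories live under Spark's scratch space, which defaults to /tmp, one option is to point that scratch space at a larger volume via spark.local.dir. The path below is an assumption for illustration.)

```shell
# Hypothetical spark-submit invocation: redirect shuffle/blockmgr scratch
# directories away from /tmp to a larger volume (path is an assumption).
spark-submit \
  --conf spark.local.dir=/data/spark-scratch \
  your_job.py
```

The files should still only be removed when the job (or the application's executors) finish, but at least they no longer exhaust /tmp.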
 
On Mon, Dec 18, 2017 at 3:10 PM, Mihai Iacob <mia...@ca.ibm.com> wrote:
This code generates files under /tmp...blockmgr... which do not get cleaned up after the job finishes.
 
Is there anything wrong with the code below? Or are there any known issues with Spark not cleaning up /tmp files?
 
from pyspark.sql import Window
from pyspark.sql.functions import rank

window = Window \
    .partitionBy('***', 'date_str') \
    .orderBy(sqlDf['***'])

sqlDf = sqlDf.withColumn("***", rank().over(window))
df_w_least = sqlDf.filter("*** = 1")
 
 
 
Regards,
 
Mihai Iacob
DSX Local - Security, IBM Analytics

--------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscribe@spark.apache.org
 
