[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2016-10-16 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579878#comment-15579878
 ] 

holdenk commented on SPARK-7941:


So if it's OK - since I don't see other reports of this, and unless this is an 
issue someone (including [~cqnguyen]) is still experiencing - I'll go ahead and 
soft-close this at the end of next week.

> Cache Cleanup Failure when job is killed by Spark 
> --
>
> Key: SPARK-7941
> URL: https://issues.apache.org/jira/browse/SPARK-7941
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, YARN
>Affects Versions: 1.3.1
>Reporter: Cory Nguyen
> Attachments: screenshot-1.png
>
>
> Problem/Bug:
> If a job is running and Spark kills the job intentionally, the cache files 
> remain on the local/worker nodes and are not cleaned up properly. Over time 
> the old cache builds up and causes a "No Space Left on Device" error. 
> The cache is cleaned up properly when the job succeeds. I have not verified 
> whether the cache remains when the user intentionally kills the job. 
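
For context, a minimal PySpark sketch (hypothetical, not taken from the report; 
the app name is arbitrary) of the kind of job whose persisted blocks land on the 
executors' local disks - which, on YARN, resolve to the NodeManager's appcache 
directory on each worker node:

    # Hypothetical repro sketch: persists an RDD to disk so block files are
    # written under the YARN container's local dirs
    # (.../appcache/application_*/blockmgr-*). If the application is then
    # killed, these are the kind of files reported as left behind.
    from pyspark import SparkContext, StorageLevel

    sc = SparkContext(appName="cache-cleanup-repro")
    rdd = sc.parallelize(range(1000000), numSlices=100)
    rdd.persist(StorageLevel.DISK_ONLY)  # force blocks onto executor local disk
    rdd.count()  # materialize the cached blocks
    # Killing the application here (instead of calling sc.stop()) leaves the
    # block files for YARN to remove along with the app's appcache dir.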






[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2016-10-07 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556775#comment-15556775
 ] 

holdenk commented on SPARK-7941:


Are you still experiencing this issue, [~cqnguyen], or would it be OK for us to 
close this?




[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2015-05-29 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564366#comment-14564366
 ] 

Sean Owen commented on SPARK-7941:
--

Which cache files are you referring to (in what location)? YARN containers are 
cleaned up by YARN.




[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2015-05-29 Thread Cory Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564376#comment-14564376
 ] 

Cory Nguyen commented on SPARK-7941:


The location, under /mnt or /mnt1, is:
/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache
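
A rough way to watch that directory grow while a job runs (illustrative sketch; 
the path is the EMR location above, and the 30-second interval is arbitrary):

    # Illustrative monitoring sketch: prints the total size of the appcache
    # tree every 30 seconds so growth (and any cleanup) is visible over time.
    import os, time

    APPCACHE = "/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache"

    def tree_size(path):
        total = 0
        for dirpath, _, filenames in os.walk(path):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # files can vanish mid-walk while containers run
        return total

    while True:
        print("%s appcache bytes: %d" % (time.strftime("%H:%M:%S"),
                                         tree_size(APPCACHE)))
        time.sleep(30)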




[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2015-05-29 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564396#comment-14564396
 ] 

Sean Owen commented on SPARK-7941:
--

I am not an expert on this bit of YARN, but this looks like app data from the 
hadoop user, rather than from you or a yarn or spark user. Is it Spark-related? 
Are these containers actually still running? Container data may also stick 
around for a retry in some cases. I also thought YARN would eventually clean 
this up regardless of what happened in the container. 
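
One NodeManager setting worth ruling out (a guess, not a confirmed cause) is a 
non-zero yarn.nodemanager.delete.debug-delay-sec, which makes YARN keep finished 
containers' local dirs around for debugging. A quick way to read the driver-side 
value from PySpark, using the internal _jsc accessor:

    # Sketch only: sc._jsc is an internal PySpark accessor, not a public API,
    # and this shows the driver-side config; the NodeManager's own
    # yarn-site.xml is what actually governs deletion on the worker nodes.
    from pyspark import SparkContext

    sc = SparkContext(appName="check-nm-delete-delay")
    delay = sc._jsc.hadoopConfiguration().get(
        "yarn.nodemanager.delete.debug-delay-sec", "0")
    print("delete.debug-delay-sec = %s" % delay)
    sc.stop()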




[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2015-05-29 Thread Cory Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564459#comment-14564459
 ] 

Cory Nguyen commented on SPARK-7941:


I'm not entirely sure what you meant by "app data from the hadoop user, rather 
than you or yarn or a spark user". hadoop is the standard user when Spark is 
deployed on AWS EMR; the hadoop user submits the Spark jobs to YARN, which I 
think is why you may have been confused by what you saw. That appcache folder is 
Spark-related, because only Spark runs on this cluster. I monitored the 
individual node as the job was running and was able to watch the appcache grow 
as the job ran.

Yes, this is Spark-related. No, the containers are not still running. I know 
for certain the cache data is related to the running Spark job. I thought YARN 
would clean this up too, but that was not the case: the data was still there 
hours after the job was killed by Spark/YARN.





[jira] [Commented] (SPARK-7941) Cache Cleanup Failure when job is killed by Spark

2015-05-29 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564462#comment-14564462
 ] 

Sean Owen commented on SPARK-7941:
--

I see, scratch that. 'hadoop' is usually a user for MapReduce, but I'm used to 
a 'spark' user for Spark jobs; EMR is different. 
Try with the latest master. Several cleanup items have been improved since 1.3, 
but I don't know if they're relevant. Last time I looked, all temp dirs appeared 
to be correctly deleted on shutdown.
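
Independent of the Spark version, one application-side mitigation is to make 
sure the context is always stopped, so Spark's shutdown hooks get a chance to 
remove its temp and block-manager directories even when the job fails (a sketch, 
not a guaranteed fix for an external kill):

    # Sketch: stopping the SparkContext in a finally block lets Spark's own
    # cleanup hooks run on normal failures. It does not help when the process
    # is killed outright (e.g. SIGKILL from YARN), which is the case described
    # in this issue.
    from pyspark import SparkContext

    sc = SparkContext(appName="always-stop-context")
    try:
        sc.parallelize(range(1000)).count()
    finally:
        sc.stop()  # triggers removal of Spark-managed temp dirs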
