[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-10-05 Thread Andrew Ash (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159707#comment-14159707 ]

Andrew Ash commented on SPARK-1860:
---

Filed as SPARK-3805

 Standalone Worker cleanup should not clean up running executors
 ----------------------------------------------------------------

                 Key: SPARK-1860
                 URL: https://issues.apache.org/jira/browse/SPARK-1860
             Project: Spark
          Issue Type: Bug
          Components: Deploy
    Affects Versions: 1.0.0
            Reporter: Aaron Davidson
            Priority: Blocker

 The default settings of the standalone worker cleanup code clean up all
 application data every 7 days. This includes jars that were added to any
 executors that happen to be running for longer than 7 days, hitting streaming
 jobs especially hard.
 Executors' log/data folders should not be cleaned up while the executors are
 still running. Until then, this behavior should not be enabled by default.






[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-10-04 Thread Aaron Davidson (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159343#comment-14159343 ]

Aaron Davidson commented on SPARK-1860:
---

Agreed, that sounds good. Would you or [~mccheah] be able to create a quick PR 
for this?




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-10-03 Thread Andrew Ash (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158646#comment-14158646 ]

Andrew Ash commented on SPARK-1860:
---

[~ilikerps] this ticket mentioned turning the cleanup code on by default once it 
was fixed. Should we now change the defaults to enable it?




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-10-01 Thread Matt Cheah (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155097#comment-14155097 ]

Matt Cheah commented on SPARK-1860:
---

This might be a silly question, but are we guaranteed that the application 
folder will always be named by appId? I looked at ExecutorRunner, and it does 
generate the folder from the application ID and executor ID, but code comments 
in ExecutorRunner indicate it is only used in standalone cluster mode. Hence I 
didn't tie any logic to the actual naming of the folders.
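
(For reference, a rough sketch of the layout being discussed, assuming the 
standalone ExecutorRunner convention of a per-application directory named by 
appId with one subdirectory per executor ID; the helper below is illustrative, 
not the actual ExecutorRunner code.)

{code:scala}
import java.io.File

// Illustrative only: standalone executors get a work directory laid out as
//   <workDir>/<appId>/<execId>/   (stdout, stderr, fetched jars/files)
def executorWorkDir(workDir: File, appId: String, execId: Int): File =
  new File(workDir, s"$appId/$execId")
{code}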




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-10-01 Thread Aaron Davidson (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155303#comment-14155303 ]

Aaron Davidson commented on SPARK-1860:
---

The Worker itself is solely a Standalone mode construct, so AFAIK, this is not 
an issue.




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-30 Thread Aaron Davidson (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152839#comment-14152839 ]

Aaron Davidson commented on SPARK-1860:
---

The Executor could clean up its own jars when it terminates normally; that 
seems fine. The impact of this seems limited, though, and it's a good idea to 
limit the scope of shutdown hooks as much as possible.

There are three classes of things to delete:
1. Shuffle files / block manager blocks -- large -- deleted by graceful 
Executor termination. Can be deleted immediately.
2. Uploaded jars / files -- usually small -- deleted by Worker cleanup. Can be 
deleted immediately.
3. Logs -- small to medium -- deleted by Worker cleanup. Should not be deleted 
immediately.

Number 1 is the most critical in terms of impact on the system. Numbers 2 and 3 
are of the same order of magnitude in size, so cleaning up 2 but not 3 is not 
expected to improve the system's stability by more than a factor of ~2 in the 
number of applications it can accommodate.

Note that the intent of this particular JIRA is very simple: clean up 2 and 3 
for all executors several days after they have terminated, rather than several 
days after they have started. If you wish to expand the scope of the Worker or 
Executor cleanup, that should be covered in a separate JIRA (which is welcome 
-- I just want to make sure we're on the same page about this particular issue!).




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-30 Thread Matt Cheah (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153382#comment-14153382 ]

Matt Cheah commented on SPARK-1860:
---

Cool, I see where you're coming from now. I'll whip up something. Thanks for 
the input!

The cleanup is more for cosmetics than for performance, I agree.




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-30 Thread Matt Cheah (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153514#comment-14153514 ]

Matt Cheah commented on SPARK-1860:
---

The change I am going to make: when the cleanup task runs, it deletes an app 
directory inside the work directory only if both the timestamp on the app 
directory itself and the timestamps on all of its files are older than the 
retention time.

If any file inside the app directory was recently modified, the directory is 
not touched.

Let me know if this change suffices to address the issue and I'll open a pull 
request.
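
A minimal sketch of that check, assuming a recursive walk over each app 
directory; the helper names and the wiring in the comments are illustrative, 
not the actual Worker.scala code:

{code:scala}
import java.io.File

// True only if the directory itself and every file beneath it were last
// modified before the cutoff (now minus the retention period).
def entirelyOlderThan(dir: File, cutoffMs: Long): Boolean = {
  def isOld(f: File): Boolean = {
    val selfOld = f.lastModified() < cutoffMs
    if (f.isDirectory)
      selfOld && Option(f.listFiles()).getOrElse(Array.empty[File]).forall(isOld)
    else
      selfOld
  }
  isOld(dir)
}

// Inside the periodic cleanup task (pseudo-wiring):
//   val cutoff = System.currentTimeMillis() - appDataRetentionSeconds * 1000
//   workDir.listFiles().filter(_.isDirectory)
//     .filter(entirelyOlderThan(_, cutoff))
//     .foreach(deleteRecursively)   // only fully stale app dirs are removed
{code}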




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-30 Thread Andrew Ash (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153752#comment-14153752 ]

Andrew Ash commented on SPARK-1860:
---

That matches my expectations for this ticket, Matt -- improve the timed cleanup 
task to delete only applications that have terminated, rather than running ones 
as well.




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-30 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154220#comment-14154220 ]

Apache Spark commented on SPARK-1860:
-

User 'mccheah' has created a pull request for this issue:
https://github.com/apache/spark/pull/2608




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-30 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154226#comment-14154226 ]

Apache Spark commented on SPARK-1860:
-

User 'mccheah' has created a pull request for this issue:
https://github.com/apache/spark/pull/2609




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-30 Thread Aaron Davidson (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154228#comment-14154228 ]

Aaron Davidson commented on SPARK-1860:
---

Your logic SGTM, but I would add one additional check to avoid deleting the 
directory for an Application which still has running Executors on that node, 
just to make absolutely sure that we don't delete app directories that just 
happen to sit idle for a while. This check can be performed by iterating over 
the executors map in Worker.scala and matching the appId with the app 
directory's name.
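
A sketch of what that extra guard could look like on top of the timestamp check 
above; the executors map is assumed to be keyed per executor with an appId 
available on each runner, as in Worker.scala, but the names below are 
illustrative rather than the actual code:

{code:scala}
import java.io.File

// Keep an app directory if any executor for that appId is still registered on
// this Worker, regardless of how old its files look.
def appDirsSafeToDelete(
    workDir: File,
    runningAppIds: Set[String],       // e.g. executors.values.map(_.appId).toSet
    isStale: File => Boolean): Seq[File] = {
  Option(workDir.listFiles()).getOrElse(Array.empty[File]).toSeq
    .filter(_.isDirectory)
    .filterNot(dir => runningAppIds.contains(dir.getName))  // app still has live executors
    .filter(isStale)                                         // and passes the timestamp check
}
{code}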





[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-29 Thread Matt Cheah (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152421#comment-14152421 ]

Matt Cheah commented on SPARK-1860:
---

Apologies for any naivety - this will be the first issue I tackle as a Spark 
contributor.

Mingyu and I had a short chat and we thought it would be reasonable for the 
Executor to simply clean up its own state when it shuts down. Is there anything 
preventing Executor.stop() from cleaning up the app directory it was using?




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-29 Thread Andrew Ash (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152552#comment-14152552 ]

Andrew Ash commented on SPARK-1860:
---

Cleanup on executor shutdown is part of the solution (and should be done IMO) 
but not all of it.

In particular, it won't cover the case where an executor dies from an OOM, a 
kill -9, or any other unclean shutdown. The perfect solution would do the 
event-based cleanup on executor shutdown itself, and also run a periodic 
cleaner to get rid of directories left behind by unclean shutdowns.




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-29 Thread Matt Cheah (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152621#comment-14152621 ]

Matt Cheah commented on SPARK-1860:
---

ExecutorRunner seems to have various cases corresponding to how the Executor 
exited, and ExecutorRunner also creates the directory in fetchAndRunExecutor(). 
We could catch all of the exit cases there and delete the directory in every 
case.

In the case that the executor exited abnormally, however, it would be best to 
preserve the logs instead of blindly deleting the whole directory.

On that note, one other thought is that perhaps we actually want to preserve 
the directory entirely upon a crash, since keeping that state will let us 
better understand what happened, e.g. which jars and files were present.
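
A rough sketch of that idea, deleting the executor directory on a clean exit 
but keeping stdout/stderr when the process failed; the hook and helper names 
are illustrative, not the actual ExecutorRunner code:

{code:scala}
import java.io.File

// Minimal recursive delete, standing in for a helper like Utils.deleteRecursively.
def deleteRecursively(f: File): Unit = {
  if (f.isDirectory)
    Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  f.delete()
}

// Hypothetical hook for the executor exit path in fetchAndRunExecutor().
def onExecutorExit(exitCode: Int, executorDir: File): Unit = {
  if (exitCode == 0) {
    deleteRecursively(executorDir)      // clean exit: nothing worth keeping
  } else {
    // abnormal exit: keep stdout/stderr for debugging, drop everything else
    Option(executorDir.listFiles()).getOrElse(Array.empty[File])
      .filterNot(f => f.getName == "stdout" || f.getName == "stderr")
      .foreach(deleteRecursively)
  }
}
{code}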




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-29 Thread Aaron Davidson (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152744#comment-14152744 ]

Aaron Davidson commented on SPARK-1860:
---

Note that there are two separate forms of cleanup: application data cleanup 
(jars and logs) and shuffle data cleanup. Standalone Worker cleanup deals with 
the former, Executor termination handlers deal with the latter. The purpose is 
not to deal with executors that have terminated ungracefully, but to actually 
clean up old application directories.

Here the idea is that a Worker may be running for a very long time (weeks, 
months) and over time accumulates hundreds of application directories. We want 
to delete these directories several days after the applications have terminated 
(today we clean them up whether or not they have terminated, which loses their 
jars and logs), by which point we presumably don't care about them anymore. We 
do not want to clean them up immediately after application termination.

The Worker performing shuffle data cleanup for ungracefully terminated 
Executors is not a bad idea, but it is a (smallish) feature unto itself, as the 
Worker does not currently know where a particular Executor is storing its data.




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-29 Thread Matt Cheah (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152758#comment-14152758 ]

Matt Cheah commented on SPARK-1860:
---

I agree we should focus the scope on cleaning up things that have successfully 
finished.

However, should it not be the case that when an Executor shuts down, it cleans 
up all of the files it created? As you stated, the Worker doesn't know where a 
particular Executor is storing its data, but the Executor should know where its 
own data lives, and it should be managing that data and cleaning it up when it 
completes. This holds regardless of the distinction between application data 
and shuffle data.

The Executor class has a record of the files and jars added through the 
SparkContext (currentFiles and currentJars fields) for that Executor's use, and 
these should naturally expire and be cleaned up when the Executor terminates.
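
A sketch of that kind of self-cleanup, assuming the Executor removes its local 
copies of fetched files and jars when it stops; the field names match the 
comment above, but the helper and the layout of the local copies are 
illustrative rather than the actual Executor.stop() code:

{code:scala}
import java.io.File

// Hypothetical cleanup invoked from Executor shutdown: delete the local copies
// of dependencies recorded in currentFiles / currentJars (URL -> fetch timestamp).
def cleanupFetchedDependencies(
    currentFiles: collection.Map[String, Long],
    currentJars: collection.Map[String, Long],
    localRootDir: File): Unit = {
  (currentFiles.keys ++ currentJars.keys).foreach { url =>
    val localCopy = new File(localRootDir, url.split("/").last)  // local name = last path segment
    if (localCopy.exists()) localCopy.delete()
  }
}
{code}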




[jira] [Commented] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-07-28 Thread Mark Hamstra (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076778#comment-14076778 ]

Mark Hamstra commented on SPARK-1860:
-

I don't think that there is much in the way of conflict, but something to be 
aware of is that the proposed fix to SPARK-2425 does modify Executor state 
transitions and cleanup: https://github.com/apache/spark/pull/1360
