[jira] [Comment Edited] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers

2016-10-14 Thread Luis Ramos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575005#comment-15575005
 ] 

Luis Ramos edited comment on SPARK-636 at 10/14/16 11:11 AM:
-

I feel like the broadcasting mechanism doesn't get me "close" enough to solve 
my issue (initialization of a logging system). That's partly because my 
initialization would be deferred (meaning a loss of useful logs). A dedicated 
mechanism could also give us some 'init' code that is guaranteed to be 
evaluated only once, as opposed to implementing that 'guarantee' yourself, 
which can currently lead to bad practices.

Edit: For some context, I'm approaching this issue from SPARK-650
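
For illustration, here is a minimal sketch of the kind of hand-rolled 
once-only 'guarantee' the comment refers to. It uses plain Scala only and no 
Spark API. The names LoggingInit, ensure and initCount are hypothetical and 
exist only for this example. It relies on the fact that a Scala lazy val is 
initialized at most once per JVM, in a thread-safe way.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical holder for per-JVM logging setup; not part of Spark.
object LoggingInit {
  // Counts how often the init body really ran (for illustration only).
  val initCount = new AtomicInteger(0)

  // A lazy val is evaluated at most once per JVM, and Scala synchronizes
  // that first evaluation, so concurrent callers cannot run it twice.
  private lazy val initialized: Boolean = {
    initCount.incrementAndGet()
    // The real logging configuration would go here.
    true
  }

  // Call at the start of every task; only the first call does any work.
  def ensure(): Boolean = initialized
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Simulate several tasks landing on the same executor JVM.
    val threads = (1 to 8).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = { LoggingInit.ensure(); () }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(LoggingInit.initCount.get()) // prints 1
  }
}
```

Note that this still has the drawback described above: the init body runs 
only when the first task calls ensure(), so anything logged in that JVM 
before then is not covered.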


was (Author: luisramos):
I feel like the broadcasting mechanism doesn't get me "close" enough to solve 
my issue (initialization of a logging system). That's partly because my 
initialization would be deferred (meaning a loss of useful logs), and also it 
could enable us to have some 'init' code that is guaranteed to only be 
evaluated once as opposed to implementing that 'guarantee' yourself, which can 
currently lead to bad practices.

> Add mechanism to run system management/configuration tasks on all workers
> -
>
> Key: SPARK-636
> URL: https://issues.apache.org/jira/browse/SPARK-636
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Josh Rosen
>
> It would be useful to have a mechanism to run a task on all workers in order 
> to perform system management tasks, such as purging caches or changing system 
> properties.  This is useful for automated experiments and benchmarking; I 
> don't envision this being used for heavy computation.
> Right now, I can mimic this with something like
> {code}
> sc.parallelize(0 until numMachines, numMachines).foreach { _ => /* management task goes here */ }
> {code}
> but this does not guarantee that every worker runs a task and requires my 
> user code to know the number of workers.
> One sample use case is setup and teardown for benchmark tests.  For example, 
> I might want to drop cached RDDs, purge shuffle data, and call 
> {{System.gc()}} between test runs.  It makes sense to incorporate some of 
> this functionality, such as dropping cached RDDs, into Spark itself, but it 
> might be helpful to have a general mechanism for running ad-hoc tasks like 
> {{System.gc()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers

2016-10-14 Thread Luis Ramos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575005#comment-15575005
 ] 

Luis Ramos edited comment on SPARK-636 at 10/14/16 11:07 AM:
-

I feel like the broadcasting mechanism doesn't get me "close" enough to solve 
my issue (initialization of a logging system). That's partly because my 
initialization would be deferred (meaning a loss of useful logs). A dedicated 
mechanism could also give us some 'init' code that is guaranteed to be 
evaluated only once, as opposed to implementing that 'guarantee' yourself, 
which can currently lead to bad practices.


was (Author: luisramos):
I feel like the broadcasting mechanism doesn't get me "close" enough to solve 
my issue (initialization of a logging system). That's partly because 
initialization would be deferred (meaning a loss of useful logs), and also it 
could enable us to have init code that is 'guaranteed' to only be executed once 
as opposed to implement that 'guarantee' yourself, which currently can lead to 
bad practices.



