[jira] [Commented] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers

2016-11-25 Thread holdenk (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15695967#comment-15695967 ]

holdenk commented on SPARK-636:
---

If you have a logging system you want to initialize, wouldn't using an object 
with lazy initialization on first call be sufficient?
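
For illustration, a minimal sketch of that pattern in Scala; {{WorkerLogging}} and {{configureLogging()}} are invented names for this example, not an existing API:

{code}
import org.apache.spark.rdd.RDD

// Sketch only: WorkerLogging and configureLogging() are hypothetical names.
object WorkerLogging {
  // A lazy val is evaluated at most once per executor JVM, the first time
  // any task running in that JVM touches it.
  lazy val initialized: Boolean = {
    configureLogging()
    true
  }

  private def configureLogging(): Unit = {
    // e.g. load a log4j configuration, set system properties, ...
    System.setProperty("example.logging.initialized", "true")
  }
}

object LazyInitExample {
  // Touching the lazy val inside a task triggers the one-time setup on
  // whichever executors actually run tasks for this RDD.
  def withLoggingInit[T](rdd: RDD[T]): Unit =
    rdd.foreachPartition { _ =>
      WorkerLogging.initialized
      // ... per-partition work ...
    }
}
{code}

One caveat, echoed elsewhere in this thread: the setup only happens on executors that actually receive a task, and only when the first task there touches the object.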

> Add mechanism to run system management/configuration tasks on all workers
> -
>
> Key: SPARK-636
> URL: https://issues.apache.org/jira/browse/SPARK-636
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Josh Rosen
>
> It would be useful to have a mechanism to run a task on all workers in order 
> to perform system management tasks, such as purging caches or changing system 
> properties.  This is useful for automated experiments and benchmarking; I 
> don't envision this being used for heavy computation.
> Right now, I can mimic this with something like
> {code}
> sc.parallelize(0 until numMachines, numMachines).foreach { _ => () }
> {code}
> but this does not guarantee that every worker runs a task and requires my 
> user code to know the number of workers.
> One sample use case is setup and teardown for benchmark tests.  For example, 
> I might want to drop cached RDDs, purge shuffle data, and call 
> {{System.gc()}} between test runs.  It makes sense to incorporate some of 
> this functionality, such as dropping cached RDDs, into Spark itself, but it 
> might be helpful to have a general mechanism for running ad-hoc tasks like 
> {{System.gc()}}.
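
For concreteness, a hedged, self-contained sketch of the workaround described above (Scala; the app name and default machine count are arbitrary). As the description notes, Spark may schedule several of these tasks on the same worker, so some machines can be missed entirely:

{code}
import org.apache.spark.{SparkConf, SparkContext}

object ClusterWideGc {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cluster-wide-gc"))

    // User code has to know (or guess) the number of workers up front.
    val numMachines = args.headOption.map(_.toInt).getOrElse(4)

    // One small task per expected worker; nothing guarantees that each
    // worker receives exactly one of them.
    sc.parallelize(0 until numMachines, numMachines).foreach { _ =>
      System.gc()
    }

    sc.stop()
  }
}
{code}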






[jira] [Commented] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers

2016-10-15 Thread Michael Schmeißer (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578294#comment-15578294 ]

Michael Schmeißer commented on SPARK-636:
-

I agree; that's why I also feel that these issues are not duplicates. 







[jira] [Commented] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers

2016-10-14 Thread Luis Ramos (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575005#comment-15575005 ]

Luis Ramos commented on SPARK-636:
--

I feel like the broadcasting mechanism doesn't get me "close" enough to solve 
my issue (initialization of a logging system). That's partly because 
initialization would be deferred (meaning a loss of useful logs), and partly 
because a dedicated mechanism could give us init code that is 'guaranteed' to 
be executed only once, as opposed to implementing that 'guarantee' ourselves, 
which currently can lead to bad practices.
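
As a rough illustration of the hand-rolled 'run once' guard alluded to above (not an endorsement of the pattern), in Scala; {{InitGuard}} and the init closure are invented for this example:

{code}
import java.util.concurrent.atomic.AtomicBoolean
import org.apache.spark.rdd.RDD

// Hand-rolled once-per-executor-JVM guard; InitGuard is an invented name.
object InitGuard {
  private val done = new AtomicBoolean(false)

  def ensureOnce(init: () => Unit): Unit =
    if (done.compareAndSet(false, true)) init()
}

object GuardedJob {
  def countWithInit(rdd: RDD[String]): Long =
    rdd.mapPartitions { iter =>
      // Initialization is deferred until the first task lands on an executor,
      // and every job has to remember to call the guard -- the drawbacks the
      // comment above describes.
      InitGuard.ensureOnce(() => println("initializing logging ..."))
      iter
    }.count()
}
{code}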







[jira] [Commented] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers

2016-10-07 Thread holdenk (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556322#comment-15556322 ]

holdenk commented on SPARK-636:
---

Does broadcasting get us close enough to handling this, or is this something we 
are still considering for the API?
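
For reference, a sketch of what the broadcast-based approach could look like, in Scala; {{LazySetup}} and {{setupLogging()}} are invented names. The first task on each executor that dereferences the broadcast value triggers the setup there:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Sketch only: LazySetup and setupLogging() are hypothetical.
class LazySetup extends Serializable {
  // A @transient lazy val is re-evaluated at most once per executor, the
  // first time a task there touches it after the broadcast is deserialized.
  @transient lazy val ready: Boolean = {
    setupLogging()
    true
  }

  private def setupLogging(): Unit =
    System.setProperty("example.logging.initialized", "true")
}

object BroadcastInitExample {
  def run(sc: SparkContext): Unit = {
    val setup: Broadcast[LazySetup] = sc.broadcast(new LazySetup)
    sc.parallelize(1 to 1000, 10).foreach { _ =>
      setup.value.ready // per-executor setup happens lazily here
    }
  }
}
{code}

As noted elsewhere in this thread, this still leaves initialization deferred until some task reaches each executor, which is part of what the dedicated mechanism requested in this issue would avoid.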



