[jira] [Comment Edited] (SPARK-636) Add mechanism to run system management/configuration tasks on all workers
[ https://issues.apache.org/jira/browse/SPARK-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575005#comment-15575005 ]

Luis Ramos edited comment on SPARK-636 at 10/14/16 11:11 AM:
-------------------------------------------------------------

I feel that the broadcast mechanism doesn't get me close enough to solving my issue (initializing a logging system). That's partly because my initialization would be deferred (meaning a loss of useful logs). A dedicated mechanism could also give us 'init' code that is guaranteed to be evaluated only once, rather than each user implementing that guarantee themselves, which currently leads to bad practices.

Edit: For some context, I'm approaching this issue from SPARK-650.

was (Author: luisramos):
I feel that the broadcast mechanism doesn't get me close enough to solving my issue (initializing a logging system). That's partly because my initialization would be deferred (meaning a loss of useful logs). A dedicated mechanism could also give us some 'init' code that is guaranteed to be evaluated only once, as opposed to implementing that guarantee yourself, which can currently lead to bad practices.

> Add mechanism to run system management/configuration tasks on all workers
> -------------------------------------------------------------------------
>
>                 Key: SPARK-636
>                 URL: https://issues.apache.org/jira/browse/SPARK-636
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Josh Rosen
>
> It would be useful to have a mechanism to run a task on all workers in order
> to perform system management tasks, such as purging caches or changing system
> properties. This is useful for automated experiments and benchmarking; I
> don't envision this being used for heavy computation.
> Right now, I can mimic this with something like
> {code}
> sc.parallelize(0 until numMachines, numMachines).foreach { _ => () }
> {code}
> but this does not guarantee that every worker runs a task, and it requires my
> user code to know the number of workers.
> One sample use case is setup and teardown for benchmark tests.
> For example,
> I might want to drop cached RDDs, purge shuffle data, and call
> {{System.gc()}} between test runs. It makes sense to incorporate some of
> this functionality, such as dropping cached RDDs, into Spark itself, but it
> might be helpful to have a general mechanism for running ad-hoc tasks like
> {{System.gc()}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
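The hand-rolled "guarantee" the comment alludes to is typically a per-JVM run-once guard that each task calls before doing work. The following is a minimal sketch of that pattern in plain Scala; {{WorkerInit}} and {{ensureInitialized}} are hypothetical names, not Spark API, and the real body would configure logging or system properties rather than count invocations:

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical per-JVM initializer. Because a Scala `object` is a JVM
// singleton, every task running in the same executor JVM shares this
// instance, so the init body runs at most once per worker JVM.
object WorkerInit {
  private val initialized = new AtomicBoolean(false)
  var initCount = 0 // exposed only so the sketch can be checked

  def ensureInitialized(): Unit = {
    // compareAndSet flips false -> true exactly once, even when many
    // concurrent tasks call this at the same time.
    if (initialized.compareAndSet(false, true)) {
      initCount += 1
      // ... e.g. configure the logging system, set system properties ...
    }
  }
}
```

In Spark user code this would then be invoked at the top of every task closure (e.g. inside {{foreachPartition}}), which is exactly the boilerplate a built-in run-on-all-workers mechanism would remove — and, as the comment notes, any logs produced before the first task reaches a given worker are still lost.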