[jira] [Commented] (SPARK-24817) Implement BarrierTaskContext.barrier()

2018-08-02 Thread Erik Erlandson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567358#comment-16567358
 ] 

Erik Erlandson commented on SPARK-24817:


I have been looking at the use cases for barrier mode in the design doc. The 
primary story seems to be along the lines of using {{mapPartitions}} to:
 # write out any partitioned data (and sync)
 # execute some kind of ML logic (TF, etc) (possibly syncing on stages here?)
 # optionally move back into "normal" spark executions

My mental model has been that the value proposition for Hydrogen is primarily a 
convergence argument: it is easier not to have to leave a Spark workflow and 
execute something like TF using some other toolchain. On the other hand, the 
Spark programmer has to write out the partitioned data and then invoke ML 
tooling like TF regardless. So does the added convenience pay for the cost in 
complexity of absorbing new clustering and scheduling models into Spark, along 
with other consequences such as SPARK-24615? The "null hypothesis" to compare 
against is writing partition data, then using ML-specific clustering toolchains 
(kubeflow, for example), and consuming the resulting products in Spark 
afterward.
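
For concreteness, the three-step story listed at the top of this comment can be 
sketched in plain Python. This is only an illustration under stated 
assumptions, not Spark code: threads stand in for barrier tasks, 
{{threading.Barrier}} stands in for {{BarrierTaskContext.barrier()}}, a 
temporary directory stands in for shared storage, and the function name 
{{run_barrier_stage}} is hypothetical.

```python
import os
import tempfile
import threading


def run_barrier_stage(partitions):
    """Simulate one barrier stage: one thread per partition, all running together."""
    num_tasks = len(partitions)
    # Stand-in for BarrierTaskContext.barrier(); the timeout avoids hanging forever
    # if one simulated task fails before reaching the barrier.
    barrier = threading.Barrier(num_tasks, timeout=30)
    results = [None] * num_tasks

    with tempfile.TemporaryDirectory() as shared_dir:

        def task(pid, rows):
            # Step 1: write out this task's partition, then sync.
            path = os.path.join(shared_dir, f"part-{pid:05d}.txt")
            with open(path, "w") as f:
                f.writelines(f"{row}\n" for row in rows)
            barrier.wait()

            # Step 2: placeholder for the ML logic. After the sync, every task
            # can rely on all partition files being present.
            visible = sorted(os.listdir(shared_dir))
            barrier.wait()  # optional second sync before the stage ends

            # Step 3: hand a value back to the "normal" part of the workflow.
            results[pid] = (len(rows), len(visible))

        threads = [
            threading.Thread(target=task, args=(pid, rows))
            for pid, rows in enumerate(partitions)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    return results
```

With three partitions, e.g. {{run_barrier_stage([[1, 2], [3], [4, 5, 6]])}}, 
every task reports seeing all three files after the first sync, which is the 
property the write-then-sync step is meant to provide.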

> Implement BarrierTaskContext.barrier()
> --
>
> Key: SPARK-24817
> URL: https://issues.apache.org/jira/browse/SPARK-24817
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Jiang Xingbo
> Priority: Major
>
> Implement BarrierTaskContext.barrier() to support global sync between all 
> the tasks in a barrier stage. The global sync shall finish immediately once 
> all tasks in the same barrier stage reach the same barrier.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24817) Implement BarrierTaskContext.barrier()

2018-08-02 Thread Erik Erlandson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567301#comment-16567301
 ] 

Erik Erlandson commented on SPARK-24817:


Thanks [~jiangxb] - I'd expect that design to work out of the box on the k8s 
backend. 

ML-specific code seems like it will have needs that are harder to predict, by 
definition. If it can use IP addresses in the cluster space, it should work 
regardless. If it wants FQDNs, then additional pod configuration may be 
required.



[jira] [Commented] (SPARK-24817) Implement BarrierTaskContext.barrier()

2018-08-01 Thread Jiang Xingbo (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566278#comment-16566278
 ] 

Jiang Xingbo commented on SPARK-24817:
--

Actually, the current implementation of the _barrier_ function doesn't require 
communication between executors; all executors just talk to a 
_BarrierCoordinator_, which is in the driver. But to allow launching ML 
workloads we do need to enable executors to communicate with each other 
directly. IIUC that shall be investigated under SPARK-24724. Maybe [~mengxr] 
can provide more context here.
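
To illustrate the communication pattern described above, here is a minimal 
sketch in plain Python. It is not the code from the pull request: 
{{ToyCoordinator}}, {{barrier}} and {{run_barrier_demo}} are hypothetical 
names, threads stand in for executors, and queues stand in for RPC. The point 
it shows is that each task only ever talks to the coordinator, never to another 
task.

```python
import queue
import threading


class ToyCoordinator:
    """Hypothetical driver-side coordinator (not Spark's BarrierCoordinator)."""

    def __init__(self, num_tasks):
        self.num_tasks = num_tasks
        self.inbox = queue.Queue()  # requests arriving from tasks

    def serve(self, num_barriers):
        for epoch in range(1, num_barriers + 1):
            # Collect one request per task. Each task blocks until it is
            # released, so it has at most one outstanding request.
            waiting = [self.inbox.get() for _ in range(self.num_tasks)]
            # Everyone has arrived: release all waiting tasks.
            for reply in waiting:
                reply.put(epoch)


def barrier(coordinator, timeout=30):
    """Task-side call: send a request to the coordinator and wait for its reply."""
    reply = queue.Queue(maxsize=1)
    coordinator.inbox.put(reply)
    return reply.get(timeout=timeout)


def run_barrier_demo(num_tasks=4, num_barriers=2):
    coordinator = ToyCoordinator(num_tasks)
    server = threading.Thread(target=coordinator.serve, args=(num_barriers,))
    server.start()

    log = []
    lock = threading.Lock()

    def task(task_id):
        for step in range(num_barriers):
            with lock:
                log.append(("arrive", step, task_id))
            epoch = barrier(coordinator)
            with lock:
                log.append(("leave", step, task_id, epoch))

    workers = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    server.join()
    return log
```

In the returned log, every "arrive" entry for a given step precedes every 
"leave" entry for that step, which is the global sync behaviour the issue 
description asks for. Direct executor-to-executor communication, as needed for 
ML workloads, is outside what this sketch covers.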



[jira] [Commented] (SPARK-24817) Implement BarrierTaskContext.barrier()

2018-08-01 Thread Erik Erlandson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566159#comment-16566159
 ] 

Erik Erlandson commented on SPARK-24817:


I'm curious about what the {{barrier}} invocations inside {{mapPartitions}} 
closures imply about communication between executors, for example executors 
running on pods in a kube cluster. It is possible that whatever allows shuffle 
data to transfer between executors will also allow these {{barrier}} 
coordinations to work. However, we had to create a headless service for 
executors to register properly with the driver pod, and if every executor pod 
needs something like that for barrier to work, it will have an impact on kube 
backend support.



[jira] [Commented] (SPARK-24817) Implement BarrierTaskContext.barrier()

2018-07-27 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560096#comment-16560096
 ] 

Apache Spark commented on SPARK-24817:
--

User 'jiangxb1987' has created a pull request for this issue:
https://github.com/apache/spark/pull/21898
