Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21927#discussion_r207108898
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1946,4 +1990,11 @@ private[spark] object DAGScheduler {
     
       // Number of consecutive stage attempts allowed before a stage is aborted
       val DEFAULT_MAX_CONSECUTIVE_STAGE_ATTEMPTS = 4
    +
    +  // Error message when running a barrier stage that have unsupported RDD chain pattern.
    +  val ERROR_MESSAGE_RUN_BARRIER_WITH_UNSUPPORTED_RDD_CHAIN_PATTERN =
    +    "[SPARK-24820][SPARK-24821]: Barrier execution mode does not allow the following pattern of " +
    +      "RDD chain within a barrier stage:\n1. Ancestor RDDs that have different number of " +
    +      "partitions from the resulting RDD (eg. union()/coalesce()/first()/PartitionPruningRDD);\n" +
    --- End diff ---
    
    Please also list `take()`. It would be nice to provide a workaround for `first()` and `take()`: `barrierRdd.collect().head` (Scala), `barrierRdd.collect()[0]` (Python).
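    
    A minimal sketch of that workaround, assuming a Spark 2.4+ session with enough slots for the barrier tasks; the names `BarrierFirstWorkaround`, `barrierRdd`, and `firstElement` are illustrative, not from the PR:
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    object BarrierFirstWorkaround {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("barrier-first-workaround").getOrCreate()
        val sc = spark.sparkContext
    
        // Mark the stage as a barrier stage; all 4 tasks are launched together.
        val barrierRdd = sc
          .parallelize(1 to 100, numSlices = 4)
          .barrier()
          .mapPartitions(iter => iter.map(_ * 2))
    
        // Not allowed inside a barrier stage: barrierRdd.first() or barrierRdd.take(1),
        // because those actions schedule only a subset of the partitions.
        // Suggested workaround: compute all partitions, then take the head on the driver.
        val firstElement = barrierRdd.collect().head
        println(firstElement)
    
        spark.stop()
      }
    }
    ```
    
    Calling `barrierRdd.first()` or `barrierRdd.take(1)` here would hit the error message added in this diff, since both prune partitions of the barrier stage.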


