Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/20091
  
    This changes the existing behavior of Spark: the expectation is for RDDs 
without a partitioner to use `spark.default.parallelism` for the shuffle; when 
one or more RDDs do have a partitioner, which one gets picked was more of an 
implementation detail (in some cases the 'first', in others the 'biggest', etc.).
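
    For context, a minimal spark-shell-style sketch of that pickup behavior 
(the RDD names and the parallelism of 8 are illustrative, and the exact 
selection rule varies across Spark versions):

    ```scala
    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("default-partitioner-sketch")
      .set("spark.default.parallelism", "8") // illustrative value

    val sc = new SparkContext(conf)

    // Neither side carries a partitioner: the cogroup shuffle falls back to
    // spark.default.parallelism, i.e. 8 partitions here.
    val left  = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
    val right = sc.parallelize(Seq(1 -> "x", 3 -> "y"))
    println(left.cogroup(right).getNumPartitions)            // 8

    // One side is already partitioned: its partitioner is reused for the
    // shuffle and spark.default.parallelism no longer applies.
    val prePartitioned = left.partitionBy(new HashPartitioner(2))
    println(prePartitioned.cogroup(right).getNumPartitions)  // 2
    ```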
    
    Consider the case of a cogroup of two filtered RDDs: users set the 
parallelism (either explicitly, or implicitly in the case of YARN) to handle 
these cases.
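
    Continuing the sketch above with the same `sc` (the RDD names and the 
explicit count of 16 are hypothetical), this is the kind of tuning meant here: 
passing `numPartitions` explicitly, or setting `spark.default.parallelism`, 
keeps the shuffle width under the user's control:

    ```scala
    // Two large inputs filtered down to almost nothing. Without an explicit
    // numPartitions, the cogroup width would come from
    // spark.default.parallelism (or a surviving upstream partitioner), not
    // from how little data survives the filters.
    val bigA = sc.parallelize(1 to 1000000).map(i => (i % 1000, i))
    val bigB = sc.parallelize(1 to 1000000).map(i => (i % 1000, i))

    val smallA = bigA.filter { case (k, _) => k < 10 }
    val smallB = bigB.filter { case (k, _) => k < 10 }

    // Passing numPartitions explicitly pins the shuffle width regardless of
    // which upstream partitioner would otherwise be picked.
    println(smallA.cogroup(smallB, 16).getNumPartitions)  // 16

    sc.stop()
    ```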

