[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

holdenk Wed, 01 Feb 2017 13:43:38 -0800

Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/16739
  
    @felixcheung I was referring to the `   * However, if you're doing a drastic coalesce, e.g. to numPartitions = 1,
       * this may result in your computation taking place on fewer nodes than
       * you like (e.g. one node in the case of numPartitions = 1). To avoid this,
       * you can pass shuffle = true. This will add a shuffle step, but means the
       * current upstream partitions will be executed in parallel (per whatever
       * the current partitioning is).
    ` warning
    
    but the coalesce capping out based on numSlices also sounds important to document (and potentially confusing).
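
    To make the two behaviours above concrete, here is a rough toy model in plain Python. It does not call Spark; `coalesce_plan` is a hypothetical helper, and it assumes that a coalesce without a shuffle can only merge existing partitions (so the result is capped at the current partition count), while `shuffle = true` keeps the upstream partitions as separate tasks. It is a sketch of the documented behaviour, not of Spark's implementation.

```python
# Toy model only: no Spark involved, and coalesce_plan is a hypothetical name.

def coalesce_plan(current_partitions, num_partitions, shuffle=False):
    """Return (resulting partition count, upstream task count).

    Assumed model: without a shuffle, coalesce only merges existing
    partitions, so the result is capped at the current count and the
    upstream work runs inside the merged tasks. With a shuffle, each
    upstream partition keeps its own task and data is redistributed after.
    """
    if current_partitions < 1 or num_partitions < 1:
        raise ValueError("partition counts must be positive")
    if shuffle:
        return num_partitions, current_partitions
    result = min(num_partitions, current_partitions)
    return result, result


# Drastic coalesce: the upstream computation collapses into one task.
print(coalesce_plan(8, 1))                # (1, 1)
# Same target with shuffle=True: upstream still runs as 8 tasks.
print(coalesce_plan(8, 1, shuffle=True))  # (1, 8)
# Asking for more partitions than exist, without a shuffle, is capped.
print(coalesce_plan(2, 10))               # (2, 2)
```

    The last call is the capping case: in this model, a request for 10 partitions on data created with 2 slices still yields 2.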


