Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-76318262
Sorry for the delay; my initial idea here is:
1. We can set spark.default.parallelism to control the number of shuffle
partitions, but this config option is not sensitive to the RDD's data size:
for a job with 1TB of input data the partition count is x, and for the same
job with 1KB of input data the partition count is also x.
2. If we do not set spark.default.parallelism, a Spark RDD uses its parent
RDD's partition count as its own. But in this case I found there may be
many tiny tasks, because the parent RDD's partition count is large. So I
think maybe we can introduce a ratio to control the shuffle partition
count; see the sketch below.
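For illustration, a minimal Scala sketch of the two behaviors and of the
ratio idea. The `ratio` knob is hypothetical, standing in for the proposal
rather than an existing Spark config; the input path and partition counts
are made up:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("shuffle-partitions-demo"))

// (1) If spark.default.parallelism is set (e.g. --conf
// spark.default.parallelism=200), every shuffle produces 200 partitions,
// whether the job reads 1TB or 1KB of input.

// (2) If it is unset, reduceByKey inherits the parent RDD's partition
// count, which yields many tiny tasks when the parent is heavily
// partitioned but the shuffled data is small.
val parent = sc.textFile("hdfs:///data/input")            // e.g. 10000 partitions
val counts = parent.map(w => (w, 1L)).reduceByKey(_ + _)  // also 10000 partitions

// The ratio idea: derive the shuffle partition count from the parent,
// scaled down by a factor. `ratio` is hypothetical, not a real config.
val ratio = 0.1
val numShufflePartitions = math.max(1, (parent.partitions.length * ratio).toInt)
val scaled = parent.map(w => (w, 1L)).reduceByKey(_ + _, numShufflePartitions)
```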
OK, I am closing this.