Github user YuhuWang2002 commented on the issue:

    https://github.com/apache/spark/pull/15297
  
    @tgravescs:
    Thank you for your response. When a single reduce task handles huge data, 
it runs slowly and unstably, so we split one reduce task into multiple reduce 
tasks. Consider a single reduce task computing A join B: we split it so that 
task 1 does A1 join B, task 2 does A2 join B, and so on, where A1 is the part 
of A read from a range of map outputs. For Spark SQL, each A1 is treated as a 
separate partition during processing, so multiple executors can run the 
sub-tasks and spread the processing load.
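    The splitting described above can be sketched in plain code (this is an 
illustrative sketch with hypothetical names, not the actual code from this PR): 
instead of one reducer joining all of A with B, each sub-task joins one slice 
Ai of A, taken from a range of map outputs, against B, and the union of the 
sub-results equals the full join.

```python
# Illustrative sketch (hypothetical helpers, not the PR's implementation):
# splitting one reduce-side join over A into several sub-joins, each
# handling a slice of A, while B is available to every sub-task.

def join(a_rows, b_rows):
    # Simple hash join on key: returns (key, a_val, b_val) tuples.
    b_index = {}
    for k, v in b_rows:
        b_index.setdefault(k, []).append(v)
    return [(k, av, bv) for k, av in a_rows for bv in b_index.get(k, [])]

def split_join(a_rows, b_rows, num_splits):
    # Split A into `num_splits` ranges (as if each range came from a
    # subset of map outputs) and join each slice with B independently.
    # In Spark, each slice would be a separate partition run by its
    # own task, possibly on a different executor.
    step = (len(a_rows) + num_splits - 1) // num_splits
    slices = [a_rows[i:i + step] for i in range(0, len(a_rows), step)]
    results = []
    for a_slice in slices:  # these sub-joins are independent
        results.extend(join(a_slice, b_rows))
    return results

a = [(1, "a1"), (2, "a2"), (3, "a3"), (1, "a4")]
b = [(1, "b1"), (2, "b2")]
# The union of the sub-joins matches the single big join.
assert sorted(split_join(a, b, 3)) == sorted(join(a, b))
```

Because the sub-joins share no state, no single reduce task has to hold all of 
A, which is the point of the split when A is skewed or very large.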


