[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

rxin Wed, 16 Jul 2014 18:59:42 -0700

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1450#discussion_r15038311
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -361,11 +361,11 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
         // groupByKey shouldn't use map side combine because map side combine does not
         // reduce the amount of data shuffled and requires all map side data be inserted
         // into a hash table, leading to more objects in the old gen.
    -    def createCombiner(v: V) = ArrayBuffer(v)
    --- End diff --
    
    We should change all of them actually. I will update the PR.



