Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1450#discussion_r15038170
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -361,11 +361,11 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
         // groupByKey shouldn't use map side combine because map side combine does not
         // reduce the amount of data shuffled and requires all map side data be inserted
         // into a hash table, leading to more objects in the old gen.
    -    def createCombiner(v: V) = ArrayBuffer(v)
    --- End diff --
    
    There appear to be ~6 other functions of this type (defs that may be passed into closures); could these also be problematic?
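
    The concern about `def`s passed into closures comes from Scala's eta-expansion: referencing a method as a function value wraps it in a closure that holds a reference to the enclosing instance, so serializing the closure can drag in the whole object. A minimal sketch of the effect (the `Holder` class and names below are hypothetical illustrations, not code from PairRDDFunctions):

    ```scala
    import java.io.{ByteArrayOutputStream, ObjectOutputStream}
    import scala.collection.mutable.ArrayBuffer

    // Hypothetical enclosing class; deliberately NOT Serializable.
    class Holder {
      def createCombiner(v: Int): ArrayBuffer[Int] = ArrayBuffer(v)
    }

    object Demo {
      def serialize(obj: AnyRef): Unit = {
        val oos = new ObjectOutputStream(new ByteArrayOutputStream())
        oos.writeObject(obj)
        oos.close()
      }

      def main(args: Array[String]): Unit = {
        val h = new Holder

        // Eta-expanding the method captures `h` as the closure's outer
        // reference, so serializing it fails on the non-serializable Holder.
        val fromDef: Int => ArrayBuffer[Int] = h.createCombiner _
        try {
          serialize(fromDef)
          println("def closure serialized")
        } catch {
          case _: java.io.NotSerializableException =>
            println("def closure captured the enclosing instance")
        }

        // A standalone function val captures nothing and serializes fine.
        val standalone: Int => ArrayBuffer[Int] = (v: Int) => ArrayBuffer(v)
        serialize(standalone)
        println("val closure serialized")
      }
    }
    ```

    This is why rewriting such `def`s as function `val`s (or otherwise keeping the closure free of an outer reference) can matter for any function handed to a shuffle or task.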


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
