GitHub user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/4634#discussion_r26056992
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala
---
@@ -233,18 +235,44 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
def combineByKey[C](createCombiner: JFunction[V, C],
mergeValue: JFunction2[C, V, C],
mergeCombiners: JFunction2[C, C, C],
- partitioner: Partitioner): JavaPairRDD[K, C] = {
+ partitioner: Partitioner,
+ mapSideCombine: Boolean,
+ serializer: Serializer): JavaPairRDD[K, C] = {
implicit val ctag: ClassTag[C] = fakeClassTag
fromRDD(rdd.combineByKey(
createCombiner,
mergeValue,
mergeCombiners,
- partitioner
+ partitioner,
+ mapSideCombine,
+ serializer
))
}
/**
- * Simplified version of combineByKey that hash-partitions the output RDD.
+ * Generic function to combine the elements for each key using a custom set of aggregation
+ * functions. Turns a JavaPairRDD[(K, V)] into a result of type JavaPairRDD[(K, C)], for a
+ * "combined type" C. Note that V and C can be different -- for example, one might group an
+ * RDD of type (Int, Int) into an RDD of type (Int, List[Int]). Users provide three
+ * functions:
+ *
+ * - `createCombiner`, which turns a V into a C (e.g., creates a one-element list)
+ * - `mergeValue`, to merge a V into a C (e.g., adds it to the end of a list)
+ * - `mergeCombiners`, to combine two C's into a single one.
+ *
+ * In addition, users can control the partitioning of the output RDD. This method automatically
+ * uses map-side aggregation in shuffling the RDD.
+ */
+ def combineByKey[C](createCombiner: JFunction[V, C],
+ mergeValue: JFunction2[C, V, C],
--- End diff --
4-space indent here
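For readers following the doc comment in the diff, here is a minimal, Spark-free sketch of the semantics it describes: `createCombiner` and `mergeValue` fold raw values into per-key combiners, and `mergeCombiners` merges partial combiners across partitions. `combineByKeyLocal` is a hypothetical helper, not Spark's API; each inner `Seq` stands in for one input partition, and the shuffle, serializer and output partitioner are not modeled. The `mapSideCombine` flag only illustrates whether combining happens before or after the (simulated) shuffle. The parameter list uses a 4-space continuation indent relative to `def`, which we assume is the convention the comment above refers to (Spark's Scala style).

```scala
object CombineByKeySketch {
  // Hypothetical local model of combineByKey; NOT the Spark implementation.
  // Each inner Seq plays the role of one input partition.
  def combineByKeyLocal[K, V, C](
      partitions: Seq[Seq[(K, V)]],
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C,
      mapSideCombine: Boolean = true): Map[K, C] = {
    // Fold raw (K, V) records into per-key combiners using the first two functions.
    def combineValues(records: Seq[(K, V)]): Map[K, C] =
      records.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
        acc.get(k) match {
          case Some(c) => acc.updated(k, mergeValue(c, v))
          case None    => acc.updated(k, createCombiner(v))
        }
      }

    if (mapSideCombine) {
      // "Map side": combine within each partition first, then merge the
      // partial combiners across partitions with mergeCombiners.
      partitions.map(combineValues).foldLeft(Map.empty[K, C]) { (acc, partial) =>
        partial.foldLeft(acc) { case (m, (k, c)) =>
          m.updated(k, m.get(k).map(mergeCombiners(_, c)).getOrElse(c))
        }
      }
    } else {
      // No map-side combine: all raw records are "shuffled", then combined once.
      combineValues(partitions.flatten)
    }
  }

  def main(args: Array[String]): Unit = {
    // Group (Int, Int) pairs into (Int, List[Int]), as in the doc comment's example.
    val partitions = Seq(Seq(1 -> 10, 2 -> 20, 1 -> 11), Seq(1 -> 12, 2 -> 21))
    val grouped = combineByKeyLocal[Int, Int, List[Int]](
      partitions,
      v => List(v),
      (c, v) => c :+ v,
      (a, b) => a ++ b)
    println(grouped) // Map(1 -> List(10, 11, 12), 2 -> List(20, 21))
  }
}
```

With list-building functions like these, both settings of `mapSideCombine` give the same result; map-side combining mainly pays off when `C` is smaller than the raw values it summarizes (e.g. a running sum), since less data would cross the shuffle.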