Have you looked at the `aggregate` function in the RDD API?
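As a rough sketch of what I mean (not a definitive implementation): `aggregate` takes a zero value, a function that folds one element into a per-partition accumulator, and a function that merges two accumulators. The key extraction can therefore happen inside the fold, with no intermediate pair RDD. The record layout, `extract_key`, `extract_value`, and `aggregate_local` below are all hypothetical names for illustration. `aggregate_local` is only a local stand-in that mimics the fold-then-merge behaviour so the example runs without a cluster.

```python
import copy
from functools import reduce

# Hypothetical record layout: (identifier, payload) tuples.
def extract_key(record):
    return record[0]

def extract_value(record):
    return record[1]

def seq_op(acc, record):
    """Fold one record into the per-partition dict of key -> running sum."""
    key = extract_key(record)
    acc[key] = acc.get(key, 0) + extract_value(record)
    return acc

def comb_op(left, right):
    """Merge two per-partition dicts into one."""
    for key, value in right.items():
        left[key] = left.get(key, 0) + value
    return left

def aggregate_local(partitions, zero, seq_op, comb_op):
    """Local stand-in (assumption): fold each partition starting from a
    fresh copy of `zero`, then merge the partial results."""
    partials = [reduce(seq_op, part, copy.deepcopy(zero)) for part in partitions]
    return reduce(comb_op, partials, copy.deepcopy(zero))

# Two "partitions" of unkeyed records.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
totals = aggregate_local(partitions, {}, seq_op, comb_op)
```

With PySpark the same two functions should be usable as `rdd.aggregate({}, seq_op, comb_op)`. One caveat: `aggregate` is an action that returns the merged result to the driver, so unlike `combineByKey()` it does not give you a distributed RDD back, and it only makes sense when the number of distinct keys is small enough to fit in memory.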
However, if your way of extracting the “key” (identifier) and “value” (payload) parts of the RDD elements is uniform (a function), it’s unclear to me how this would be more efficient than extracting the key and value first and then using `combineByKey()`.

FG

On Tue, Jan 27, 2015 at 10:17 PM, Mohit Jaggi <mohitja...@gmail.com> wrote:
> Hi All,
> I have a use case where I have an RDD (not a k,v pair) where I want to do a
> combineByKey() operation. I can do that by creating an intermediate RDD of
> k,v pairs and using PairRDDFunctions.combineByKey(). However, I believe it
> will be more efficient if I can avoid this intermediate RDD. Is there a way I
> can do this by passing in a function that extracts the key, like in
> RDD.groupBy()? [oops, RDD.groupBy seems to create the intermediate RDD
> anyway, maybe a better implementation is possible for that too?]
> If not, is it worth adding to the Spark API?
> Mohit.