I am trying to create a new RDD from a given PairRDD. I have a PairRDD with
only a few keys, but each key has a large number of values (about 100k). I
want to somehow repartition it and turn each `Iterable<V>` into an RDD[V] so
that I can then apply map, reduce, sortBy, etc. efficiently on those values.
I sense that flatMapValues is my friend, but I want to check with other
Spark users. This is for a real-time Spark app. I have already tried
collect() and computing all the measures in memory on the app server, but I
am trying to improve on that.
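To make the current baseline concrete, here is a minimal sketch of the
in-memory approach, assuming the collected data ends up as a map from each
key to its list of values. The class name `InMemoryMetrics`, the `Metrics`
holder, and the sum/max/sorted measures are hypothetical stand-ins; the real
measures are not shown in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical baseline: measures computed on the app server after collect().
class InMemoryMetrics {

    // Hypothetical holder for a few example measures per key.
    static final class Metrics {
        final double sum;
        final double max;
        final List<Double> sortedDescending;

        Metrics(double sum, double max, List<Double> sortedDescending) {
            this.sum = sum;
            this.max = max;
            this.sortedDescending = sortedDescending;
        }
    }

    // Computes the example measures for every key, entirely in local memory.
    static Map<String, Metrics> computeMetrics(Map<String, List<Double>> valuesByKey) {
        Map<String, Metrics> result = new HashMap<>();
        for (Map.Entry<String, List<Double>> entry : valuesByKey.entrySet()) {
            // Copy before sorting so the caller's list is left untouched.
            List<Double> sorted = new ArrayList<>(entry.getValue());
            sorted.sort(Comparator.reverseOrder());

            double sum = 0.0;
            for (double v : sorted) {
                sum += v;
            }
            // After a descending sort the first element is the maximum.
            double max = sorted.isEmpty() ? Double.NaN : sorted.get(0);

            result.put(entry.getKey(), new Metrics(sum, max, sorted));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Double>> input = new HashMap<>();
        input.put("a", Arrays.asList(1.0, 3.0, 2.0));
        Metrics m = computeMetrics(input).get("a");
        System.out.println(m.sum + " " + m.max + " " + m.sortedDescending);
    }
}
```

With about 100k values per key, all of this sorting and summing runs on a
single machine, which is the part I would like to push back into Spark.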
This is what I tried (pseudocode):
class ComputeMetrices {
    transient JavaSparkContext sparkContext;

    public Map<String, V> computeMetrices(JavaPairRDD javaPairRdd) {
        javaPairRdd.groupByKey(10).mapValues(itr => {
            // null pointer here; probably at sparkContext
            sparkContext.parallelize(list(itr))
        })
    }
}
I want to create an RDD out of the Iterable from the groupByKey result so
that I can use further Spark transformations on it.
Thanks
Nir