Re: Creating RDD from Iterable from groupByKey results

2015-06-16 Thread nir
I updated the code sample so people can better understand what my inputs and
outputs are.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Creating-RDD-from-Iterable-from-groupByKey-results-tp23328p23341.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Creating RDD from Iterable from groupByKey results

2015-06-15 Thread Nirav Patel
I am trying to create a new RDD based on a given PairRDD. I have a PairRDD
with a few keys, but each key has many (about 100k) values. I want to somehow
repartition and turn each `Iterable<V>` into an RDD[V] so that I can further
apply map, reduce, sortBy etc. effectively on those values. I sense that
flatMapValues is my friend, but I want to check with other Spark users. This
is for a real-time Spark app. I have already tried collect() and computing
all measures in-memory on the app server, but I'm trying to improve on that.
This is what I tried (pseudocode):

class ComputeMetrices {
  transient JavaSparkContext sparkContext;

  public Map<String, V> computeMetrices(JavaPairRDD<String, V> javaPairRdd) {
    javaPairRdd.groupByKey(10).mapValues(itr -> {
      // null pointer; probably at sparkContext
      return sparkContext.parallelize(list(itr));
    });
  }
}

I want to create an RDD out of that Iterable from the groupByKey result so
that I can use further Spark transformations.
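For what it's worth, nested RDDs aren't supported: a SparkContext exists only on the driver, so the transient field is null inside mapValues, which would explain the NPE. One common alternative is to skip groupByKey entirely and compute per-key aggregates with reduceByKey or aggregateByKey on the original PairRDD (in Spark's Java API that might look like `javaPairRdd.reduceByKey(Integer::sum)`, assuming integer values and a sum metric). Below is a plain-Java sketch of that idea using only the standard library, not actual Spark code; the pair list and the sum metric are made up for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ReduceByKeySketch {

    // Analogue of rdd.reduceByKey(Integer::sum): merge values per key as
    // they arrive, without materializing a per-key Iterable first.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.toMap(
                Map.Entry::getKey,   // key stays the key
                Map.Entry::getValue, // first value seen for a key
                Integer::sum));      // merge function combines duplicates
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("a", 3), Map.entry("a", 1), Map.entry("b", 2));
        System.out.println(sumByKey(pairs));
    }
}
```

The point of the sketch is that the merge function sees two values at a time, so no key's full value list ever needs to exist in one place; Spark's reduceByKey exploits the same shape to combine map-side before shuffling, which is usually much cheaper than groupByKey for 100k values per key.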

Thanks
Nir
