Hi users,

I'm new to RDD programming. My problem is as in the title: I read a source file through sc, then do a groupByKey to get a new RDD. Now I want to do another groupByKey based on each element of that RDD.
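In case it helps, here is roughly what my first round looks like (a minimal sketch against the sample input shown below; splitting at the first underscore is my own helper logic, not an established API):

    // A minimal sketch of my first round, assuming the sample input below.
    val lines = sc.textFile("words.txt")  // lines like "hello_world,1"
    val pairs = lines.map { line =>
      val Array(sentence, count) = line.split(",")
      // Split at the first underscore only: "hello_world_spark" -> ("hello", "world_spark").
      sentence.split("_", 2) match {
        case Array(head, rest) => (head, (rest, count.toInt))
        case Array(head)       => (head, ("", count.toInt))  // bare word, e.g. "hello,1"
      }
    }
    val grouped = pairs.groupByKey()  // RDD[(String, Iterable[(String, Int)])]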
For example, my source file is as follows:

    hello_world,1
    hello,1
    hello_world_spark,3
    hello_scala,4
    spark_rdd,1
    spark_rdd_program,1
    spark,1
    spark_sql,3

After my first round of groupByKey, I get an RDD like this:

    hello, ((world,1), (world_spark,3), (scala,4))
    spark, ((rdd,1), (rdd_program,1), (sql,3))

The next step is where my problem lies: I want to groupByKey again on the values' contents, i.e. on "world"/"rdd"/"scala"/"sql". It seems I would need to group by each element's value, but Spark does not support nested RDDs, so how can I solve this?

For background: what I am actually building is a tree where every node is a word of an underscore-separated sentence, and the root node is empty. In my example, the two children of the root node are "hello" and "spark"; "hello" has two children ("world" and "scala"), and "spark" also has two children ("rdd" and "sql").

Help please. Thanks very, very much.

Mars
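P.S. To make the tree I describe concrete, here is a minimal sketch of the node structure I have in mind (plain Scala on the driver side; the names Node and insert are my own, not from any Spark API):

    import scala.collection.mutable

    // Each node holds one word; the root holds the empty word.
    case class Node(word: String,
                    children: mutable.Map[String, Node] = mutable.Map.empty)

    // Insert one underscore-separated sentence under the root,
    // creating child nodes as needed.
    def insert(root: Node, sentence: String): Unit = {
      var cur = root
      for (w <- sentence.split("_"))
        cur = cur.children.getOrElseUpdate(w, Node(w))
    }

    val root = Node("")          // root node is empty, as described above
    insert(root, "hello_world")  // root -> hello -> world
    insert(root, "spark_rdd")    // root -> spark -> rdd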