Hi users,

        I'm new to RDD programming. My problem is as in the title: I read a
source file through sc, then do a groupByKey to get a new RDD. Now I want to
do another groupByKey based on every element of that RDD.

       for example, my source file is as follows:
            hello_world,1
            hello,1
            hello_world_spark,3
            hello_scala,4
            spark_rdd,1
            spark_rdd_program,1
            spark,1
            spark_sql,3
           
            
      after my first round of groupByKey, I get an RDD like this:
           hello,((world,1),(world_spark,3),(scala,4))
           spark,((rdd,1),(rdd_program,1),(sql,3))
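
      In code, my first round looks roughly like this (a simplified sketch;
the file name "words.txt" and the app name are just placeholders for my real
job):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("word-tree"))

    // each line looks like "hello_world_spark,3"
    val lines = sc.textFile("words.txt")

    // first word is the key; the remaining words (re-joined with "_")
    // plus the count are the value
    val firstRound = lines
      .map { line =>
        val Array(words, count) = line.split(",")
        val parts = words.split("_")
        (parts.head, (parts.tail.mkString("_"), count.toInt))
      }
      .filter { case (_, (rest, _)) => rest.nonEmpty } // drop one-word lines like "hello,1"
      .groupByKey()
    // firstRound: RDD[(String, Iterable[(String, Int)])]
    // e.g. ("hello", [("world",1), ("world_spark",3), ("scala",4)])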
            
  
      The next step is where my problem is: I want to groupByKey again on the
values' content, i.e. on the next word like "world"/"rdd"/"scala"/"sql". It
seems I need to group by every element's value, but Spark does not support
nested RDDs, so how can I solve this?
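
      The only idea I have is to group again inside mapValues with plain
Scala collections, since after groupByKey each value is an ordinary local
Iterable rather than an RDD. A rough sketch of what I mean (I am not sure
this is the right approach):

    // after groupByKey each value is a plain local Iterable, so I can group
    // it again with ordinary Scala collections instead of a nested RDD:
    val secondRound = firstRound.mapValues { children =>
      children
        .map { case (rest, count) =>
          val parts = rest.split("_")
          (parts.head, (parts.tail.mkString("_"), count))
        }
        .groupBy { case (next, _) => next } // Scala collection groupBy, not an RDD op
        .map { case (next, grp) => next -> grp.map(_._2).toList }
    }
    // e.g. ("hello", Map("world" -> List(("",1), ("spark",3)),
    //                    "scala" -> List(("",4))))

      Is something like this the right direction, or is there a better way?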

      Actually, what I am doing is building a tree where every node is a word
in a sentence and the root node is null. In my example, the two children of
the root node are "hello" and "spark"; "hello" also has two children (world
and scala), and "spark" also has two children (rdd and sql).
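
      To make the tree concrete, here is a sketch of what I am after, built
as a local trie on the driver after collect() (assuming the data fits in
driver memory; "lines" is the RDD from the first sketch above):

    import scala.collection.mutable

    class Node(val word: String) {
      val children = mutable.Map.empty[String, Node]
      var count = 0
    }

    val root = new Node("") // the "null" root

    // every input line is a path from the root, e.g. hello_world_spark,3
    val paths = lines.map { line =>
      val Array(words, count) = line.split(",")
      (words.split("_"), count.toInt)
    }.collect()

    for ((words, count) <- paths) {
      var cur = root
      for (w <- words) cur = cur.children.getOrElseUpdate(w, new Node(w))
      cur.count += count
    }
    // root.children.keySet                   == Set("hello", "spark")
    // root.children("hello").children.keySet == Set("world", "scala")

      But collecting everything to the driver feels like it defeats the point
of using an RDD, which is why I am asking.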

Please help.
Thanks very, very much.

Mars.