I'm new to PySpark, so thank you for your patience in advance. My current problem is the following:
I have an RDD composed of the fields A, B, and a count. First I count occurrences of each (A, B) pair:

    # key on the (A, B) pair and sum the counts
    result1 = rdd.map(lambda x: ((x[0], x[1]), 1)).reduceByKey(lambda a, b: a + b)

Then I wanted to group the results by A, so I did:

    # re-key by A, with (B, count) as the value, and collect each group
    result2 = result1.map(lambda x: (x[0][0], (x[0][1], x[1]))).groupByKey()

Now, my problem/challenge: with the new RDD of <A, iterable of (B, count)>, I want to "subsort" each group, take the top 50 (B, count) pairs in descending order of count, and then print or save those top 50 for every value of A. The final result would look like:

    A,  B1, 40
    A,  B2, 30
    A,  B3, 20
    A,  B4, 10
    A1, C1, 30
    A1, C2, 20
    A1, C3, 10

Any guidance you can provide to help me solve this is much appreciated! Thank you :-)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/GroupBy-and-nested-Top-on-tp20648.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
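For reference, here is the kind of per-key logic I have in mind, sketched in plain Python so it runs without a Spark cluster. The use of heapq.nlargest and the toy `grouped` data are my own assumptions, not something from the Spark API discussion above; in PySpark the same function would be passed to result2.mapValues(...).

```python
import heapq

def top_n(pairs, n=50):
    # pairs is an iterable of (B, count); keep the n largest by count,
    # returned in descending order of count
    return heapq.nlargest(n, pairs, key=lambda t: t[1])

# Toy data mimicking result2 = <A, iterable of (B, count)> (hypothetical values)
grouped = {
    "A":  [("B1", 40), ("B3", 20), ("B2", 30), ("B4", 10)],
    "A1": [("C3", 10), ("C1", 30), ("C2", 20)],
}

top_per_key = {a: top_n(pairs) for a, pairs in grouped.items()}

# In PySpark, the equivalent would presumably be:
# top50 = result2.mapValues(lambda pairs: heapq.nlargest(50, pairs, key=lambda t: t[1]))
```

Using nlargest rather than a full sort avoids sorting an entire group when only the top 50 elements are needed.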