I am exploring Spark SQL and DataFrames, and I am trying to aggregate by a column and generate a single JSON row per group from the aggregation. Any input on the right approach would be helpful.
Here is my sample data:

user,sports,major,league,count
[test1,Ice Hockey,Switzerland,NLA,6]
[test1,Football,Australia,A-League,6]
[test1,Ice Hockey,Sweden,SHL,3]
[test1,Ice Hockey,Switzerland,NLB,2]
[test1,Football,Romania,Liga I,1]

I want to aggregate by user and create a single JSON row:

{ "user" : "test1",
  "sports" : [ { "Ice Hockey" : 11, "Football" : 7 } ],
  "major" : [ { "Switzerland" : 8, "Australia" : 6, "Sweden" : 3, "Romania" : 1 } ],
  "league" : [ { "NLA" : 6, "A-League" : 6, "SHL" : 3, "NLB" : 2, "Liga I" : 1 } ],
  "total" : 18 }
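
To make the target result unambiguous, here is a minimal sketch of the aggregation in plain Python. It is not Spark code and not a proposed solution; it only pins down the expected output so that a Spark SQL or DataFrame answer can be checked against it. The function names (`aggregate_by_user`, `to_json_lines`) are made up for illustration, and the first sample row is assumed to be "Ice Hockey", since that is the only reading under which the per-sport totals (11 and 7) add up.

```python
import json
from collections import Counter, defaultdict

# Sample rows from the post: (user, sports, major, league, count).
# Assumption: the first row's sport is "Ice Hockey", which is what the
# desired totals (Ice Hockey = 11, Football = 7) imply.
ROWS = [
    ("test1", "Ice Hockey", "Switzerland", "NLA", 6),
    ("test1", "Football", "Australia", "A-League", 6),
    ("test1", "Ice Hockey", "Sweden", "SHL", 3),
    ("test1", "Ice Hockey", "Switzerland", "NLB", 2),
    ("test1", "Football", "Romania", "Liga I", 1),
]


def aggregate_by_user(rows):
    """Sum `count` per user, broken down by sports, major and league."""
    per_user = defaultdict(
        lambda: {"sports": Counter(), "major": Counter(), "league": Counter(), "total": 0}
    )
    for user, sport, major, league, count in rows:
        agg = per_user[user]
        agg["sports"][sport] += count
        agg["major"][major] += count
        agg["league"][league] += count
        agg["total"] += count

    docs = []
    for user, agg in per_user.items():
        docs.append({
            "user": user,
            # Each breakdown is a one-element list holding a name -> count map,
            # ordered by descending count, mirroring the layout in the post.
            "sports": [dict(agg["sports"].most_common())],
            "major": [dict(agg["major"].most_common())],
            "league": [dict(agg["league"].most_common())],
            "total": agg["total"],
        })
    return docs


def to_json_lines(rows):
    """Return one JSON string per user."""
    return [json.dumps(doc) for doc in aggregate_by_user(rows)]


if __name__ == "__main__":
    for line in to_json_lines(ROWS):
        print(line)
```

The sketch yields exactly one JSON document per user; with more users in the input, each would get a row of the same shape.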