I am looking to do something similar to this Postgres query in HiveQL. If I have a DataFrame student and a DataFrame grade, is this possible?
I read in Learning Spark: Lightning-Fast Big Data Analysis that it should be possible. It says in Chapter 9 "SchemaRDDs can store several basic types, as well as structures and arrays of these types. They use the HiveQL syntax for type definitions. Table-9-1 shown" and goes on to say "The last type, structures, is simply represented as other Rows in Spark SQL. All of these types can also be nested within each other; for example, you can have arrays of structs, or maps that contain structs" Here is the url of the page that describes the Postgres: http://stackoverflow.com/questions/10928210/postgresql-aggregate-array SELECT s.name, array_agg(g.Mark) as marks FROM student s LEFT JOIN Grade g ON g.Student_id = s.Id GROUP BY s.Id Thank you. Richard Catlin