RE: Nested DataFrames

Richard Catlin Thu, 25 Jun 2015 17:17:28 -0700

I am looking to do something similar to this Postgres query in HiveQL.  If
I have a DataFrame student and a DataFrame grade, is this possible?


I read in Learning Spark: Lightning-Fast Big Data Analysis that it should
be possible.  It says in Chapter 9
"SchemaRDDs can store several basic types, as well as structures and arrays
of these types.  They use the HiveQL syntax for type definitions.
Table-9-1 shown"
and goes on to say
"The last type, structures, is simply represented as other Rows in Spark
SQL.  All of these types can also be nested within each other; for example,
you can have arrays of structs, or maps that contain structs"

Here is the url of the page that describes the Postgres:
http://stackoverflow.com/questions/10928210/postgresql-aggregate-array

SELECT s.name,  array_agg(g.Mark) as marks

FROM student s

LEFT JOIN Grade g ON g.Student_id = s.Id

GROUP BY s.Id

Thank you.

Richard Catlin

RE: Nested DataFrames

Reply via email to