You are looking for LATERAL VIEW explode
in HiveQL.
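
Conceptually, explode turns each element of an array column into its own row, and LATERAL VIEW joins those rows back to the other columns of the row they came from. Below is a plain-Python sketch of that behaviour, not Spark code. The helper name flatten_stage_event is hypothetical. The sample event is trimmed from the one in your message, and its second RDD entry is made up for illustration. The HiveQL in the comment is an untested sketch that assumes you run it through a HiveContext.

```python
import json

# Roughly the HiveQL this mimics (untested sketch, assumed to run via a HiveContext):
#   SELECT `Stage Info`.`Stage ID`, rdd.`RDD ID`, rdd.Name
#   FROM stageEndInfos
#   LATERAL VIEW explode(`Stage Info`.`RDD Info`) r AS rdd

# Trimmed StageCompleted event; the RDD 4 entry is invented for illustration.
EVENT_JSON = """
{
  "Event": "SparkListenerStageCompleted",
  "Stage Info": {
    "Stage ID": 1,
    "RDD Info": [
      {"RDD ID": 5, "Name": "5", "Memory Size": 0, "Disk Size": 0},
      {"RDD ID": 4, "Name": "4", "Memory Size": 0, "Disk Size": 0}
    ]
  }
}
"""


def flatten_stage_event(event):
    """Hypothetical helper: one output row per element of "RDD Info",
    with the parent Stage ID repeated on every row."""
    stage = event["Stage Info"]
    rows = []
    # An empty or null array contributes no rows, as with a plain
    # (non-OUTER) LATERAL VIEW.
    for rdd in stage.get("RDD Info") or []:
        rows.append({
            "Stage ID": stage["Stage ID"],
            "RDD ID": rdd["RDD ID"],
            "Name": rdd["Name"],
            "Memory Size": rdd["Memory Size"],
            "Disk Size": rdd["Disk Size"],
        })
    return rows


rows = flatten_stage_event(json.loads(EVENT_JSON))

# RDD ID is unique, so it can serve as the key of the flattened table.
by_rdd_id = {row["RDD ID"]: row for row in rows}

for row in rows:
    print(row["Stage ID"], row["RDD ID"], row["Name"])
```

Each output row is one RDD with its stage's ID repeated, which gives the single flat table you are after instead of one ArrayBuffer per stage.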

On Mon, May 4, 2015 at 7:49 AM, Giovanni Paolo Gibilisco wrote:

> Hi, I'm trying to parse log files generated by Spark using Spark SQL.
> In the JSON elements related to the StageCompleted event we have a nested
> structure containing an array of elements with RDD Info (see the log below
> as an example; some parts are omitted):
> {
>     "Event": "SparkListenerStageCompleted",
>     "Stage Info": {
>       "Stage ID": 1,
>       ...
>       "RDD Info": [
>         {
>           "RDD ID": 5,
>           "Name": "5",
>           "Storage Level": {
>             "Use Disk": false,
>             "Use Memory": false,
>             "Use Tachyon": false,
>             "Deserialized": false,
>             "Replication": 1
>           },
>           "Number of Partitions": 2,
>           "Number of Cached Partitions": 0,
>           "Memory Size": 0,
>           "Tachyon Size": 0,
>           "Disk Size": 0
>         },
> ...
> When I register the log as a table, Spark SQL generates the correct
> schema; for the RDD Info element it looks like this:
>  |-- RDD Info: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- Disk Size: long (nullable = true)
>  |    |    |-- Memory Size: long (nullable = true)
>  |    |    |-- Name: string (nullable = true)
> My problem is that when I query the table I only get ArrayBuffers out
> of it:
> "SELECT `stageEndInfos.Stage Info.Stage ID`, `stageEndInfos.Stage Info.RDD Info` FROM stageEndInfos"
> Stage ID RDD Info
> 1        ArrayBuffer([0,0,...
> 0        ArrayBuffer([0,0,...
> 2        ArrayBuffer([0,0,...
> or:
> "SELECT `stageEndInfos.Stage Info.RDD Info.RDD ID` FROM stageEndInfos"
> ArrayBuffer(5, 4, 3)
> ArrayBuffer(2, 1, 0)
> ArrayBuffer(9, 6,...
> Is there a way to explode the arrays into rows in order to build a
> single flat table? (The RDD ID is unique and can be used as a primary
> key.)
> Thanks!