Have you tried using the flatten query as a sub-query? Like:
SELECT x.`household`, x.`minutes`, x.`LDItem`.`upc`, x.`LDItem`.`item_flags` from (SELECT household, minutes, FLATTEN(T.lditem_array.LDItem) as `LDItem` FROM `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` T limit 8) as x; Chris Matta [email protected] 215-701-3146 On Wed, Sep 30, 2015 at 2:05 PM, Mike Beddo <[email protected]> wrote: > I am querying a Parquet file containing records for nested data: > > SELECT household, minutes, lditem_array FROM > `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` limit 1; > > This returns the first record, which contains an array LDItem. For the > first record, LDItem is an array of 8 {} structures), like so: > > 1 046927725027 1135 { > "LDItem" : [ { > "upc" : "0001300000124", > "item_flags" : 0, > "units" : 1, > "retail" : 199, > "cost" : 130, > "discount_flags" : 0, > "discount_units" : 0, > "discount" : 0, > "return_units" : 0, > "return_discount" : 0 > }, { > "upc" : "0003800084505", > "item_flags" : 0, > "units" : 1, > "retail" : 199, > "cost" : 121, > "discount_flags" : 1, > "discount_units" : 1, > "discount" : 49, > "return_units" : 0, > "return_discount" : 0 > }, { > "upc" : "0078533175751", > "item_flags" : 0, > "units" : 1, > "retail" : 369, > "cost" : 233, > "discount_flags" : 0, > "discount_units" : 0, > "discount" : 0, > "return_units" : 0, > "return_discount" : 0 > }, { > "upc" : "0003800084508", > "item_flags" : 0, > "units" : 1, > "retail" : 199, > "cost" : 120, > "discount_flags" : 1, > "discount_units" : 1, > "discount" : 49, > "return_units" : 0, > "return_discount" : 0 > }, { > "upc" : "0001480000373", > "item_flags" : 0, > "units" : 1, > "retail" : 239, > "cost" : 180, > "discount_flags" : 0, > "discount_units" : 0, > "discount" : 0, > "return_units" : 0, > "return_discount" : 0 > }, { > "upc" : "0004200015121", > "item_flags" : 0, > "units" : 1, > "retail" : 299, > "cost" : 185, > "discount_flags" : 1, > "discount_units" : 1, > "discount" : 30, > "return_units" : 0, > "return_discount" : 0 > }, { > "upc" : "0003582608932", > "item_flags" : 0, > "units" : 1, > "retail" : 138, > "cost" : 79, > "discount_flags" : 0, > "discount_units" : 0, > "discount" : 0, > "return_units" : 0, > "return_discount" : 0 > }, { > "upc" : "0003600043220", > "item_flags" : 0, > "units" : 1, > "retail" : 249, > "cost" : 168, > "discount_flags" : 1, > "discount_units" : 1, > "discount" : 49, > "return_units" : 0, > "return_discount" : 0 > } ] > } > > I would like to flatten this 1 record into 8 records, with columns for > upc, item_flags, units, retail, cost, ... The furthest I got was with this > query: > > SELECT household, minutes, FLATTEN(T.lditem_array.LDItem) FROM > `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` T limit 8; > > This flattened the first record, but did not yielded new columns upc, > item_flags, units, ... Instead, it gave me one record for each {} element, > which I've not been able to split further into columns: > > 1 046927725027 1135 { > "upc" : "0001300000124", > "item_flags" : 0, > "units" : 1, > "retail" : 199, > "cost" : 130, > "discount_flags" : 0, > "discount_units" : 0, > "discount" : 0, > "return_units" : 0, > "return_discount" : 0 > } > 2 046927725027 1135 { > "upc" : "0003800084505", > "item_flags" : 0, > "units" : 1, > "retail" : 199, > "cost" : 121, > "discount_flags" : 1, > "discount_units" : 1, > "discount" : 49, > "return_units" : 0, > "return_discount" : 0 > } > 3 046927725027 1135 { > "upc" : "0078533175751", > "item_flags" : 0, > "units" : 1, > "retail" : 369, > "cost" : 233, > "discount_flags" : 0, > "discount_units" : 0, > "discount" : 0, > "return_units" : 0, > "return_discount" : 0 > } > 4 046927725027 1135 { > "upc" : "0003800084508", > "item_flags" : 0, > "units" : 1, > "retail" : 199, > "cost" : 120, > "discount_flags" : 1, > "discount_units" : 1, > "discount" : 49, > "return_units" : 0, > "return_discount" : 0 > } > 5 046927725027 1135 { > "upc" : "0001480000373", > "item_flags" : 0, > "units" : 1, > "retail" : 239, > "cost" : 180, > "discount_flags" : 0, > "discount_units" : 0, > "discount" : 0, > "return_units" : 0, > "return_discount" : 0 > } > 6 046927725027 1135 { > "upc" : "0004200015121", > "item_flags" : 0, > "units" : 1, > "retail" : 299, > "cost" : 185, > "discount_flags" : 1, > "discount_units" : 1, > "discount" : 30, > "return_units" : 0, > "return_discount" : 0 > } > 7 046927725027 1135 { > "upc" : "0003582608932", > "item_flags" : 0, > "units" : 1, > "retail" : 138, > "cost" : 79, > "discount_flags" : 0, > "discount_units" : 0, > "discount" : 0, > "return_units" : 0, > "return_discount" : 0 > } > 8 046927725027 1135 { > "upc" : "0003600043220", > "item_flags" : 0, > "units" : 1, > "retail" : 249, > "cost" : 168, > "discount_flags" : 1, > "discount_units" : 1, > "discount" : 49, > "return_units" : 0, > "return_discount" : 0 > } > > How can I get a result set with 12 columns, i.e. split each {...} list > into columns? This is where I am stuck. > > Thanks, > > Mike Beddo > >
