Hi, I ran into something similar today but I would settle for a comma separated string of sorted values (so they can be grouped).
I'm spending this evening writing a small UDF for that which you can have/use if that is of any value. Regards, -Stefan On Wed, Sep 30, 2015 at 6:23 PM, Christopher Matta <[email protected]> wrote: > Have you tried using the flatten query as a sub-query? > > Like: > > SELECT x.`household`, x.`minutes`, x.`LDItem`.`upc`, > x.`LDItem`.`item_flags` from (SELECT household, minutes, > FLATTEN(T.lditem_array.LDItem) as `LDItem` FROM > `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` T limit 8) as x; > > > > Chris Matta > [email protected] > 215-701-3146 > > On Wed, Sep 30, 2015 at 2:05 PM, Mike Beddo <[email protected]> > wrote: > > > I am querying a Parquet file containing records for nested data: > > > > SELECT household, minutes, lditem_array FROM > > `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` limit 1; > > > > This returns the first record, which contains an array LDItem. For the > > first record, LDItem is an array of 8 {} structures), like so: > > > > 1 046927725027 1135 { > > "LDItem" : [ { > > "upc" : "0001300000124", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 199, > > "cost" : 130, > > "discount_flags" : 0, > > "discount_units" : 0, > > "discount" : 0, > > "return_units" : 0, > > "return_discount" : 0 > > }, { > > "upc" : "0003800084505", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 199, > > "cost" : 121, > > "discount_flags" : 1, > > "discount_units" : 1, > > "discount" : 49, > > "return_units" : 0, > > "return_discount" : 0 > > }, { > > "upc" : "0078533175751", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 369, > > "cost" : 233, > > "discount_flags" : 0, > > "discount_units" : 0, > > "discount" : 0, > > "return_units" : 0, > > "return_discount" : 0 > > }, { > > "upc" : "0003800084508", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 199, > > "cost" : 120, > > "discount_flags" : 1, > > "discount_units" : 1, > > "discount" : 49, > > "return_units" : 0, > > "return_discount" : 0 > > }, { > > "upc" : "0001480000373", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 239, > > "cost" : 180, > > "discount_flags" : 0, > > "discount_units" : 0, > > "discount" : 0, > > "return_units" : 0, > > "return_discount" : 0 > > }, { > > "upc" : "0004200015121", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 299, > > "cost" : 185, > > "discount_flags" : 1, > > "discount_units" : 1, > > "discount" : 30, > > "return_units" : 0, > > "return_discount" : 0 > > }, { > > "upc" : "0003582608932", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 138, > > "cost" : 79, > > "discount_flags" : 0, > > "discount_units" : 0, > > "discount" : 0, > > "return_units" : 0, > > "return_discount" : 0 > > }, { > > "upc" : "0003600043220", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 249, > > "cost" : 168, > > "discount_flags" : 1, > > "discount_units" : 1, > > "discount" : 49, > > "return_units" : 0, > > "return_discount" : 0 > > } ] > > } > > > > I would like to flatten this 1 record into 8 records, with columns for > > upc, item_flags, units, retail, cost, ... The furthest I got was with > this > > query: > > > > SELECT household, minutes, FLATTEN(T.lditem_array.LDItem) FROM > > `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` T limit 8; > > > > This flattened the first record, but did not yielded new columns upc, > > item_flags, units, ... Instead, it gave me one record for each {} > element, > > which I've not been able to split further into columns: > > > > 1 046927725027 1135 { > > "upc" : "0001300000124", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 199, > > "cost" : 130, > > "discount_flags" : 0, > > "discount_units" : 0, > > "discount" : 0, > > "return_units" : 0, > > "return_discount" : 0 > > } > > 2 046927725027 1135 { > > "upc" : "0003800084505", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 199, > > "cost" : 121, > > "discount_flags" : 1, > > "discount_units" : 1, > > "discount" : 49, > > "return_units" : 0, > > "return_discount" : 0 > > } > > 3 046927725027 1135 { > > "upc" : "0078533175751", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 369, > > "cost" : 233, > > "discount_flags" : 0, > > "discount_units" : 0, > > "discount" : 0, > > "return_units" : 0, > > "return_discount" : 0 > > } > > 4 046927725027 1135 { > > "upc" : "0003800084508", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 199, > > "cost" : 120, > > "discount_flags" : 1, > > "discount_units" : 1, > > "discount" : 49, > > "return_units" : 0, > > "return_discount" : 0 > > } > > 5 046927725027 1135 { > > "upc" : "0001480000373", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 239, > > "cost" : 180, > > "discount_flags" : 0, > > "discount_units" : 0, > > "discount" : 0, > > "return_units" : 0, > > "return_discount" : 0 > > } > > 6 046927725027 1135 { > > "upc" : "0004200015121", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 299, > > "cost" : 185, > > "discount_flags" : 1, > > "discount_units" : 1, > > "discount" : 30, > > "return_units" : 0, > > "return_discount" : 0 > > } > > 7 046927725027 1135 { > > "upc" : "0003582608932", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 138, > > "cost" : 79, > > "discount_flags" : 0, > > "discount_units" : 0, > > "discount" : 0, > > "return_units" : 0, > > "return_discount" : 0 > > } > > 8 046927725027 1135 { > > "upc" : "0003600043220", > > "item_flags" : 0, > > "units" : 1, > > "retail" : 249, > > "cost" : 168, > > "discount_flags" : 1, > > "discount_units" : 1, > > "discount" : 49, > > "return_units" : 0, > > "return_discount" : 0 > > } > > > > How can I get a result set with 12 columns, i.e. split each {...} list > > into columns? This is where I am stuck. > > > > Thanks, > > > > Mike Beddo > > > > >
