Have you tried using the flatten query as a sub-query?

Like:

SELECT x.`household`, x.`minutes`, x.`LDItem`.`upc`,
x.`LDItem`.`item_flags` from (SELECT household, minutes,
FLATTEN(T.lditem_array.LDItem) as `LDItem` FROM
`dfs`.`root`.`./S/Beddo/spark/pig-xpdb` T limit 8) as x;

​

Chris Matta
[email protected]
215-701-3146

On Wed, Sep 30, 2015 at 2:05 PM, Mike Beddo <[email protected]>
wrote:

> I am querying a Parquet file containing records for nested data:
>
> SELECT household, minutes, lditem_array FROM
> `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` limit 1;
>
> This returns the first record, which contains an array LDItem. For the
> first record, LDItem is an array of 8 {} structures), like so:
>
> 1       046927725027    1135    {
>   "LDItem" : [ {
>     "upc" : "0001300000124",
>     "item_flags" : 0,
>     "units" : 1,
>     "retail" : 199,
>     "cost" : 130,
>     "discount_flags" : 0,
>     "discount_units" : 0,
>     "discount" : 0,
>     "return_units" : 0,
>     "return_discount" : 0
>   }, {
>     "upc" : "0003800084505",
>     "item_flags" : 0,
>     "units" : 1,
>     "retail" : 199,
>     "cost" : 121,
>     "discount_flags" : 1,
>     "discount_units" : 1,
>     "discount" : 49,
>     "return_units" : 0,
>     "return_discount" : 0
>   }, {
>     "upc" : "0078533175751",
>     "item_flags" : 0,
>     "units" : 1,
>     "retail" : 369,
>     "cost" : 233,
>     "discount_flags" : 0,
>     "discount_units" : 0,
>     "discount" : 0,
>     "return_units" : 0,
>     "return_discount" : 0
>   }, {
>     "upc" : "0003800084508",
>     "item_flags" : 0,
>     "units" : 1,
>     "retail" : 199,
>     "cost" : 120,
>     "discount_flags" : 1,
>     "discount_units" : 1,
>     "discount" : 49,
>     "return_units" : 0,
>     "return_discount" : 0
>   }, {
>     "upc" : "0001480000373",
>     "item_flags" : 0,
>     "units" : 1,
>     "retail" : 239,
>     "cost" : 180,
>     "discount_flags" : 0,
>     "discount_units" : 0,
>     "discount" : 0,
>     "return_units" : 0,
>     "return_discount" : 0
>   }, {
>     "upc" : "0004200015121",
>     "item_flags" : 0,
>     "units" : 1,
>     "retail" : 299,
>     "cost" : 185,
>     "discount_flags" : 1,
>     "discount_units" : 1,
>     "discount" : 30,
>     "return_units" : 0,
>     "return_discount" : 0
>   }, {
>     "upc" : "0003582608932",
>     "item_flags" : 0,
>     "units" : 1,
>     "retail" : 138,
>     "cost" : 79,
>     "discount_flags" : 0,
>     "discount_units" : 0,
>     "discount" : 0,
>     "return_units" : 0,
>     "return_discount" : 0
>   }, {
>     "upc" : "0003600043220",
>     "item_flags" : 0,
>     "units" : 1,
>     "retail" : 249,
>     "cost" : 168,
>     "discount_flags" : 1,
>     "discount_units" : 1,
>     "discount" : 49,
>     "return_units" : 0,
>     "return_discount" : 0
>   } ]
> }
>
> I would like to flatten this 1 record into 8 records, with columns for
> upc, item_flags, units, retail, cost, ... The furthest I got was with this
> query:
>
> SELECT household, minutes, FLATTEN(T.lditem_array.LDItem) FROM
> `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` T limit 8;
>
> This flattened the first record, but did not yielded new columns upc,
> item_flags, units, ... Instead, it gave me one record for each {} element,
> which I've not been able to split further into columns:
>
> 1       046927725027    1135    {
>   "upc" : "0001300000124",
>   "item_flags" : 0,
>   "units" : 1,
>   "retail" : 199,
>   "cost" : 130,
>   "discount_flags" : 0,
>   "discount_units" : 0,
>   "discount" : 0,
>   "return_units" : 0,
>   "return_discount" : 0
> }
> 2       046927725027    1135    {
>   "upc" : "0003800084505",
>   "item_flags" : 0,
>   "units" : 1,
>   "retail" : 199,
>   "cost" : 121,
>   "discount_flags" : 1,
>   "discount_units" : 1,
>   "discount" : 49,
>   "return_units" : 0,
>   "return_discount" : 0
> }
> 3       046927725027    1135    {
>   "upc" : "0078533175751",
>   "item_flags" : 0,
>   "units" : 1,
>   "retail" : 369,
>   "cost" : 233,
>   "discount_flags" : 0,
>   "discount_units" : 0,
>   "discount" : 0,
>   "return_units" : 0,
>   "return_discount" : 0
> }
> 4       046927725027    1135    {
>   "upc" : "0003800084508",
>   "item_flags" : 0,
>   "units" : 1,
>   "retail" : 199,
>   "cost" : 120,
>   "discount_flags" : 1,
>   "discount_units" : 1,
>   "discount" : 49,
>   "return_units" : 0,
>   "return_discount" : 0
> }
> 5       046927725027    1135    {
>   "upc" : "0001480000373",
>   "item_flags" : 0,
>   "units" : 1,
>   "retail" : 239,
>   "cost" : 180,
>   "discount_flags" : 0,
>   "discount_units" : 0,
>   "discount" : 0,
>   "return_units" : 0,
>   "return_discount" : 0
> }
> 6       046927725027    1135    {
>   "upc" : "0004200015121",
>   "item_flags" : 0,
>   "units" : 1,
>   "retail" : 299,
>   "cost" : 185,
>   "discount_flags" : 1,
>   "discount_units" : 1,
>   "discount" : 30,
>   "return_units" : 0,
>   "return_discount" : 0
> }
> 7       046927725027    1135    {
>   "upc" : "0003582608932",
>   "item_flags" : 0,
>   "units" : 1,
>   "retail" : 138,
>   "cost" : 79,
>   "discount_flags" : 0,
>   "discount_units" : 0,
>   "discount" : 0,
>   "return_units" : 0,
>   "return_discount" : 0
> }
> 8       046927725027    1135    {
>   "upc" : "0003600043220",
>   "item_flags" : 0,
>   "units" : 1,
>   "retail" : 249,
>   "cost" : 168,
>   "discount_flags" : 1,
>   "discount_units" : 1,
>   "discount" : 49,
>   "return_units" : 0,
>   "return_discount" : 0
> }
>
> How can I get a result set with 12 columns, i.e. split each {...} list
> into columns? This is where I am stuck.
>
> Thanks,
>
> Mike Beddo
>
>

Reply via email to