Hi,

I ran into something similar today but I would settle for a comma separated
string of sorted values (so they can be grouped).

I'm spending this evening writing a small UDF for that which you can
have/use if that is of any value.

Regards,
 -Stefan


On Wed, Sep 30, 2015 at 6:23 PM, Christopher Matta <[email protected]> wrote:

> Have you tried using the flatten query as a sub-query?
>
> Like:
>
> SELECT x.`household`, x.`minutes`, x.`LDItem`.`upc`,
> x.`LDItem`.`item_flags` from (SELECT household, minutes,
> FLATTEN(T.lditem_array.LDItem) as `LDItem` FROM
> `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` T limit 8) as x;
>
> ​
>
> Chris Matta
> [email protected]
> 215-701-3146
>
> On Wed, Sep 30, 2015 at 2:05 PM, Mike Beddo <[email protected]>
> wrote:
>
> > I am querying a Parquet file containing records for nested data:
> >
> > SELECT household, minutes, lditem_array FROM
> > `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` limit 1;
> >
> > This returns the first record, which contains an array LDItem. For the
> > first record, LDItem is an array of 8 {} structures), like so:
> >
> > 1       046927725027    1135    {
> >   "LDItem" : [ {
> >     "upc" : "0001300000124",
> >     "item_flags" : 0,
> >     "units" : 1,
> >     "retail" : 199,
> >     "cost" : 130,
> >     "discount_flags" : 0,
> >     "discount_units" : 0,
> >     "discount" : 0,
> >     "return_units" : 0,
> >     "return_discount" : 0
> >   }, {
> >     "upc" : "0003800084505",
> >     "item_flags" : 0,
> >     "units" : 1,
> >     "retail" : 199,
> >     "cost" : 121,
> >     "discount_flags" : 1,
> >     "discount_units" : 1,
> >     "discount" : 49,
> >     "return_units" : 0,
> >     "return_discount" : 0
> >   }, {
> >     "upc" : "0078533175751",
> >     "item_flags" : 0,
> >     "units" : 1,
> >     "retail" : 369,
> >     "cost" : 233,
> >     "discount_flags" : 0,
> >     "discount_units" : 0,
> >     "discount" : 0,
> >     "return_units" : 0,
> >     "return_discount" : 0
> >   }, {
> >     "upc" : "0003800084508",
> >     "item_flags" : 0,
> >     "units" : 1,
> >     "retail" : 199,
> >     "cost" : 120,
> >     "discount_flags" : 1,
> >     "discount_units" : 1,
> >     "discount" : 49,
> >     "return_units" : 0,
> >     "return_discount" : 0
> >   }, {
> >     "upc" : "0001480000373",
> >     "item_flags" : 0,
> >     "units" : 1,
> >     "retail" : 239,
> >     "cost" : 180,
> >     "discount_flags" : 0,
> >     "discount_units" : 0,
> >     "discount" : 0,
> >     "return_units" : 0,
> >     "return_discount" : 0
> >   }, {
> >     "upc" : "0004200015121",
> >     "item_flags" : 0,
> >     "units" : 1,
> >     "retail" : 299,
> >     "cost" : 185,
> >     "discount_flags" : 1,
> >     "discount_units" : 1,
> >     "discount" : 30,
> >     "return_units" : 0,
> >     "return_discount" : 0
> >   }, {
> >     "upc" : "0003582608932",
> >     "item_flags" : 0,
> >     "units" : 1,
> >     "retail" : 138,
> >     "cost" : 79,
> >     "discount_flags" : 0,
> >     "discount_units" : 0,
> >     "discount" : 0,
> >     "return_units" : 0,
> >     "return_discount" : 0
> >   }, {
> >     "upc" : "0003600043220",
> >     "item_flags" : 0,
> >     "units" : 1,
> >     "retail" : 249,
> >     "cost" : 168,
> >     "discount_flags" : 1,
> >     "discount_units" : 1,
> >     "discount" : 49,
> >     "return_units" : 0,
> >     "return_discount" : 0
> >   } ]
> > }
> >
> > I would like to flatten this 1 record into 8 records, with columns for
> > upc, item_flags, units, retail, cost, ... The furthest I got was with
> this
> > query:
> >
> > SELECT household, minutes, FLATTEN(T.lditem_array.LDItem) FROM
> > `dfs`.`root`.`./S/Beddo/spark/pig-xpdb` T limit 8;
> >
> > This flattened the first record, but did not yielded new columns upc,
> > item_flags, units, ... Instead, it gave me one record for each {}
> element,
> > which I've not been able to split further into columns:
> >
> > 1       046927725027    1135    {
> >   "upc" : "0001300000124",
> >   "item_flags" : 0,
> >   "units" : 1,
> >   "retail" : 199,
> >   "cost" : 130,
> >   "discount_flags" : 0,
> >   "discount_units" : 0,
> >   "discount" : 0,
> >   "return_units" : 0,
> >   "return_discount" : 0
> > }
> > 2       046927725027    1135    {
> >   "upc" : "0003800084505",
> >   "item_flags" : 0,
> >   "units" : 1,
> >   "retail" : 199,
> >   "cost" : 121,
> >   "discount_flags" : 1,
> >   "discount_units" : 1,
> >   "discount" : 49,
> >   "return_units" : 0,
> >   "return_discount" : 0
> > }
> > 3       046927725027    1135    {
> >   "upc" : "0078533175751",
> >   "item_flags" : 0,
> >   "units" : 1,
> >   "retail" : 369,
> >   "cost" : 233,
> >   "discount_flags" : 0,
> >   "discount_units" : 0,
> >   "discount" : 0,
> >   "return_units" : 0,
> >   "return_discount" : 0
> > }
> > 4       046927725027    1135    {
> >   "upc" : "0003800084508",
> >   "item_flags" : 0,
> >   "units" : 1,
> >   "retail" : 199,
> >   "cost" : 120,
> >   "discount_flags" : 1,
> >   "discount_units" : 1,
> >   "discount" : 49,
> >   "return_units" : 0,
> >   "return_discount" : 0
> > }
> > 5       046927725027    1135    {
> >   "upc" : "0001480000373",
> >   "item_flags" : 0,
> >   "units" : 1,
> >   "retail" : 239,
> >   "cost" : 180,
> >   "discount_flags" : 0,
> >   "discount_units" : 0,
> >   "discount" : 0,
> >   "return_units" : 0,
> >   "return_discount" : 0
> > }
> > 6       046927725027    1135    {
> >   "upc" : "0004200015121",
> >   "item_flags" : 0,
> >   "units" : 1,
> >   "retail" : 299,
> >   "cost" : 185,
> >   "discount_flags" : 1,
> >   "discount_units" : 1,
> >   "discount" : 30,
> >   "return_units" : 0,
> >   "return_discount" : 0
> > }
> > 7       046927725027    1135    {
> >   "upc" : "0003582608932",
> >   "item_flags" : 0,
> >   "units" : 1,
> >   "retail" : 138,
> >   "cost" : 79,
> >   "discount_flags" : 0,
> >   "discount_units" : 0,
> >   "discount" : 0,
> >   "return_units" : 0,
> >   "return_discount" : 0
> > }
> > 8       046927725027    1135    {
> >   "upc" : "0003600043220",
> >   "item_flags" : 0,
> >   "units" : 1,
> >   "retail" : 249,
> >   "cost" : 168,
> >   "discount_flags" : 1,
> >   "discount_units" : 1,
> >   "discount" : 49,
> >   "return_units" : 0,
> >   "return_discount" : 0
> > }
> >
> > How can I get a result set with 12 columns, i.e. split each {...} list
> > into columns? This is where I am stuck.
> >
> > Thanks,
> >
> > Mike Beddo
> >
> >
>

Reply via email to