Flatten is the right way to destructure lists which is what you have.  The
proper use of kvgen is for destructuring objects.

For instance, with OpenTSDB you can retrieve rows as maps.  Since the
column names are the keys for this map, kvgen will pull it apart very
nicely.


On Thu, Jan 1, 2015 at 9:28 AM, Vince Gonzalez <[email protected]> wrote:

> Mufy, your suggestion didn't work, but this got me further. Still wondering
> if this is the best way.
>
> 0: jdbc:drill:zk=local> create table spot_price_history as (select
> flatten(SpotPriceHistory) as t from dfs.`/tmp/spot_price_history.json`);
> +------------+---------------------------+
> |  Fragment  | Number of records written |
> +------------+---------------------------+
> | 0_0        | 638                       |
> +------------+---------------------------+
> 1 row selected (0.353 seconds)
> 0: jdbc:drill:zk=local> select * from spot_price_history limit 5;
> +------------+
> |     t      |
> +------------+
> |
>
> {"ProductDescription":"Linux/UNIX","AvailabilityZone":"us-east-1a","Timestamp":"2015-01-01T13:26:16.000Z","InstanceType":"cc2.8xlarge","SpotPrice":"0.256900"}
> |
> |
>
> {"ProductDescription":"Linux/UNIX","AvailabilityZone":"us-east-1a","Timestamp":"2015-01-01T13:01:38.000Z","InstanceType":"cc2.8xlarge","SpotPrice":"0.256800"}
> |
> |
>
> {"ProductDescription":"Linux/UNIX","AvailabilityZone":"us-east-1a","Timestamp":"2015-01-01T12:21:21.000Z","InstanceType":"cc2.8xlarge","SpotPrice":"0.256900"}
> |
> |
>
> {"ProductDescription":"Linux/UNIX","AvailabilityZone":"us-east-1a","Timestamp":"2015-01-01T12:13:21.000Z","InstanceType":"cc2.8xlarge","SpotPrice":"0.257000"}
> |
> |
>
> {"ProductDescription":"Linux/UNIX","AvailabilityZone":"us-east-1a","Timestamp":"2015-01-01T11:57:05.000Z","InstanceType":"cc2.8xlarge","SpotPrice":"0.256900"}
> |
> +------------+
> 5 rows selected (0.052 seconds)
> 0: jdbc:drill:zk=local> select t['SpotPrice'] from spot_price_history limit
> 5;
> +------------+
> |   EXPR$0   |
> +------------+
> | 0.256900   |
> | 0.256800   |
> | 0.256900   |
> | 0.257000   |
> | 0.256900   |
> +------------+
> 5 rows selected (0.072 seconds)
>
> ---
>  Vince Gonzalez
>  Systems Engineer
>  212.694.3879
>
>  mapr.com
>
> On Thu, Jan 1, 2015 at 10:06 AM, mufy <[email protected]> wrote:
>
> > Have you tried something like -
> >
> > select SpotPriceHistory[0].Timestamp from ...
> >
> >
> > ---
> > Mufeed Usman
> > My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
> > Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
> > <http://mufeed.livejournal.com>
> >
> >
> >
> >
> > On Thu, Jan 1, 2015 at 7:25 PM, Vince Gonzalez <[email protected]>
> wrote:
> >
> > > I've got some data with the following structure:
> > >
> > > {
> > >     "SpotPriceHistory": [
> > >         {
> > >             "ProductDescription": "Linux/UNIX",
> > >             "AvailabilityZone": "us-east-1a",
> > >             "Timestamp": "2015-01-01T13:26:16.000Z",
> > >             "InstanceType": "cc2.8xlarge",
> > >             "SpotPrice": "0.256900"
> > >         },
> > >         {
> > >             "ProductDescription": "Linux/UNIX",
> > >             "AvailabilityZone": "us-east-1a",
> > >             "Timestamp": "2015-01-01T13:01:38.000Z",
> > >             "InstanceType": "cc2.8xlarge",
> > >             "SpotPrice": "0.256800"
> > >         },
> > >         ...
> > >     ]
> > > }
> > >
> > > In exploring it with Drill, I'm able to get as far as:
> > >
> > > > select * from (select kvgen(flatten(SpotPriceHistory)) as t from
> > > dfs.`/tmp/spot_price_history.json`);
> > >
> > > Which yields rows that each contain a list of key-value pairs, which
> > seems
> > > like what I want:
> > >
> > > +------------+
> > > |     t      |
> > > +------------+
> > > |
> > >
> > >
> >
> [{"key":"ProductDescription","value":"Linux/UNIX"},{"key":"AvailabilityZone","value":"us-east-1a"},{"key":"Timestamp","value":"2014-12-20T11:48:46.000Z"},{"key":"InstanceType","value":"cc2.8xlarge"},{"key":"SpotPrice","value":"0.256900"}]
> > > |
> > > |
> > >
> > >
> >
> [{"key":"ProductDescription","value":"Linux/UNIX"},{"key":"AvailabilityZone","value":"us-east-1a"},{"key":"Timestamp","value":"2014-12-20T11:40:49.000Z"},{"key":"InstanceType","value":"cc2.8xlarge"},{"key":"SpotPrice","value":"0.257000"}]
> > > |
> > > |
> > >
> > >
> >
> [{"key":"ProductDescription","value":"Linux/UNIX"},{"key":"AvailabilityZone","value":"us-east-1a"},{"key":"Timestamp","value":"2014-12-20T11:36:46.000Z"},{"key":"InstanceType","value":"cc2.8xlarge"},{"key":"SpotPrice","value":"0.257100"}]
> > > |
> > >
> > > But I have not figured out how to pull those keys up into column names
> so
> > > that I can do a query like:
> > >
> > > > select SpotPrice,Timestamp from t
> > >
> > > I'll appreciate any tips.
> > >
> > > Thanks,
> > >
> > > --vince
> > >
> >
>

Reply via email to