Hi,
Sorry for the interruption. It did help me with the query with the json
format but in case of xml data can apache Drill query xml data directly. My
use case is get xml data running into terabytes , convert into json and
store it in Mongodb. Can I avoid this converting xml to json step as the
xml data is in terabytes. Any suggestion would be really helpful.

On Thu, Jan 1, 2015 at 10:04 PM, Vince Gonzalez <[email protected]> wrote:

> I'll give that a shot, thanks.
>
> On Thursday, January 1, 2015, mufy <[email protected]> wrote:
>
> > Have you tried something like -
> >
> > select SpotPriceHistory[0].Timestamp from ...
> >
> >
> > ---
> > Mufeed Usman
> > My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
> > Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
> > <http://mufeed.livejournal.com>
> >
> >
> >
> >
> > On Thu, Jan 1, 2015 at 7:25 PM, Vince Gonzalez <[email protected]
> > <javascript:;>> wrote:
> >
> > > I've got some data with the following structure:
> > >
> > > {
> > >     "SpotPriceHistory": [
> > >         {
> > >             "ProductDescription": "Linux/UNIX",
> > >             "AvailabilityZone": "us-east-1a",
> > >             "Timestamp": "2015-01-01T13:26:16.000Z",
> > >             "InstanceType": "cc2.8xlarge",
> > >             "SpotPrice": "0.256900"
> > >         },
> > >         {
> > >             "ProductDescription": "Linux/UNIX",
> > >             "AvailabilityZone": "us-east-1a",
> > >             "Timestamp": "2015-01-01T13:01:38.000Z",
> > >             "InstanceType": "cc2.8xlarge",
> > >             "SpotPrice": "0.256800"
> > >         },
> > >         ...
> > >     ]
> > > }
> > >
> > > In exploring it with Drill, I'm able to get as far as:
> > >
> > > > select * from (select kvgen(flatten(SpotPriceHistory)) as t from
> > > dfs.`/tmp/spot_price_history.json`);
> > >
> > > Which yields rows that each contain a list of key-value pairs, which
> > seems
> > > like what I want:
> > >
> > > +------------+
> > > |     t      |
> > > +------------+
> > > |
> > >
> > >
> >
> [{"key":"ProductDescription","value":"Linux/UNIX"},{"key":"AvailabilityZone","value":"us-east-1a"},{"key":"Timestamp","value":"2014-12-20T11:48:46.000Z"},{"key":"InstanceType","value":"cc2.8xlarge"},{"key":"SpotPrice","value":"0.256900"}]
> > > |
> > > |
> > >
> > >
> >
> [{"key":"ProductDescription","value":"Linux/UNIX"},{"key":"AvailabilityZone","value":"us-east-1a"},{"key":"Timestamp","value":"2014-12-20T11:40:49.000Z"},{"key":"InstanceType","value":"cc2.8xlarge"},{"key":"SpotPrice","value":"0.257000"}]
> > > |
> > > |
> > >
> > >
> >
> [{"key":"ProductDescription","value":"Linux/UNIX"},{"key":"AvailabilityZone","value":"us-east-1a"},{"key":"Timestamp","value":"2014-12-20T11:36:46.000Z"},{"key":"InstanceType","value":"cc2.8xlarge"},{"key":"SpotPrice","value":"0.257100"}]
> > > |
> > >
> > > But I have not figured out how to pull those keys up into column names
> so
> > > that I can do a query like:
> > >
> > > > select SpotPrice,Timestamp from t
> > >
> > > I'll appreciate any tips.
> > >
> > > Thanks,
> > >
> > > --vince
> > >
> >
>
>
> --
>
> ---
>  Vince Gonzalez
>  Systems Engineer
>  212.694.3879
>
>  mapr.com
>

Reply via email to