Re: JSON File, Total numbers Record: 1

Jason Altekruse Wed, 13 Jan 2016 15:29:32 -0800

The issue you have now is that you are passing an array into KVGEN, which
only works on maps. You can just flatten it again instead.


Looking more closely at your dataset, I don't think you have any maps that
would require KVGEN to analyze them, they all have keys that appear in all
records, KVGEN is just for sparse maps. An example would be if you put user
ID as the key, in which case you wouldn't want to create thousands of
different SQL statements (using each user_id as a column name) to analyze
different users.

You should just use the dot notation to access nested members of one of the
maps after you've flattened down to the correct level.

Example (note the backticks around `start` as it is a SQL reserved word):
select t2.flat_intervals.name, t2.flat_intervals.`start`,
t2.flat_intervals.stop from (select flatten(t.flat_rides.INTERVALS) as
flat_intervals, t.flat_rides.METRICS.skiba_wprime_low from (select
flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t) as t2;

I also noticed that Metrics appears at two different levels, so here are
two examples for reaching both levels:

select t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
flat_rides from dfs.tmp.`rideDB.json`) as t
+------------+
|   EXPR$0   |
+------------+
| 2.42418    |
| -13.54943  |
| 0.83858    |
| null       |
+------------+

This query also shows how you can include a field next to the array when
flattening and bring it up the the outer query if you still need this data
associated with the flattened data (note `date` is also a reserved word).

select t2.flat_intervals.METRICS.skiba_wprime_low as skiba_wprime_low,
t2.ride_date from (select flatten(t.flat_rides.INTERVALS) as
flat_intervals, t.flat_rides.METRICS.skiba_wprime_low, t.flat_rides.`date`
as ride_date from (select flatten(rides) as flat_rides from
dfs.tmp.`rideDB.json`) as t) as t2;

+-------------------+--------------------------+
| skiba_wprime_low  |        ride_date         |
+-------------------+--------------------------+
| 2.42418           | 2014/12/02 10:10:23 UTC  |
| 1.95597           | 2014/12/02 10:10:23 UTC  |
| 1.90420           | 2014/12/02 10:10:23 UTC  |
| 1.72623           | 2014/12/02 10:10:23 UTC  |
| 1.04275           | 2014/12/02 10:10:23 UTC  |
| 1.55861           | 2014/12/02 10:10:23 UTC  |
| 17.73633          | 2014/12/02 10:10:23 UTC  |
| 17.31987          | 2014/12/02 10:10:23 UTC  |
| 18.47823          | 2014/12/02 10:10:23 UTC  |
... data clipped


On Wed, Jan 13, 2016 at 12:26 PM, Paolo Spanevello <[email protected]>
wrote:

> Hi,
>
> thanks to all for your support.
>
> I'm trying as u suggest, to apply the kvgen function. Below you can find
> the query and the error.
>
> *select tt.flat_intervals from (select
> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt*
>
> I have this error message.
>
> Best regards,
> Paolo
>
> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> select tt.flat_intervals from (select
> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt
> [30027]Query execution error. Details:[
> SYSTEM ERROR: DrillRuntimeException: kvgen function only supports Simple
> maps as input
>
> Fragment 0:0
>
> [Error Id: 85f8e8ba-fb87-428a-ac2e-dea78498c222 on 10.1.0.74:31010]
> ]
>
> 2016-01-13 20:28 GMT+01:00 Jason Altekruse <[email protected]>:
>
>> Wasn't thinking about that, I was seeing the images because I was included
>> directly in the recipients lists.
>>
>> +1 on text for searchability.
>>
>> Can you please rerun the queries and post the errors here as text?
>>
>> On Wed, Jan 13, 2016 at 11:20 AM, Christopher Matta <[email protected]>
>> wrote:
>>
>> > The apache mailing lists don't support image attachments, it would be
>> > better to attach text anyway since it's searchable.
>> >
>> > Chris Matta
>> > [email protected]
>> > 215-701-3146
>> >
>> > On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <[email protected]
>> >
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> thanks again Jason for your support.
>> >>
>> >> I'm trying as u suggest, to apply the kvgen function. Below you can
>> find
>> >> the query and the error.
>> >>
>> >> *select tt.flat_intervals from (select
>> >> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>> >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as
>> tt*
>> >>
>> >> [image: Immagine incorporata 1]
>> >>
>> >>
>> >> Could u pls suggest how to fix this drill?
>> >>
>> >> Best regards,
>> >> Paolo
>> >>
>> >>
>> >>
>> >> 2016-01-12 16:31 GMT+01:00 Jason Altekruse <[email protected]>:
>> >>
>> >>> This is a poor error messages that is produced when you try to
>> flatten a
>> >>> field that is not an array, for these fields you can just use the dot
>> >>> notation to access their inner members (i.e.
>> >>> flattened_array_of_maps.member_field_in_map). If you have a field
>> where the
>> >>> keys in a map are "unknown" or you want to do analysis on the keys,
>> please
>> >>> refer to the KVGEN docs [1]. I have assigned the JIRA that reported
>> this
>> >>> issue a while ago to myself and will work to improve the message [2].
>> >>>
>> >>>
>> >>> [1] - https://drill.apache.org/docs/kvgen/
>> >>> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>> >>>
>> >>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <
>> [email protected]>
>> >>> wrote:
>> >>>
>> >>>> Hi  All,
>> >>>>
>> >>>> Jason,I used your suggests and it works, thanks a lot!
>> >>>>
>> >>>> As u wrote i used a subquery to have the all list of INTERVALS as I
>> >>>> show below.
>> >>>>
>> >>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>> >>>>
>> >>>> [image: Immagine incorporata 1]
>> >>>>
>> >>>> The attribute "flat_intervals" is full of data that I would like to
>> >>>> have separate them in several attributes as it is showed in the link:
>> >>>> https://drill.apache.org/docs/flatten/
>> >>>> I used a new subquery to have them:
>> >>>>
>> >>>>
>> >>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat
>> from
>> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as
>> tt*
>>
>> >>>>
>> >>>> but I have this error:
>> >>>>
>> >>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>> >>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as
>> tt
>> >>>> [30027]Query execution error. Details:[
>> >>>> SYSTEM ERROR: ClassCastException: Cannot cast
>> >>>> org.apache.drill.exec.vector.complex.MapVector to
>> >>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>> >>>>
>> >>>> Fragment 0:0
>> >>>>
>> >>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on
>> 192.168.1.101:31010]
>> >>>> ]
>> >>>>
>> >>>> [image: Immagine incorporata 2]
>> >>>>
>> >>>> Thanks in advance!
>> >>>>
>> >>>> Paolo
>> >>>>
>> >>>>
>> >>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <[email protected]
>> >:
>> >>>>
>> >>>>> Paolo,
>> >>>>>
>> >>>>> Drill currently reads single JSON objects as single records. If you
>> >>>>> look at
>> >>>>> the top of your file you can see that the root of your document is a
>> >>>>> single
>> >>>>> JSON object.
>> >>>>>
>> >>>>> Drill accepts two formats for individual records:
>> >>>>>
>> >>>>> The Mongo import format, a series of JSON object one after the other
>> >>>>> in a
>> >>>>> file, whitespace is irrelevant, each one need not be followed by a
>> >>>>> newline
>> >>>>>
>> >>>>> {"a" : 1, "b" : "hello" }
>> >>>>> {"a": 5 : "b" : "guten tag" }
>> >>>>>
>> >>>>> A JSON array of objects
>> >>>>>
>> >>>>> [
>> >>>>>     {"a" : 1, "b" : "hello" },
>> >>>>>     {"a" : 2, "b", "guten tag"}
>> >>>>> ]
>> >>>>>
>> >>>>> When you have a file like this you can read it using the FLATTEN
>> >>>>> functionality of Drill to turn an array into a series of records.
>> >>>>> https://drill.apache.org/docs/flatten/
>> >>>>>
>> >>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>> >>>>>
>> >>>>> To work with the data further, you can put the flatten call in a
>> >>>>> subquery.
>> >>>>> Here is how you can select the first element from each records list
>> of
>> >>>>> INTERVALS and select one of the nested fields inside of METRICS once
>> >>>>> the
>> >>>>> data has been flattened.
>> >>>>> To analyze the array, you could flatten again to get an exploded
>> >>>>> dataset
>> >>>>> with one record per interval across all records
>> >>>>>
>> >>>>> select t.flat_rides.INTERVALS[0],
>> t.flat_rides.METRICS.skiba_wprime_low
>> >>>>> from (select flatten(rides) as flat_rides from
>> dfs.tmp.`rideDB.json`)
>> >>>>> as t;
>> >>>>>
>> >>>>> Here you can see that individual columns can be selected next to the
>> >>>>> flatten call, this will copy the data into each new record:
>> >>>>>
>> >>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>> >>>>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
>> >>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>> >>>>>
>> >>>>> Happy Drilling!
>> >>>>>
>> >>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <
>> >>>>> [email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>> > Hi all,
>> >>>>> >
>> >>>>> > i'm trying to query the file that you can find in attach with
>> drill
>> >>>>> apache
>> >>>>> > 1.4 . The result of this qurey is always 1 record.
>> >>>>> >
>> >>>>> > The query that i'm running is :
>> >>>>> >
>> >>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>> >>>>> >
>> >>>>> > If i run the similar query with the file donuts.json found on
>> >>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>> >>>>> properly.
>> >>>>> >
>> >>>>> > SELECT t.topping FROM donuts.json t
>> >>>>> >
>> >>>>> > Thanks in advance.
>> >>>>> >
>> >>>>> > Paolo
>> >>>>> >
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>
>

Re: JSON File, Total numbers Record: 1

Reply via email to