Re: JSON File, Total numbers Record: 1

Paolo Spanevello Wed, 13 Jan 2016 06:56:45 -0800

Hi,

thanks again Jason for your support.


I'm trying as u suggest, to apply the kvgen function. Below you can find
the query and the error.

*select tt.flat_intervals from (select
flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt*

[image: Immagine incorporata 1]


Could u pls suggest how to fix this drill?

Best regards,
Paolo



2016-01-12 16:31 GMT+01:00 Jason Altekruse <altekruseja...@gmail.com>:

> This is a poor error messages that is produced when you try to flatten a
> field that is not an array, for these fields you can just use the dot
> notation to access their inner members (i.e.
> flattened_array_of_maps.member_field_in_map). If you have a field where the
> keys in a map are "unknown" or you want to do analysis on the keys, please
> refer to the KVGEN docs [1]. I have assigned the JIRA that reported this
> issue a while ago to myself and will work to improve the message [2].
>
>
> [1] - https://drill.apache.org/docs/kvgen/
> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>
> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <paolosp...@gmail.com>
> wrote:
>
>> Hi  All,
>>
>> Jason,I used your suggests and it works, thanks a lot!
>>
>> As u wrote i used a subquery to have the all list of INTERVALS as I show
>> below.
>>
>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>>
>> [image: Immagine incorporata 1]
>>
>> The attribute "flat_intervals" is full of data that I would like to have
>> separate them in several attributes as it is showed in the link:
>> https://drill.apache.org/docs/flatten/
>> I used a new subquery to have them:
>>
>>
>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt*
>>
>> but I have this error:
>>
>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt
>> [30027]Query execution error. Details:[
>> SYSTEM ERROR: ClassCastException: Cannot cast
>> org.apache.drill.exec.vector.complex.MapVector to
>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>>
>> Fragment 0:0
>>
>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on 192.168.1.101:31010]
>> ]
>>
>> [image: Immagine incorporata 2]
>>
>> Thanks in advance!
>>
>> Paolo
>>
>>
>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <altekruseja...@gmail.com>:
>>
>>> Paolo,
>>>
>>> Drill currently reads single JSON objects as single records. If you look
>>> at
>>> the top of your file you can see that the root of your document is a
>>> single
>>> JSON object.
>>>
>>> Drill accepts two formats for individual records:
>>>
>>> The Mongo import format, a series of JSON object one after the other in a
>>> file, whitespace is irrelevant, each one need not be followed by a
>>> newline
>>>
>>> {"a" : 1, "b" : "hello" }
>>> {"a": 5 : "b" : "guten tag" }
>>>
>>> A JSON array of objects
>>>
>>> [
>>>     {"a" : 1, "b" : "hello" },
>>>     {"a" : 2, "b", "guten tag"}
>>> ]
>>>
>>> When you have a file like this you can read it using the FLATTEN
>>> functionality of Drill to turn an array into a series of records.
>>> https://drill.apache.org/docs/flatten/
>>>
>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>>>
>>> To work with the data further, you can put the flatten call in a
>>> subquery.
>>> Here is how you can select the first element from each records list of
>>> INTERVALS and select one of the nested fields inside of METRICS once the
>>> data has been flattened.
>>> To analyze the array, you could flatten again to get an exploded dataset
>>> with one record per interval across all records
>>>
>>> select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low
>>> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as
>>> t;
>>>
>>> Here you can see that individual columns can be selected next to the
>>> flatten call, this will copy the data into each new record:
>>>
>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>>>
>>> Happy Drilling!
>>>
>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <paolosp...@gmail.com>
>>> wrote:
>>>
>>> > Hi all,
>>> >
>>> > i'm trying to query the file that you can find in attach with drill
>>> apache
>>> > 1.4 . The result of this qurey is always 1 record.
>>> >
>>> > The query that i'm running is :
>>> >
>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>>> >
>>> > If i run the similar query with the file donuts.json found on
>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>>> properly.
>>> >
>>> > SELECT t.topping FROM donuts.json t
>>> >
>>> > Thanks in advance.
>>> >
>>> > Paolo
>>> >
>>>
>>
>>
>

Re: JSON File, Total numbers Record: 1

Reply via email to