Hi, thanks again Jason for your support.
I'm trying as u suggest, to apply the kvgen function. Below you can find the query and the error. *select tt.flat_intervals from (select flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt* [image: Immagine incorporata 1] Could u pls suggest how to fix this drill? Best regards, Paolo 2016-01-12 16:31 GMT+01:00 Jason Altekruse <altekruseja...@gmail.com>: > This is a poor error messages that is produced when you try to flatten a > field that is not an array, for these fields you can just use the dot > notation to access their inner members (i.e. > flattened_array_of_maps.member_field_in_map). If you have a field where the > keys in a map are "unknown" or you want to do analysis on the keys, please > refer to the KVGEN docs [1]. I have assigned the JIRA that reported this > issue a while ago to myself and will work to improve the message [2]. > > > [1] - https://drill.apache.org/docs/kvgen/ > [2] - https://issues.apache.org/jira/browse/DRILL-2182 > > On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <paolosp...@gmail.com> > wrote: > >> Hi All, >> >> Jason,I used your suggests and it works, thanks a lot! >> >> As u wrote i used a subquery to have the all list of INTERVALS as I show >> below. >> >> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as >> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t* >> >> [image: Immagine incorporata 1] >> >> The attribute "flat_intervals" is full of data that I would like to have >> separate them in several attributes as it is showed in the link: >> https://drill.apache.org/docs/flatten/ >> I used a new subquery to have them: >> >> >> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from >> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as >> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt* >> >> but I have this error: >> >> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: >> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from >> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as >> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt >> [30027]Query execution error. Details:[ >> SYSTEM ERROR: ClassCastException: Cannot cast >> org.apache.drill.exec.vector.complex.MapVector to >> org.apache.drill.exec.vector.complex.RepeatedValueVector >> >> Fragment 0:0 >> >> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on 192.168.1.101:31010] >> ] >> >> [image: Immagine incorporata 2] >> >> Thanks in advance! >> >> Paolo >> >> >> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <altekruseja...@gmail.com>: >> >>> Paolo, >>> >>> Drill currently reads single JSON objects as single records. If you look >>> at >>> the top of your file you can see that the root of your document is a >>> single >>> JSON object. >>> >>> Drill accepts two formats for individual records: >>> >>> The Mongo import format, a series of JSON object one after the other in a >>> file, whitespace is irrelevant, each one need not be followed by a >>> newline >>> >>> {"a" : 1, "b" : "hello" } >>> {"a": 5 : "b" : "guten tag" } >>> >>> A JSON array of objects >>> >>> [ >>> {"a" : 1, "b" : "hello" }, >>> {"a" : 2, "b", "guten tag"} >>> ] >>> >>> When you have a file like this you can read it using the FLATTEN >>> functionality of Drill to turn an array into a series of records. >>> https://drill.apache.org/docs/flatten/ >>> >>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`; >>> >>> To work with the data further, you can put the flatten call in a >>> subquery. >>> Here is how you can select the first element from each records list of >>> INTERVALS and select one of the nested fields inside of METRICS once the >>> data has been flattened. >>> To analyze the array, you could flatten again to get an exploded dataset >>> with one record per interval across all records >>> >>> select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low >>> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as >>> t; >>> >>> Here you can see that individual columns can be selected next to the >>> flatten call, this will copy the data into each new record: >>> >>> select flatten(t.flat_rides.INTERVALS) as flat_intervals, >>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as >>> flat_rides from dfs.tmp.`rideDB.json`) as t; >>> >>> Happy Drilling! >>> >>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <paolosp...@gmail.com> >>> wrote: >>> >>> > Hi all, >>> > >>> > i'm trying to query the file that you can find in attach with drill >>> apache >>> > 1.4 . The result of this qurey is always 1 record. >>> > >>> > The query that i'm running is : >>> > >>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t >>> > >>> > If i run the similar query with the file donuts.json found on >>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs >>> properly. >>> > >>> > SELECT t.topping FROM donuts.json t >>> > >>> > Thanks in advance. >>> > >>> > Paolo >>> > >>> >> >> >