Hi Jason, You are right, it probably makes sense to introduce such a concept. We could be free from the DataSet.
Could I support or apply some requirement on the web? Best, Paolo 2016-01-14 17:36 GMT+01:00 Jason Altekruse <[email protected]>: > You can use * to propagate all traits up. The only problem with this is > that it will pull up a copy of the list you are flattening as well (copying > the complete list next to each record in the flattened out version of the > list). Databases have always frustratingly lacked a feature of 'select * > (excluding the large field I don't need to read, that just strains the > database because someone thought we should put binaries in the table)'. > Considering the liberal types of data Drill can query, it probably makes > sense to introduce such a concept. > > I'm going to create a few views to make this easier to process. Note to > create view you need to be in a writable workspace. By default the dfs.tmp > workspace is the only writable one, see this page for configuring a more > permanent workspace to store your views in > https://drill.apache.org/docs/workspaces/ > > Here is a simple view that can copy up all of the fields, although it will > contain extra copies of the first list level that was flattened. > > create or replace view ride_db_view as select > flatten(t.flat_rides.INTERVALS) as flat_intervals, t.* from (select > version,flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t; > > Looking at the data is kind of hard with the size of the records, but I > tried to write a few queries to illustrate this point. > > Here is a simple select out of the view above, selecting out of the list > of flattened records in the outermost query, it returns 80 records: > select t.flat_intervals as first_int from ride_db_view as t; > > You can also grab the first element out of the un_flattened version of the > same list out of the view, this however is confusing because it still > returns 80 records, because flat_rides now has been copied next to each > flattened interval, so a bunch of these are repeats: > select t.flat_rides.intervals[0] as first_int from ride_db_view as t; > > Here you can filter the results to find where the first element in the > inner list matches outermost flattened one, this gets you back to the size > of the root level list of rides, Drill currently doesn't implement map > equality so I used the combination of the start and stop time to compare > the maps: > select * from (select t.flat_rides.intervals[0] as first_int, > t.flat_intervals from ride_db_view as t) as t2 where t2.first_int.`start` = > t2.flat_intervals.`start` AND t2.first_int.stop = t2.flat_intervals.stop; > > Thankfully for you, your dataset is well structured to avoid most of this > headache, as a bunch of your fields are nested into maps, and Drill allows > you to select the whole map by name to move it up to the next outer query. > > Here is a little bit longer query that selects all of the data, without > keeping around extra copies of the lists, you only have to enumerate the > few fields at the top of each "RIDE" record. > create or replace view ride_db_view as select > flatten(t.flat_rides.INTERVALS) as flat_intervals, t.flat_rides.METRICS, > t.flat_rides.TAGS,t.flat_rides.`filename`, t.flat_rides.`date`, > t.flat_rides.`fingerprint`, t.flat_rides.`crc`, t.flat_rides.`metacrc`, > t.flat_rides. `timestamp`, t.flat_rides.`dbversion`, > t.flat_rides.`color`,t.flat_rides.`present`,t.flat_rides.`isRun`,t.flat_rides.`isSwim`,t.flat_rides.`weight`,t.flat_rides.`samples` > from (select version,flatten(rides) as flat_rides from > dfs.tmp.`rideDB.json`) as t; > > Here you can see that flat_rides is no longer available: > >> select t.flat_rides.intervals[0] as first_int from ride_db_view as t; > Jan 14, 2016 8:22:42 AM > org.apache.calcite.sql.validate.SqlValidatorException <init> > SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Column > 'flat_rides' not found in table 't' > > Here is what the output data looks like, all of the long list of columns > are nested under metrics, tags and flat_intervals. > > 0: jdbc:drill:zk=local> select * from ride_db_view limit 1; > > +----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+ > | flat_intervals | metrics | tags | filename | date_col | fingerprint | > crc | metacrc | timestamp_col | dbversion | color | present | isRun | > isSwim | weight | samples | > > +----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+ > | {"name":"Attivita' intera > ","start":"0","stop":"9920","startKM":"0","stopKM":"85.9888","type":"2","color":"#000080","seq":"0","METRICS":{"skiba_wprime_low":"2.42418","skiba_wprime_max":"87.87909","skiba_wprime_matches":"13.00000","skiba_wprime_maxmatch":"8.85200","skiba_wprime_tau":"490.00000","skiba_wprime_exp":"155.77200","skiba_wprime_watts":"15.70124","wtime_in_zone_L1":"4017.00000","wtime_in_zone_L2":"2762.00000","wtime_in_zone_L3":"2584.00000","wtime_in_zone_L4":"558.00000","skiba_cp_exp":"1627.87600","1s_peak_wpk":"12.33333","5s_peak_wpk":"9.75467","10s_peak_wpk":"7.67333","15s_peak_wpk":"6.22844","20s_peak_wpk":"6.06733","30s_peak_wpk":"5.69556","1m_peak_wpk":"4.68489","5m_peak_wpk":"3.93831","10m_peak_wpk":"3.74687","20m_peak_wpk":"3.65183","30m_peak_wpk":"3.30927","60m_peak_wpk":"2.91914","vo2max":"49.53376","average_wpk":"2.39713","trimp_points":"161.96102","trimp_100_points":"110.74976","trimp_zonal_points":"201.52667","time_in_zone_L1":"2693.00000","time_in_zone_L2":"1519.00000","time_in_zone_L3":"2306.00000","time_in_zone_L4":"1253.00000","time_in_zone_L5":"1025.00000","time_in_zone_L6":"855.00000","time_in_zone_L7":"270.00000","percent_in_zone_L1":"27.14444","percent_in_zone_L2":"15.31096","percent_in_zone_L3":"23.24362","percent_in_zone_L4":"12.62978","percent_in_zone_L5":"10.33162","percent_in_zone_L6":"8.61808","percent_in_zone_L7":"2.72150","triscore":"206.25656","peak_percent":"70.79363","power_zone":"2.85570","power_fatigue_index":"99.78378","power_pacing_index":"19.43623","1s_critical_power":"925.00000","5s_critical_power":"731.60000","10s_critical_power":"575.50000","15s_critical_power":"467.13333","20s_critical_power":"455.05000","30s_critical_power":"427.16667","1m_critical_power":"351.36667","2m_critical_power":"316.74167","3m_critical_power":"308.15556","5m_critical_power":"295.37333","8m_critical_power":"287.22500","10m_critical_power":"281.01500","20m_critical_power":"273.88750","30m_critical_power":"248.19556","60m_critical_power":"218.93528","90m_critical_power":"207.03278","1m_critical_power_hr":"138.05000","5m_critical_power_hr":"153.56333","10m_critical_power_hr":"154.99167","20m_critical_power_hr":"154.51333","30m_critical_power_hr":"148.59833","60m_critical_power_hr":"141.42861","time_in_zone_H1":"1899.00000","time_in_zone_H2":"2719.00000","time_in_zone_H3":"4018.00000","time_in_zone_H4":"1285.00000","percent_in_zone_H1":"19.14122","percent_in_zone_H2":"27.40651","percent_in_zone_H3":"40.49995","percent_in_zone_H4":"12.95232","daniels_points":"154.36992","daniels_equivalent_power":"216.28054","a_coggan_np":"226.44079","a_coggan_if":"0.90576","a_coggan_tss":"226.09047","a_coggam_variability_index":"1.25951","a_friel_efficiency_factor":"1.73689","a_coggan_tssperhour":"82.04069","coggan_np":"222.89726","coggan_if":"0.89159","coggan_tss":"219.06974","coggam_variability_index":"1.23980","friel_efficiency_factor":"1.70971","coggan_tssperhour":"79.49310","a_skiba_xpower":"219.75992","a_skiba_relative_intensity":"0.87904","a_skiba_bike_score":"212.94620","a_skiba_variability_index":"1.22235","a_skiba_response_index":"1.68565","atiss_score":"216.58554","antiss_score":"8.66888","cp_setting":"250.00000","skiba_xpower":"216.28053","skiba_relative_intensity":"0.86512","skiba_bike_score":"206.25656","skiba_variability_index":"1.20299","tiss_delta":"96.15152","skiba_response_index":"1.65896","ride_count":"1.00000","workout_time":"9921.00000","time_riding":"9921.00000","time_carrying":"21.00000","total_distance":"85.99094","distance_swim":"85990.93889","climb_rating":"5.34573","athlete_weight":"75.00000","elevation_gain":"678.00000","elevation_loss":"678.00000","total_work":"1783.64800","average_speed":"31.20324","pace":"1.92288","pace_swim":"0.19229","average_power":"179.78510","average_apower":"182.33790","nonzero_power":"221.73645","average_hr":"130.37121","average_ct":"38.07369","heartbeats":"21552.53333","hrpw":"1.37902","wb":"384.42133","ap_percent_max":"17.97851","hrnp":"1.70971","average_cad":"80.57447","average_temp":"12.14676","max_power":"925.00000","max_heartrate":"165.00000","max_ct":"38.63611","max_speed":"70.70000","max_cadence":"117.00000","max_temp":"18.00000","ninety_five_percent_hr":"158.00000","vam":"246.02359","eoa":"1.40003","gradient":"0.78846","meanpowervariance":"61.43749","maxpowervariance":"843.00000","total_kcalories":"1773.95946","aerobic_decoupling":"19.19399"}} > | > {"skiba_wprime_low":"2.42418","skiba_wprime_max":"87.87909","skiba_wprime_matches":"13.00000","skiba_wprime_maxmatch":"8.85200","skiba_wprime_tau":"490.00000","skiba_wprime_exp":"155.77200","skiba_wprime_watts":"15.70124","wtime_in_zone_L1":"4017.00000","wtime_in_zone_L2":"2762.00000","wtime_in_zone_L3":"2584.00000","wtime_in_zone_L4":"558.00000","skiba_cp_exp":"1627.87600","1s_peak_wpk":"12.33333","5s_peak_wpk":"9.75467","10s_peak_wpk":"7.67333","15s_peak_wpk":"6.22844","20s_peak_wpk":"6.06733","30s_peak_wpk":"5.69556","1m_peak_wpk":"4.68489","5m_peak_wpk":"3.93831","10m_peak_wpk":"3.74687","20m_peak_wpk":"3.65183","30m_peak_wpk":"3.30927","60m_peak_wpk":"2.91914","vo2max":"49.53376","average_wpk":"2.39713","trimp_points":"161.96102","trimp_100_points":"110.74976","trimp_zonal_points":"201.52667","time_in_zone_L1":"2693.00000","time_in_zone_L2":"1519.00000","time_in_zone_L3":"2306.00000","time_in_zone_L4":"1253.00000","time_in_zone_L5":"1025.00000","time_in_zone_L6":"855.00000","time_in_zone_L7":"270.00000","percent_in_zone_L1":"27.14444","percent_in_zone_L2":"15.31096","percent_in_zone_L3":"23.24362","percent_in_zone_L4":"12.62978","percent_in_zone_L5":"10.33162","percent_in_zone_L6":"8.61808","percent_in_zone_L7":"2.72150","triscore":"206.25656","l3_sustain":"1298.00000","l4_sustain":"440.00000","l5_sustain":"1782.00000","l6_sustain":"298.00000","l7_sustain":"123.00000","peak_percent":"70.79363","power_zone":"2.85570","power_fatigue_index":"99.78378","power_pacing_index":"19.43623","1s_critical_power":"925.00000","5s_critical_power":"731.60000","10s_critical_power":"575.50000","15s_critical_power":"467.13333","20s_critical_power":"455.05000","30s_critical_power":"427.16667","1m_critical_power":"351.36667","2m_critical_power":"316.74167","3m_critical_power":"308.15556","5m_critical_power":"295.37333","8m_critical_power":"287.22500","10m_critical_power":"281.01500","20m_critical_power":"273.88750","30m_critical_power":"248.19556","60m_critical_power":"218.93528","90m_critical_power":"207.03278","1m_critical_power_hr":"138.05000","5m_critical_power_hr":"153.56333","10m_critical_power_hr":"154.99167","20m_critical_power_hr":"154.51333","30m_critical_power_hr":"148.59833","60m_critical_power_hr":"141.42861","time_in_zone_H1":"1899.00000","time_in_zone_H2":"2719.00000","time_in_zone_H3":"4018.00000","time_in_zone_H4":"1285.00000","percent_in_zone_H1":"19.14122","percent_in_zone_H2":"27.40651","percent_in_zone_H3":"40.49995","percent_in_zone_H4":"12.95232","daniels_points":"154.36992","daniels_equivalent_power":"216.28054","a_coggan_np":"226.44079","a_coggan_if":"0.90576","a_coggan_tss":"226.09047","a_coggam_variability_index":"1.25951","a_friel_efficiency_factor":"1.73689","a_coggan_tssperhour":"82.04069","coggan_np":"222.89726","coggan_if":"0.89159","coggan_tss":"219.06974","coggam_variability_index":"1.23980","friel_efficiency_factor":"1.70971","coggan_tssperhour":"79.49310","a_skiba_xpower":"219.75992","a_skiba_relative_intensity":"0.87904","a_skiba_bike_score":"212.94620","a_skiba_variability_index":"1.22235","a_skiba_response_index":"1.68565","atiss_score":"216.58554","antiss_score":"8.66888","cp_setting":"250.00000","skiba_xpower":"216.28053","skiba_relative_intensity":"0.86512","skiba_bike_score":"206.25656","skiba_variability_index":"1.20299","tiss_delta":"96.15152","skiba_response_index":"1.65896","ride_count":"1.00000","workout_time":"9921.00000","time_riding":"9921.00000","time_carrying":"21.00000","total_distance":"85.99094","distance_swim":"85990.93889","climb_rating":"5.34573","athlete_weight":"75.00000","elevation_gain":"678.00000","elevation_loss":"678.00000","total_work":"1783.64800","average_speed":"31.20324","pace":"1.92288","pace_swim":"0.19229","average_power":"179.78510","average_apower":"182.33790","nonzero_power":"221.73645","average_hr":"130.37121","average_ct":"38.07369","heartbeats":"21552.53333","hrpw":"1.37902","wb":"384.42133","ap_percent_max":"17.97851","hrnp":"1.70971","average_cad":"80.57447","average_temp":"12.14676","max_power":"925.00000","max_heartrate":"165.00000","max_ct":"38.63611","max_speed":"70.70000","max_cadence":"117.00000","max_temp":"18.00000","ninety_five_percent_hr":"158.00000","vam":"246.02359","eoa":"1.40003","gradient":"0.78846","meanpowervariance":"61.43749","maxpowervariance":"843.00000","total_kcalories":"1773.95946","aerobic_decoupling":"19.19399"} > | {"Athlete":"prova ","Calendar Text":"Bike\n ","Data":"TDSPHC-A-L-E--- > ","Device":"SRM ","Device Info":" ","File Format":" > ","Filename":"2014_12_02_11_10_23.json ","Month":"dicembre ","Notes":" > ","Source Filename":"2014_12_02_11_10_23_2014_12_02_11_10_23.json > ","Sport":"Bike ","Weekday":"mar ","Year":"2014 "} | > 2014_12_02_11_10_23.json | 2014/12/02 10:10:23 UTC | 104091 | 0 | 65172 | > 1451321862 | 115 | #010101 | TDSPHC-A-L-E--- | 0 | 0 | 75 | 1 | > > +----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+ > > > > On Thu, Jan 14, 2016 at 3:22 AM, Paolo Spanevello <[email protected]> > wrote: > >> Ciao to All, >> >> thanks for your answer, it works! >> >> Above the query and the result >> >> select tt.Athlete, tt.ride_date, tt.skiba_wprime_low from >> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.`date` as >> ride_date,flatten(t.flat_rides.INTERVALS) as flat_intervals, >> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_low as skiba_wprime_low from >> (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider2`) as >> t) as tt where tt.flat_intervals.name = 'Attivita'' intera ' >> >> prova 2014/12/02 10:10:23 UTC >> 2.42418 >> prova 2014/12/03 08:31:21 UTC 1.95597 >> prova 2014/12/05 10:11:35 UTC 1.90420 >> prova 2014/12/17 08:27:33 UTC 1.72623 >> >> >> Now i'm writing all of you to understand if there is a way to drill down >> all the attributes, without specific each single attribute otherwise I have >> to write down 145 attributes into the query, and just to be more complicate >> to maintenance the drill, the drill is into the subquery, it means that I >> have to write down in the query twice the list of attribute, so 145*2. >> >> The query it could look like this: >> >> select tt.Athlete, tt.ride_date, >> tt.skiba_wprime_low,tt.skiba_wprime_max,.....**145 attributes*, >> tt.aerobic_decoupling from >> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.`date` as >> ride_date,flatten(t.flat_rides.INTERVALS) as flat_intervals, >> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_low as skiba_wprime_low, >> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_max as >> skiba_wprime_max,.....**145 >> attributes*, t.flat_Rides.INTERVALS.METRICS.aerobic_decoupling as >> aerobic_decoupling from >> (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider2`) as >> t) as tt where tt.flat_intervals.name = 'Attivita'' intera ' >> >> >> Thanks again for your support. >> >> Best regards, >> Paolo >> >> >> 2016-01-14 0:28 GMT+01:00 Jason Altekruse <[email protected]>: >> >>> The issue you have now is that you are passing an array into KVGEN, which >>> only works on maps. You can just flatten it again instead. >>> >>> Looking more closely at your dataset, I don't think you have any maps >>> that >>> would require KVGEN to analyze them, they all have keys that appear in >>> all >>> records, KVGEN is just for sparse maps. An example would be if you put >>> user >>> ID as the key, in which case you wouldn't want to create thousands of >>> different SQL statements (using each user_id as a column name) to analyze >>> different users. >>> >>> You should just use the dot notation to access nested members of one of >>> the >>> maps after you've flattened down to the correct level. >>> >>> Example (note the backticks around `start` as it is a SQL reserved word): >>> select t2.flat_intervals.name, t2.flat_intervals.`start`, >>> t2.flat_intervals.stop from (select flatten(t.flat_rides.INTERVALS) as >>> flat_intervals, t.flat_rides.METRICS.skiba_wprime_low from (select >>> flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t) as t2; >>> >>> I also noticed that Metrics appears at two different levels, so here are >>> two examples for reaching both levels: >>> >>> select t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) >>> as >>> flat_rides from dfs.tmp.`rideDB.json`) as t >>> +------------+ >>> | EXPR$0 | >>> +------------+ >>> | 2.42418 | >>> | -13.54943 | >>> | 0.83858 | >>> | null | >>> +------------+ >>> >>> This query also shows how you can include a field next to the array when >>> flattening and bring it up the the outer query if you still need this >>> data >>> associated with the flattened data (note `date` is also a reserved word). >>> >>> select t2.flat_intervals.METRICS.skiba_wprime_low as skiba_wprime_low, >>> t2.ride_date from (select flatten(t.flat_rides.INTERVALS) as >>> flat_intervals, t.flat_rides.METRICS.skiba_wprime_low, >>> t.flat_rides.`date` >>> as ride_date from (select flatten(rides) as flat_rides from >>> dfs.tmp.`rideDB.json`) as t) as t2; >>> >>> +-------------------+--------------------------+ >>> | skiba_wprime_low | ride_date | >>> +-------------------+--------------------------+ >>> | 2.42418 | 2014/12/02 10:10:23 UTC | >>> | 1.95597 | 2014/12/02 10:10:23 UTC | >>> | 1.90420 | 2014/12/02 10:10:23 UTC | >>> | 1.72623 | 2014/12/02 10:10:23 UTC | >>> | 1.04275 | 2014/12/02 10:10:23 UTC | >>> | 1.55861 | 2014/12/02 10:10:23 UTC | >>> | 17.73633 | 2014/12/02 10:10:23 UTC | >>> | 17.31987 | 2014/12/02 10:10:23 UTC | >>> | 18.47823 | 2014/12/02 10:10:23 UTC | >>> ... data clipped >>> >>> >>> On Wed, Jan 13, 2016 at 12:26 PM, Paolo Spanevello <[email protected] >>> > >>> wrote: >>> >>> > Hi, >>> > >>> > thanks to all for your support. >>> > >>> > I'm trying as u suggest, to apply the kvgen function. Below you can >>> find >>> > the query and the error. >>> > >>> > *select tt.flat_intervals from (select >>> > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select >>> > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as >>> tt* >>> > >>> > I have this error message. >>> > >>> > Best regards, >>> > Paolo >>> > >>> > ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: >>> > select tt.flat_intervals from (select >>> > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select >>> > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as >>> tt >>> > [30027]Query execution error. Details:[ >>> > SYSTEM ERROR: DrillRuntimeException: kvgen function only supports >>> Simple >>> > maps as input >>> > >>> > Fragment 0:0 >>> > >>> > [Error Id: 85f8e8ba-fb87-428a-ac2e-dea78498c222 on 10.1.0.74:31010] >>> > ] >>> > >>> > 2016-01-13 20:28 GMT+01:00 Jason Altekruse <[email protected]>: >>> > >>> >> Wasn't thinking about that, I was seeing the images because I was >>> included >>> >> directly in the recipients lists. >>> >> >>> >> +1 on text for searchability. >>> >> >>> >> Can you please rerun the queries and post the errors here as text? >>> >> >>> >> On Wed, Jan 13, 2016 at 11:20 AM, Christopher Matta <[email protected]> >>> >> wrote: >>> >> >>> >> > The apache mailing lists don't support image attachments, it would >>> be >>> >> > better to attach text anyway since it's searchable. >>> >> > >>> >> > Chris Matta >>> >> > [email protected] >>> >> > 215-701-3146 >>> >> > >>> >> > On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello < >>> [email protected] >>> >> > >>> >> > wrote: >>> >> > >>> >> >> Hi, >>> >> >> >>> >> >> thanks again Jason for your support. >>> >> >> >>> >> >> I'm trying as u suggest, to apply the kvgen function. Below you can >>> >> find >>> >> >> the query and the error. >>> >> >> >>> >> >> *select tt.flat_intervals from (select >>> >> >> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from >>> (select >>> >> >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) >>> as >>> >> tt* >>> >> >> >>> >> >> [image: Immagine incorporata 1] >>> >> >> >>> >> >> >>> >> >> Could u pls suggest how to fix this drill? >>> >> >> >>> >> >> Best regards, >>> >> >> Paolo >>> >> >> >>> >> >> >>> >> >> >>> >> >> 2016-01-12 16:31 GMT+01:00 Jason Altekruse < >>> [email protected]>: >>> >> >> >>> >> >>> This is a poor error messages that is produced when you try to >>> >> flatten a >>> >> >>> field that is not an array, for these fields you can just use the >>> dot >>> >> >>> notation to access their inner members (i.e. >>> >> >>> flattened_array_of_maps.member_field_in_map). If you have a field >>> >> where the >>> >> >>> keys in a map are "unknown" or you want to do analysis on the >>> keys, >>> >> please >>> >> >>> refer to the KVGEN docs [1]. I have assigned the JIRA that >>> reported >>> >> this >>> >> >>> issue a while ago to myself and will work to improve the message >>> [2]. >>> >> >>> >>> >> >>> >>> >> >>> [1] - https://drill.apache.org/docs/kvgen/ >>> >> >>> [2] - https://issues.apache.org/jira/browse/DRILL-2182 >>> >> >>> >>> >> >>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello < >>> >> [email protected]> >>> >> >>> wrote: >>> >> >>> >>> >> >>>> Hi All, >>> >> >>>> >>> >> >>>> Jason,I used your suggests and it works, thanks a lot! >>> >> >>>> >>> >> >>>> As u wrote i used a subquery to have the all list of INTERVALS >>> as I >>> >> >>>> show below. >>> >> >>>> >>> >> >>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as >>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from >>> (select >>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t* >>> >> >>>> >>> >> >>>> [image: Immagine incorporata 1] >>> >> >>>> >>> >> >>>> The attribute "flat_intervals" is full of data that I would like >>> to >>> >> >>>> have separate them in several attributes as it is showed in the >>> link: >>> >> >>>> https://drill.apache.org/docs/flatten/ >>> >> >>>> I used a new subquery to have them: >>> >> >>>> >>> >> >>>> >>> >> >>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat >>> >> from >>> >> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as >>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from >>> (select >>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as >>> t) as >>> >> tt* >>> >> >>> >> >>>> >>> >> >>>> but I have this error: >>> >> >>>> >>> >> >>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the >>> query: >>> >> >>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat >>> from >>> >> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as >>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from >>> (select >>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as >>> t) as >>> >> tt >>> >> >>>> [30027]Query execution error. Details:[ >>> >> >>>> SYSTEM ERROR: ClassCastException: Cannot cast >>> >> >>>> org.apache.drill.exec.vector.complex.MapVector to >>> >> >>>> org.apache.drill.exec.vector.complex.RepeatedValueVector >>> >> >>>> >>> >> >>>> Fragment 0:0 >>> >> >>>> >>> >> >>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on >>> >> 192.168.1.101:31010] >>> >> >>>> ] >>> >> >>>> >>> >> >>>> [image: Immagine incorporata 2] >>> >> >>>> >>> >> >>>> Thanks in advance! >>> >> >>>> >>> >> >>>> Paolo >>> >> >>>> >>> >> >>>> >>> >> >>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse < >>> [email protected] >>> >> >: >>> >> >>>> >>> >> >>>>> Paolo, >>> >> >>>>> >>> >> >>>>> Drill currently reads single JSON objects as single records. If >>> you >>> >> >>>>> look at >>> >> >>>>> the top of your file you can see that the root of your document >>> is a >>> >> >>>>> single >>> >> >>>>> JSON object. >>> >> >>>>> >>> >> >>>>> Drill accepts two formats for individual records: >>> >> >>>>> >>> >> >>>>> The Mongo import format, a series of JSON object one after the >>> other >>> >> >>>>> in a >>> >> >>>>> file, whitespace is irrelevant, each one need not be followed >>> by a >>> >> >>>>> newline >>> >> >>>>> >>> >> >>>>> {"a" : 1, "b" : "hello" } >>> >> >>>>> {"a": 5 : "b" : "guten tag" } >>> >> >>>>> >>> >> >>>>> A JSON array of objects >>> >> >>>>> >>> >> >>>>> [ >>> >> >>>>> {"a" : 1, "b" : "hello" }, >>> >> >>>>> {"a" : 2, "b", "guten tag"} >>> >> >>>>> ] >>> >> >>>>> >>> >> >>>>> When you have a file like this you can read it using the FLATTEN >>> >> >>>>> functionality of Drill to turn an array into a series of >>> records. >>> >> >>>>> https://drill.apache.org/docs/flatten/ >>> >> >>>>> >>> >> >>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`; >>> >> >>>>> >>> >> >>>>> To work with the data further, you can put the flatten call in a >>> >> >>>>> subquery. >>> >> >>>>> Here is how you can select the first element from each records >>> list >>> >> of >>> >> >>>>> INTERVALS and select one of the nested fields inside of METRICS >>> once >>> >> >>>>> the >>> >> >>>>> data has been flattened. >>> >> >>>>> To analyze the array, you could flatten again to get an exploded >>> >> >>>>> dataset >>> >> >>>>> with one record per interval across all records >>> >> >>>>> >>> >> >>>>> select t.flat_rides.INTERVALS[0], >>> >> t.flat_rides.METRICS.skiba_wprime_low >>> >> >>>>> from (select flatten(rides) as flat_rides from >>> >> dfs.tmp.`rideDB.json`) >>> >> >>>>> as t; >>> >> >>>>> >>> >> >>>>> Here you can see that individual columns can be selected next >>> to the >>> >> >>>>> flatten call, this will copy the data into each new record: >>> >> >>>>> >>> >> >>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals, >>> >> >>>>> t.flat_rides.METRICS.skiba_wprime_low from (select >>> flatten(rides) as >>> >> >>>>> flat_rides from dfs.tmp.`rideDB.json`) as t; >>> >> >>>>> >>> >> >>>>> Happy Drilling! >>> >> >>>>> >>> >> >>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello < >>> >> >>>>> [email protected]> >>> >> >>>>> wrote: >>> >> >>>>> >>> >> >>>>> > Hi all, >>> >> >>>>> > >>> >> >>>>> > i'm trying to query the file that you can find in attach with >>> >> drill >>> >> >>>>> apache >>> >> >>>>> > 1.4 . The result of this qurey is always 1 record. >>> >> >>>>> > >>> >> >>>>> > The query that i'm running is : >>> >> >>>>> > >>> >> >>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t >>> >> >>>>> > >>> >> >>>>> > If i run the similar query with the file donuts.json found on >>> >> >>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query >>> runs >>> >> >>>>> properly. >>> >> >>>>> > >>> >> >>>>> > SELECT t.topping FROM donuts.json t >>> >> >>>>> > >>> >> >>>>> > Thanks in advance. >>> >> >>>>> > >>> >> >>>>> > Paolo >>> >> >>>>> > >>> >> >>>>> >>> >> >>>> >>> >> >>>> >>> >> >>> >>> >> >> >>> >> > >>> >> >>> > >>> > >>> >> >> >
