You can use * to propagate all fields up. The only problem with this is
that it will also pull up a copy of the list you are flattening (placing
the complete list next to each record in the flattened-out version of the
list). Databases have always frustratingly lacked a 'select * excluding
the large field I don't need to read, the one that just strains the
database because someone thought we should put binaries in the table'.
Considering the liberal types of data Drill can query, it probably makes
sense to introduce such a concept.
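
In the meantime, the usual workaround is to enumerate only the columns you
want in a view and query that instead of the raw table. A minimal sketch,
assuming a hypothetical `docs.json` with a large `payload` field you want
to skip (the table and column names here are made up for illustration):

create or replace view dfs.tmp.docs_slim as
select d.id, d.name, d.created
from dfs.tmp.`docs.json` as d;

-- downstream queries never touch the heavy payload column
select * from dfs.tmp.docs_slim limit 10;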

I'm going to create a few views to make this easier to process. Note that
to create a view you need to be in a writable workspace. By default the
dfs.tmp workspace is the only writable one; see this page for configuring
a more permanent workspace to store your views in:
https://drill.apache.org/docs/workspaces/
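
For example, you can make the writable workspace the session default before
creating views (workspace names here are the defaults shipped with the dfs
storage plugin):

use dfs.tmp;

-- views created without a schema prefix now land in dfs.tmp
create or replace view rides_view as
select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;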

Here is a simple view that copies up all of the fields, although it will
contain extra copies of the first list level that was flattened.

create or replace view ride_db_view as
select flatten(t.flat_rides.INTERVALS) as flat_intervals, t.*
from (select version, flatten(rides) as flat_rides
      from dfs.tmp.`rideDB.json`) as t;

Looking at the data is hard given the size of the records, but I wrote a
few queries to illustrate the point.

Here is a simple select out of the view above; selecting from the list of
flattened records in the outermost query returns 80 records:
select t.flat_intervals as first_int from ride_db_view as t;

You can also grab the first element out of the un-flattened version of the
same list from the view. This is confusing, however, because it still
returns 80 records: flat_rides has now been copied next to each flattened
interval, so a bunch of these are repeats:
select t.flat_rides.intervals[0] as first_int from ride_db_view as t;

Here you can filter the results to find rows where the first element in the
inner list matches the outermost flattened one; this gets you back to the
size of the root-level list of rides. Drill currently doesn't implement map
equality, so I used the combination of the start and stop times to compare
the maps:
select * from (select t.flat_rides.intervals[0] as first_int,
t.flat_intervals from ride_db_view as t) as t2
where t2.first_int.`start` = t2.flat_intervals.`start`
  AND t2.first_int.stop = t2.flat_intervals.stop;

Thankfully for you, your dataset is well structured to avoid most of this
headache: many of your fields are nested into maps, and Drill allows you
to select a whole map by name to move it up to the next outer query.
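
For instance, a whole map column can be carried up through a subquery by
name, without enumerating its keys (a sketch using the column names from
the rideDB dataset above):

-- TAGS and METRICS are maps; selecting them by name moves the entire map
-- up to the outer query in one expression
select t.flat_rides.TAGS as tags, t.flat_rides.METRICS as metrics
from (select flatten(rides) as flat_rides
      from dfs.tmp.`rideDB.json`) as t;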

Here is a somewhat longer query that selects all of the data without
keeping extra copies of the lists around; you only have to enumerate the
few fields at the top of each "RIDE" record.
create or replace view ride_db_view as
select flatten(t.flat_rides.INTERVALS) as flat_intervals,
       t.flat_rides.METRICS, t.flat_rides.TAGS,
       t.flat_rides.`filename`, t.flat_rides.`date`,
       t.flat_rides.`fingerprint`, t.flat_rides.`crc`,
       t.flat_rides.`metacrc`, t.flat_rides.`timestamp`,
       t.flat_rides.`dbversion`, t.flat_rides.`color`,
       t.flat_rides.`present`, t.flat_rides.`isRun`,
       t.flat_rides.`isSwim`, t.flat_rides.`weight`,
       t.flat_rides.`samples`
from (select version, flatten(rides) as flat_rides
      from dfs.tmp.`rideDB.json`) as t;

Here you can see that flat_rides is no longer available:
>> select t.flat_rides.intervals[0] as first_int from ride_db_view as t;
Jan 14, 2016 8:22:42 AM
org.apache.calcite.sql.validate.SqlValidatorException <init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Column
'flat_rides' not found in table 't'

Here is what the output data looks like; the long lists of columns are
nested under metrics, tags, and flat_intervals.

0: jdbc:drill:zk=local> select * from ride_db_view limit 1;
+----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
| flat_intervals | metrics | tags | filename | date_col | fingerprint | crc
| metacrc | timestamp_col | dbversion | color | present | isRun | isSwim |
weight | samples |
+----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
| {"name":"Attivita' intera
","start":"0","stop":"9920","startKM":"0","stopKM":"85.9888","type":"2","color":"#000080","seq":"0","METRICS":{"skiba_wprime_low":"2.42418","skiba_wprime_max":"87.87909","skiba_wprime_matches":"13.00000","skiba_wprime_maxmatch":"8.85200","skiba_wprime_tau":"490.00000","skiba_wprime_exp":"155.77200","skiba_wprime_watts":"15.70124","wtime_in_zone_L1":"4017.00000","wtime_in_zone_L2":"2762.00000","wtime_in_zone_L3":"2584.00000","wtime_in_zone_L4":"558.00000","skiba_cp_exp":"1627.87600","1s_peak_wpk":"12.33333","5s_peak_wpk":"9.75467","10s_peak_wpk":"7.67333","15s_peak_wpk":"6.22844","20s_peak_wpk":"6.06733","30s_peak_wpk":"5.69556","1m_peak_wpk":"4.68489","5m_peak_wpk":"3.93831","10m_peak_wpk":"3.74687","20m_peak_wpk":"3.65183","30m_peak_wpk":"3.30927","60m_peak_wpk":"2.91914","vo2max":"49.53376","average_wpk":"2.39713","trimp_points":"161.96102","trimp_100_points":"110.74976","trimp_zonal_points":"201.52667","time_in_zone_L1":"2693.00000","time_in_zone_L2":"1519.00000","time_in_zone_L3":"2306.00000","time_in_zone_L4":"1253.00000","time_in_zone_L5":"1025.00000","time_in_zone_L6":"855.00000","time_in_zone_L7":"270.00000","percent_in_zone_L1":"27.14444","percent_in_zone_L2":"15.31096","percent_in_zone_L3":"23.24362","percent_in_zone_L4":"12.62978","percent_in_zone_L5":"10.33162","percent_in_zone_L6":"8.61808","percent_in_zone_L7":"2.72150","triscore":"206.25656","peak_percent":"70.79363","power_zone":"2.85570","power_fatigue_index":"99.78378","power_pacing_index":"19.43623","1s_critical_power":"925.00000","5s_critical_power":"731.60000","10s_critical_power":"575.50000","15s_critical_power":"467.13333","20s_critical_power":"455.05000","30s_critical_power":"427.16667","1m_critical_power":"351.36667","2m_critical_power":"316.74167","3m_critical_power":"308.15556","5m_critical_power":"295.37333","8m_critical_power":"287.22500","10m_critical_power":"281.01500","20m_critical_power":"273.88750","30m_critical_power":"248.19556","60m_critical_power":"218.93528","90m_critical_power
":"207.03278","1m_critical_power_hr":"138.05000","5m_critical_power_hr":"153.56333","10m_critical_power_hr":"154.99167","20m_critical_power_hr":"154.51333","30m_critical_power_hr":"148.59833","60m_critical_power_hr":"141.42861","time_in_zone_H1":"1899.00000","time_in_zone_H2":"2719.00000","time_in_zone_H3":"4018.00000","time_in_zone_H4":"1285.00000","percent_in_zone_H1":"19.14122","percent_in_zone_H2":"27.40651","percent_in_zone_H3":"40.49995","percent_in_zone_H4":"12.95232","daniels_points":"154.36992","daniels_equivalent_power":"216.28054","a_coggan_np":"226.44079","a_coggan_if":"0.90576","a_coggan_tss":"226.09047","a_coggam_variability_index":"1.25951","a_friel_efficiency_factor":"1.73689","a_coggan_tssperhour":"82.04069","coggan_np":"222.89726","coggan_if":"0.89159","coggan_tss":"219.06974","coggam_variability_index":"1.23980","friel_efficiency_factor":"1.70971","coggan_tssperhour":"79.49310","a_skiba_xpower":"219.75992","a_skiba_relative_intensity":"0.87904","a_skiba_bike_score":"212.94620","a_skiba_variability_index":"1.22235","a_skiba_response_index":"1.68565","atiss_score":"216.58554","antiss_score":"8.66888","cp_setting":"250.00000","skiba_xpower":"216.28053","skiba_relative_intensity":"0.86512","skiba_bike_score":"206.25656","skiba_variability_index":"1.20299","tiss_delta":"96.15152","skiba_response_index":"1.65896","ride_count":"1.00000","workout_time":"9921.00000","time_riding":"9921.00000","time_carrying":"21.00000","total_distance":"85.99094","distance_swim":"85990.93889","climb_rating":"5.34573","athlete_weight":"75.00000","elevation_gain":"678.00000","elevation_loss":"678.00000","total_work":"1783.64800","average_speed":"31.20324","pace":"1.92288","pace_swim":"0.19229","average_power":"179.78510","average_apower":"182.33790","nonzero_power":"221.73645","average_hr":"130.37121","average_ct":"38.07369","heartbeats":"21552.53333","hrpw":"1.37902","wb":"384.42133","ap_percent_max":"17.97851","hrnp":"1.70971","average_cad":"80.57447","average_temp":"12.14
676","max_power":"925.00000","max_heartrate":"165.00000","max_ct":"38.63611","max_speed":"70.70000","max_cadence":"117.00000","max_temp":"18.00000","ninety_five_percent_hr":"158.00000","vam":"246.02359","eoa":"1.40003","gradient":"0.78846","meanpowervariance":"61.43749","maxpowervariance":"843.00000","total_kcalories":"1773.95946","aerobic_decoupling":"19.19399"}}
|
{"skiba_wprime_low":"2.42418","skiba_wprime_max":"87.87909","skiba_wprime_matches":"13.00000","skiba_wprime_maxmatch":"8.85200","skiba_wprime_tau":"490.00000","skiba_wprime_exp":"155.77200","skiba_wprime_watts":"15.70124","wtime_in_zone_L1":"4017.00000","wtime_in_zone_L2":"2762.00000","wtime_in_zone_L3":"2584.00000","wtime_in_zone_L4":"558.00000","skiba_cp_exp":"1627.87600","1s_peak_wpk":"12.33333","5s_peak_wpk":"9.75467","10s_peak_wpk":"7.67333","15s_peak_wpk":"6.22844","20s_peak_wpk":"6.06733","30s_peak_wpk":"5.69556","1m_peak_wpk":"4.68489","5m_peak_wpk":"3.93831","10m_peak_wpk":"3.74687","20m_peak_wpk":"3.65183","30m_peak_wpk":"3.30927","60m_peak_wpk":"2.91914","vo2max":"49.53376","average_wpk":"2.39713","trimp_points":"161.96102","trimp_100_points":"110.74976","trimp_zonal_points":"201.52667","time_in_zone_L1":"2693.00000","time_in_zone_L2":"1519.00000","time_in_zone_L3":"2306.00000","time_in_zone_L4":"1253.00000","time_in_zone_L5":"1025.00000","time_in_zone_L6":"855.00000","time_in_zone_L7":"270.00000","percent_in_zone_L1":"27.14444","percent_in_zone_L2":"15.31096","percent_in_zone_L3":"23.24362","percent_in_zone_L4":"12.62978","percent_in_zone_L5":"10.33162","percent_in_zone_L6":"8.61808","percent_in_zone_L7":"2.72150","triscore":"206.25656","l3_sustain":"1298.00000","l4_sustain":"440.00000","l5_sustain":"1782.00000","l6_sustain":"298.00000","l7_sustain":"123.00000","peak_percent":"70.79363","power_zone":"2.85570","power_fatigue_index":"99.78378","power_pacing_index":"19.43623","1s_critical_power":"925.00000","5s_critical_power":"731.60000","10s_critical_power":"575.50000","15s_critical_power":"467.13333","20s_critical_power":"455.05000","30s_critical_power":"427.16667","1m_critical_power":"351.36667","2m_critical_power":"316.74167","3m_critical_power":"308.15556","5m_critical_power":"295.37333","8m_critical_power":"287.22500","10m_critical_power":"281.01500","20m_critical_power":"273.88750","30m_critical_power":"248.19556","60m_critical_power":"218.93528","9
0m_critical_power":"207.03278","1m_critical_power_hr":"138.05000","5m_critical_power_hr":"153.56333","10m_critical_power_hr":"154.99167","20m_critical_power_hr":"154.51333","30m_critical_power_hr":"148.59833","60m_critical_power_hr":"141.42861","time_in_zone_H1":"1899.00000","time_in_zone_H2":"2719.00000","time_in_zone_H3":"4018.00000","time_in_zone_H4":"1285.00000","percent_in_zone_H1":"19.14122","percent_in_zone_H2":"27.40651","percent_in_zone_H3":"40.49995","percent_in_zone_H4":"12.95232","daniels_points":"154.36992","daniels_equivalent_power":"216.28054","a_coggan_np":"226.44079","a_coggan_if":"0.90576","a_coggan_tss":"226.09047","a_coggam_variability_index":"1.25951","a_friel_efficiency_factor":"1.73689","a_coggan_tssperhour":"82.04069","coggan_np":"222.89726","coggan_if":"0.89159","coggan_tss":"219.06974","coggam_variability_index":"1.23980","friel_efficiency_factor":"1.70971","coggan_tssperhour":"79.49310","a_skiba_xpower":"219.75992","a_skiba_relative_intensity":"0.87904","a_skiba_bike_score":"212.94620","a_skiba_variability_index":"1.22235","a_skiba_response_index":"1.68565","atiss_score":"216.58554","antiss_score":"8.66888","cp_setting":"250.00000","skiba_xpower":"216.28053","skiba_relative_intensity":"0.86512","skiba_bike_score":"206.25656","skiba_variability_index":"1.20299","tiss_delta":"96.15152","skiba_response_index":"1.65896","ride_count":"1.00000","workout_time":"9921.00000","time_riding":"9921.00000","time_carrying":"21.00000","total_distance":"85.99094","distance_swim":"85990.93889","climb_rating":"5.34573","athlete_weight":"75.00000","elevation_gain":"678.00000","elevation_loss":"678.00000","total_work":"1783.64800","average_speed":"31.20324","pace":"1.92288","pace_swim":"0.19229","average_power":"179.78510","average_apower":"182.33790","nonzero_power":"221.73645","average_hr":"130.37121","average_ct":"38.07369","heartbeats":"21552.53333","hrpw":"1.37902","wb":"384.42133","ap_percent_max":"17.97851","hrnp":"1.70971","average_cad":"80.57447","ave
rage_temp":"12.14676","max_power":"925.00000","max_heartrate":"165.00000","max_ct":"38.63611","max_speed":"70.70000","max_cadence":"117.00000","max_temp":"18.00000","ninety_five_percent_hr":"158.00000","vam":"246.02359","eoa":"1.40003","gradient":"0.78846","meanpowervariance":"61.43749","maxpowervariance":"843.00000","total_kcalories":"1773.95946","aerobic_decoupling":"19.19399"}
| {"Athlete":"prova ","Calendar Text":"Bike\n ","Data":"TDSPHC-A-L-E---
","Device":"SRM ","Device Info":" ","File Format":"
","Filename":"2014_12_02_11_10_23.json ","Month":"dicembre ","Notes":"
","Source Filename":"2014_12_02_11_10_23_2014_12_02_11_10_23.json
","Sport":"Bike ","Weekday":"mar ","Year":"2014 "} |
2014_12_02_11_10_23.json | 2014/12/02 10:10:23 UTC | 104091 | 0 | 65172 |
1451321862 | 115 | #010101 | TDSPHC-A-L-E--- | 0 | 0 | 75 | 1 |
+----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+



On Thu, Jan 14, 2016 at 3:22 AM, Paolo Spanevello <paolosp...@gmail.com>
wrote:

> Ciao to All,
>
> thanks for your answer, it works!
>
> Below are the query and the result
>
> select tt.Athlete, tt.ride_date, tt.skiba_wprime_low from
> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.`date` as
> ride_date,flatten(t.flat_rides.INTERVALS) as flat_intervals,
> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_low as skiba_wprime_low from
> (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider2`) as t)
> as tt where tt.flat_intervals.name = 'Attivita'' intera '
>
> prova  2014/12/02 10:10:23 UTC  2.42418
> prova  2014/12/03 08:31:21 UTC  1.95597
> prova  2014/12/05 10:11:35 UTC  1.90420
> prova  2014/12/17 08:27:33 UTC  1.72623
>
>
> Now I'm writing to all of you to understand whether there is a way to drill
> down into all the attributes without specifying each single attribute;
> otherwise I have to write out 145 attributes in the query. To make it even
> harder to maintain, the drill-down is in the subquery, which means I have
> to write the list of attributes twice in the query, so 145*2.
>
> The query could look like this:
>
> select tt.Athlete, tt.ride_date,
> tt.skiba_wprime_low,tt.skiba_wprime_max,.....**145 attributes*,
> tt.aerobic_decoupling from
> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.`date` as
> ride_date,flatten(t.flat_rides.INTERVALS) as flat_intervals,
> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_low as skiba_wprime_low,
> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_max as skiba_wprime_max,.....**145
> attributes*, t.flat_Rides.INTERVALS.METRICS.aerobic_decoupling as
> aerobic_decoupling from
> (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider2`) as t)
> as tt where tt.flat_intervals.name = 'Attivita'' intera '
>
>
> Thanks again for your support.
>
> Best regards,
> Paolo
>
>
> 2016-01-14 0:28 GMT+01:00 Jason Altekruse <altekruseja...@gmail.com>:
>
>> The issue you have now is that you are passing an array into KVGEN, which
>> only works on maps. You can just flatten it again instead.
>>
>> Looking more closely at your dataset, I don't think you have any maps that
>> would require KVGEN to analyze them; they all have keys that appear in all
>> records, and KVGEN is just for sparse maps. An example would be if you put
>> a user ID as the key, in which case you wouldn't want to create thousands
>> of different SQL statements (using each user_id as a column name) to
>> analyze different users.
>>
>> You should just use the dot notation to access nested members of one of
>> the maps after you've flattened down to the correct level.
>>
>> Example (note the backticks around `start` as it is a SQL reserved word):
>> select t2.flat_intervals.name, t2.flat_intervals.`start`,
>> t2.flat_intervals.stop from (select flatten(t.flat_rides.INTERVALS) as
>> flat_intervals, t.flat_rides.METRICS.skiba_wprime_low from (select
>> flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t) as t2;
>>
>> I also noticed that Metrics appears at two different levels, so here are
>> two examples for reaching both levels:
>>
>> select t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides)
>> as
>> flat_rides from dfs.tmp.`rideDB.json`) as t
>> +------------+
>> |   EXPR$0   |
>> +------------+
>> | 2.42418    |
>> | -13.54943  |
>> | 0.83858    |
>> | null       |
>> +------------+
>>
>> This query also shows how you can include a field next to the array when
>> flattening and bring it up to the outer query if you still need this data
>> associated with the flattened data (note `date` is also a reserved word).
>>
>> select t2.flat_intervals.METRICS.skiba_wprime_low as skiba_wprime_low,
>> t2.ride_date from (select flatten(t.flat_rides.INTERVALS) as
>> flat_intervals, t.flat_rides.METRICS.skiba_wprime_low, t.flat_rides.`date`
>> as ride_date from (select flatten(rides) as flat_rides from
>> dfs.tmp.`rideDB.json`) as t) as t2;
>>
>> +-------------------+--------------------------+
>> | skiba_wprime_low  |        ride_date         |
>> +-------------------+--------------------------+
>> | 2.42418           | 2014/12/02 10:10:23 UTC  |
>> | 1.95597           | 2014/12/02 10:10:23 UTC  |
>> | 1.90420           | 2014/12/02 10:10:23 UTC  |
>> | 1.72623           | 2014/12/02 10:10:23 UTC  |
>> | 1.04275           | 2014/12/02 10:10:23 UTC  |
>> | 1.55861           | 2014/12/02 10:10:23 UTC  |
>> | 17.73633          | 2014/12/02 10:10:23 UTC  |
>> | 17.31987          | 2014/12/02 10:10:23 UTC  |
>> | 18.47823          | 2014/12/02 10:10:23 UTC  |
>> ... data clipped
>>
>>
>> On Wed, Jan 13, 2016 at 12:26 PM, Paolo Spanevello <paolosp...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > thanks to all for your support.
>> >
>> > I'm trying, as you suggested, to apply the kvgen function. Below you can
>> > find the query and the error.
>> >
>> > *select tt.flat_intervals from (select
>> > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>> > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as
>> tt*
>> >
>> > I have this error message.
>> >
>> > Best regards,
>> > Paolo
>> >
>> > ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>> > select tt.flat_intervals from (select
>> > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>> > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt
>> > [30027]Query execution error. Details:[
>> > SYSTEM ERROR: DrillRuntimeException: kvgen function only supports Simple
>> > maps as input
>> >
>> > Fragment 0:0
>> >
>> > [Error Id: 85f8e8ba-fb87-428a-ac2e-dea78498c222 on 10.1.0.74:31010]
>> > ]
>> >
>> > 2016-01-13 20:28 GMT+01:00 Jason Altekruse <altekruseja...@gmail.com>:
>> >
>> >> Wasn't thinking about that; I was seeing the images because I was
>> >> included directly in the recipients list.
>> >>
>> >> +1 on text for searchability.
>> >>
>> >> Can you please rerun the queries and post the errors here as text?
>> >>
>> >> On Wed, Jan 13, 2016 at 11:20 AM, Christopher Matta <cma...@mapr.com>
>> >> wrote:
>> >>
>> >> > The apache mailing lists don't support image attachments, it would be
>> >> > better to attach text anyway since it's searchable.
>> >> >
>> >> > Chris Matta
>> >> > cma...@mapr.com
>> >> > 215-701-3146
>> >> >
>> >> > On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <
>> paolosp...@gmail.com
>> >> >
>> >> > wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> thanks again Jason for your support.
>> >> >>
>> >> >> I'm trying, as you suggested, to apply the kvgen function. Below you
>> >> >> can find the query and the error.
>> >> >>
>> >> >> *select tt.flat_intervals from (select
>> >> >> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from
>> (select
>> >> >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t )
>> as
>> >> tt*
>> >> >>
>> >> >> [image: embedded image 1]
>> >> >>
>> >> >>
>> >> >> Could u pls suggest how to fix this drill?
>> >> >>
>> >> >> Best regards,
>> >> >> Paolo
>> >> >>
>> >> >>
>> >> >>
>> >> >> 2016-01-12 16:31 GMT+01:00 Jason Altekruse <
>> altekruseja...@gmail.com>:
>> >> >>
>> >> >>> This is a poor error message that is produced when you try to
>> >> >>> flatten a field that is not an array; for these fields you can just
>> >> >>> use the dot notation to access their inner members (i.e.
>> >> >>> flattened_array_of_maps.member_field_in_map). If you have a field
>> >> >>> where the keys in a map are "unknown" or you want to do analysis on
>> >> >>> the keys, please refer to the KVGEN docs [1]. I assigned the JIRA
>> >> >>> that reported this issue to myself a while ago and will work to
>> >> >>> improve the message [2].
>> >> >>>
>> >> >>>
>> >> >>> [1] - https://drill.apache.org/docs/kvgen/
>> >> >>> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>> >> >>>
>> >> >>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <
>> >> paolosp...@gmail.com>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> Hi  All,
>> >> >>>>
>> >> >>>> Jason,I used your suggests and it works, thanks a lot!
>> >> >>>>
>> >> >>>> As you wrote, I used a subquery to get the whole list of INTERVALS,
>> >> >>>> as I show below.
>> >> >>>>
>> >> >>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>> >> >>>>
>> >> >>>> [image: embedded image 1]
>> >> >>>>
>> >> >>>> The attribute "flat_intervals" is full of data that I would like to
>> >> >>>> separate into several attributes, as shown at the link:
>> >> >>>> https://drill.apache.org/docs/flatten/
>> >> >>>> I used a new subquery to have them:
>> >> >>>>
>> >> >>>>
>> >> >>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat
>> >> from
>> >> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t)
>> as
>> >> tt*
>> >>
>> >> >>>>
>> >> >>>> but I have this error:
>> >> >>>>
>> >> >>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the
>> query:
>> >> >>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat
>> from
>> >> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t)
>> as
>> >> tt
>> >> >>>> [30027]Query execution error. Details:[
>> >> >>>> SYSTEM ERROR: ClassCastException: Cannot cast
>> >> >>>> org.apache.drill.exec.vector.complex.MapVector to
>> >> >>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>> >> >>>>
>> >> >>>> Fragment 0:0
>> >> >>>>
>> >> >>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on
>> >> 192.168.1.101:31010]
>> >> >>>> ]
>> >> >>>>
>> >> >>>> [image: embedded image 2]
>> >> >>>>
>> >> >>>> Thanks in advance!
>> >> >>>>
>> >> >>>> Paolo
>> >> >>>>
>> >> >>>>
>> >> >>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <
>> altekruseja...@gmail.com
>> >> >:
>> >> >>>>
>> >> >>>>> Paolo,
>> >> >>>>>
>> >> >>>>> Drill currently reads single JSON objects as single records. If
>> you
>> >> >>>>> look at
>> >> >>>>> the top of your file you can see that the root of your document
>> is a
>> >> >>>>> single
>> >> >>>>> JSON object.
>> >> >>>>>
>> >> >>>>> Drill accepts two formats for individual records:
>> >> >>>>>
>> >> >>>>> The Mongo import format: a series of JSON objects one after the
>> >> >>>>> other in a file; whitespace is irrelevant, and each one need not
>> >> >>>>> be followed by a newline
>> >> >>>>>
>> >> >>>>> {"a" : 1, "b" : "hello" }
>> >> >>>>> {"a" : 5, "b" : "guten tag" }
>> >> >>>>>
>> >> >>>>> A JSON array of objects
>> >> >>>>>
>> >> >>>>> [
>> >> >>>>>     {"a" : 1, "b" : "hello" },
>> >> >>>>>     {"a" : 2, "b" : "guten tag" },
>> >> >>>>> ]
>> >> >>>>>
>> >> >>>>> When you have a file like this you can read it using the FLATTEN
>> >> >>>>> functionality of Drill to turn an array into a series of records.
>> >> >>>>> https://drill.apache.org/docs/flatten/
>> >> >>>>>
>> >> >>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>> >> >>>>>
>> >> >>>>> To work with the data further, you can put the flatten call in a
>> >> >>>>> subquery.
>> >> >>>>> Here is how you can select the first element from each record's
>> >> >>>>> list of INTERVALS, and select one of the nested fields inside of
>> >> >>>>> METRICS once the data has been flattened.
>> >> >>>>> To analyze the array, you could flatten again to get an exploded
>> >> >>>>> dataset
>> >> >>>>> with one record per interval across all records
>> >> >>>>>
>> >> >>>>> select t.flat_rides.INTERVALS[0],
>> >> t.flat_rides.METRICS.skiba_wprime_low
>> >> >>>>> from (select flatten(rides) as flat_rides from
>> >> dfs.tmp.`rideDB.json`)
>> >> >>>>> as t;
>> >> >>>>>
>> >> >>>>> Here you can see that individual columns can be selected next to
>> the
>> >> >>>>> flatten call, this will copy the data into each new record:
>> >> >>>>>
>> >> >>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>> >> >>>>> t.flat_rides.METRICS.skiba_wprime_low from (select
>> flatten(rides) as
>> >> >>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>> >> >>>>>
>> >> >>>>> Happy Drilling!
>> >> >>>>>
>> >> >>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <
>> >> >>>>> paolosp...@gmail.com>
>> >> >>>>> wrote:
>> >> >>>>>
>> >> >>>>> > Hi all,
>> >> >>>>> >
>> >> >>>>> > I'm trying to query the file that you can find attached, with
>> >> >>>>> > Apache Drill 1.4. The result of this query is always 1 record.
>> >> >>>>> >
>> >> >>>>> > The query that i'm running is :
>> >> >>>>> >
>> >> >>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>> >> >>>>> >
>> >> >>>>> > If I run a similar query with the file donuts.json found at
>> >> >>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>> >> >>>>> > properly.
>> >> >>>>> >
>> >> >>>>> > SELECT t.topping FROM donuts.json t
>> >> >>>>> >
>> >> >>>>> > Thanks in advance.
>> >> >>>>> >
>> >> >>>>> > Paolo
>> >> >>>>> >
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >
>> >
>>
>
>
