Re: JSON File, Total numbers Record: 1

Paolo Spanevello Wed, 20 Jan 2016 01:28:56 -0800

Hi Jason,

You are right,  it probably makes sense to introduce such a concept. We
could be free from the DataSet.


Could I support or apply some requirement on the web?

Best,
Paolo

2016-01-14 17:36 GMT+01:00 Jason Altekruse <[email protected]>:

> You can use * to propagate all traits up. The only problem with this is
> that it will pull up a copy of the list you are flattening as well (copying
> the complete list next to each record in the flattened out version of the
> list). Databases have always frustratingly lacked a feature of 'select *
> (excluding the large field I don't need to read, that just strains the
> database because someone thought we should put binaries in the table)'.
> Considering the liberal types of data Drill can query, it probably makes
> sense to introduce such a concept.
>
> I'm going to create a few views to make this easier to process. Note to
> create view you need to be in a writable workspace. By default the dfs.tmp
> workspace is the only writable one, see this page for configuring a more
> permanent workspace to store your views in
> https://drill.apache.org/docs/workspaces/
>
> Here is a simple view that can copy up all of the fields, although it will
> contain extra copies of the first list level that was flattened.
>
> create or replace view ride_db_view as select
> flatten(t.flat_rides.INTERVALS) as flat_intervals, t.* from (select
> version,flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t;
>
> Looking at the data is kind of hard with the size of the records, but I
> tried to write a few queries to illustrate this point.
>
> Here is a simple select out of the view above, selecting out of the list
> of flattened records in the outermost query, it returns 80 records:
> select t.flat_intervals as first_int from ride_db_view as t;
>
> You can also grab the first element out of the un_flattened version of the
> same list out of the view, this however is confusing because it still
> returns 80 records, because flat_rides now has been copied next to each
> flattened interval, so a bunch of these are repeats:
> select t.flat_rides.intervals[0] as first_int from ride_db_view as t;
>
> Here you can filter the results to find where the first element in the
> inner list matches outermost flattened one, this gets you back to the size
> of the root level list of rides, Drill currently doesn't implement map
> equality so I used the combination of the start and stop time to compare
> the maps:
> select * from (select t.flat_rides.intervals[0] as first_int,
> t.flat_intervals from ride_db_view as t) as t2 where t2.first_int.`start` =
> t2.flat_intervals.`start` AND t2.first_int.stop = t2.flat_intervals.stop;
>
> Thankfully for you, your dataset is well structured to avoid most of this
> headache, as a bunch of your fields are nested into maps, and Drill allows
> you to select the whole map by name to move it up to the next outer query.
>
> Here is a little bit longer query that selects all of the data, without
> keeping around extra copies of the lists, you only have to enumerate the
> few fields at the top of each "RIDE" record.
> create or replace view ride_db_view as select
> flatten(t.flat_rides.INTERVALS) as flat_intervals, t.flat_rides.METRICS,
> t.flat_rides.TAGS,t.flat_rides.`filename`, t.flat_rides.`date`,
> t.flat_rides.`fingerprint`, t.flat_rides.`crc`, t.flat_rides.`metacrc`,
> t.flat_rides.           `timestamp`, t.flat_rides.`dbversion`,
> t.flat_rides.`color`,t.flat_rides.`present`,t.flat_rides.`isRun`,t.flat_rides.`isSwim`,t.flat_rides.`weight`,t.flat_rides.`samples`
> from (select version,flatten(rides) as flat_rides from
> dfs.tmp.`rideDB.json`) as t;
>
> Here you can see that flat_rides is no longer available:
> >> select t.flat_rides.intervals[0] as first_int from ride_db_view as t;
> Jan 14, 2016 8:22:42 AM
> org.apache.calcite.sql.validate.SqlValidatorException <init>
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Column
> 'flat_rides' not found in table 't'
>
> Here is what the output data looks like, all of the long list of columns
> are nested under metrics, tags and flat_intervals.
>
> 0: jdbc:drill:zk=local> select * from ride_db_view limit 1;
>
> +----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
> | flat_intervals | metrics | tags | filename | date_col | fingerprint |
> crc | metacrc | timestamp_col | dbversion | color | present | isRun |
> isSwim | weight | samples |
>
> +----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
> | {"name":"Attivita' intera
> ","start":"0","stop":"9920","startKM":"0","stopKM":"85.9888","type":"2","color":"#000080","seq":"0","METRICS":{"skiba_wprime_low":"2.42418","skiba_wprime_max":"87.87909","skiba_wprime_matches":"13.00000","skiba_wprime_maxmatch":"8.85200","skiba_wprime_tau":"490.00000","skiba_wprime_exp":"155.77200","skiba_wprime_watts":"15.70124","wtime_in_zone_L1":"4017.00000","wtime_in_zone_L2":"2762.00000","wtime_in_zone_L3":"2584.00000","wtime_in_zone_L4":"558.00000","skiba_cp_exp":"1627.87600","1s_peak_wpk":"12.33333","5s_peak_wpk":"9.75467","10s_peak_wpk":"7.67333","15s_peak_wpk":"6.22844","20s_peak_wpk":"6.06733","30s_peak_wpk":"5.69556","1m_peak_wpk":"4.68489","5m_peak_wpk":"3.93831","10m_peak_wpk":"3.74687","20m_peak_wpk":"3.65183","30m_peak_wpk":"3.30927","60m_peak_wpk":"2.91914","vo2max":"49.53376","average_wpk":"2.39713","trimp_points":"161.96102","trimp_100_points":"110.74976","trimp_zonal_points":"201.52667","time_in_zone_L1":"2693.00000","time_in_zone_L2":"1519.00000","time_in_zone_L3":"2306.00000","time_in_zone_L4":"1253.00000","time_in_zone_L5":"1025.00000","time_in_zone_L6":"855.00000","time_in_zone_L7":"270.00000","percent_in_zone_L1":"27.14444","percent_in_zone_L2":"15.31096","percent_in_zone_L3":"23.24362","percent_in_zone_L4":"12.62978","percent_in_zone_L5":"10.33162","percent_in_zone_L6":"8.61808","percent_in_zone_L7":"2.72150","triscore":"206.25656","peak_percent":"70.79363","power_zone":"2.85570","power_fatigue_index":"99.78378","power_pacing_index":"19.43623","1s_critical_power":"925.00000","5s_critical_power":"731.60000","10s_critical_power":"575.50000","15s_critical_power":"467.13333","20s_critical_power":"455.05000","30s_critical_power":"427.16667","1m_critical_power":"351.36667","2m_critical_power":"316.74167","3m_critical_power":"308.15556","5m_critical_power":"295.37333","8m_critical_power":"287.22500","10m_critical_power":"281.01500","20m_critical_power":"273.88750","30m_critical_power":"248.19556","60m_critical_power":"218.93528","90m_critical_power":"207.03278","1m_critical_power_hr":"138.05000","5m_critical_power_hr":"153.56333","10m_critical_power_hr":"154.99167","20m_critical_power_hr":"154.51333","30m_critical_power_hr":"148.59833","60m_critical_power_hr":"141.42861","time_in_zone_H1":"1899.00000","time_in_zone_H2":"2719.00000","time_in_zone_H3":"4018.00000","time_in_zone_H4":"1285.00000","percent_in_zone_H1":"19.14122","percent_in_zone_H2":"27.40651","percent_in_zone_H3":"40.49995","percent_in_zone_H4":"12.95232","daniels_points":"154.36992","daniels_equivalent_power":"216.28054","a_coggan_np":"226.44079","a_coggan_if":"0.90576","a_coggan_tss":"226.09047","a_coggam_variability_index":"1.25951","a_friel_efficiency_factor":"1.73689","a_coggan_tssperhour":"82.04069","coggan_np":"222.89726","coggan_if":"0.89159","coggan_tss":"219.06974","coggam_variability_index":"1.23980","friel_efficiency_factor":"1.70971","coggan_tssperhour":"79.49310","a_skiba_xpower":"219.75992","a_skiba_relative_intensity":"0.87904","a_skiba_bike_score":"212.94620","a_skiba_variability_index":"1.22235","a_skiba_response_index":"1.68565","atiss_score":"216.58554","antiss_score":"8.66888","cp_setting":"250.00000","skiba_xpower":"216.28053","skiba_relative_intensity":"0.86512","skiba_bike_score":"206.25656","skiba_variability_index":"1.20299","tiss_delta":"96.15152","skiba_response_index":"1.65896","ride_count":"1.00000","workout_time":"9921.00000","time_riding":"9921.00000","time_carrying":"21.00000","total_distance":"85.99094","distance_swim":"85990.93889","climb_rating":"5.34573","athlete_weight":"75.00000","elevation_gain":"678.00000","elevation_loss":"678.00000","total_work":"1783.64800","average_speed":"31.20324","pace":"1.92288","pace_swim":"0.19229","average_power":"179.78510","average_apower":"182.33790","nonzero_power":"221.73645","average_hr":"130.37121","average_ct":"38.07369","heartbeats":"21552.53333","hrpw":"1.37902","wb":"384.42133","ap_percent_max":"17.97851","hrnp":"1.70971","average_cad":"80.57447","average_temp":"12.14676","max_power":"925.00000","max_heartrate":"165.00000","max_ct":"38.63611","max_speed":"70.70000","max_cadence":"117.00000","max_temp":"18.00000","ninety_five_percent_hr":"158.00000","vam":"246.02359","eoa":"1.40003","gradient":"0.78846","meanpowervariance":"61.43749","maxpowervariance":"843.00000","total_kcalories":"1773.95946","aerobic_decoupling":"19.19399"}}
> |
> {"skiba_wprime_low":"2.42418","skiba_wprime_max":"87.87909","skiba_wprime_matches":"13.00000","skiba_wprime_maxmatch":"8.85200","skiba_wprime_tau":"490.00000","skiba_wprime_exp":"155.77200","skiba_wprime_watts":"15.70124","wtime_in_zone_L1":"4017.00000","wtime_in_zone_L2":"2762.00000","wtime_in_zone_L3":"2584.00000","wtime_in_zone_L4":"558.00000","skiba_cp_exp":"1627.87600","1s_peak_wpk":"12.33333","5s_peak_wpk":"9.75467","10s_peak_wpk":"7.67333","15s_peak_wpk":"6.22844","20s_peak_wpk":"6.06733","30s_peak_wpk":"5.69556","1m_peak_wpk":"4.68489","5m_peak_wpk":"3.93831","10m_peak_wpk":"3.74687","20m_peak_wpk":"3.65183","30m_peak_wpk":"3.30927","60m_peak_wpk":"2.91914","vo2max":"49.53376","average_wpk":"2.39713","trimp_points":"161.96102","trimp_100_points":"110.74976","trimp_zonal_points":"201.52667","time_in_zone_L1":"2693.00000","time_in_zone_L2":"1519.00000","time_in_zone_L3":"2306.00000","time_in_zone_L4":"1253.00000","time_in_zone_L5":"1025.00000","time_in_zone_L6":"855.00000","time_in_zone_L7":"270.00000","percent_in_zone_L1":"27.14444","percent_in_zone_L2":"15.31096","percent_in_zone_L3":"23.24362","percent_in_zone_L4":"12.62978","percent_in_zone_L5":"10.33162","percent_in_zone_L6":"8.61808","percent_in_zone_L7":"2.72150","triscore":"206.25656","l3_sustain":"1298.00000","l4_sustain":"440.00000","l5_sustain":"1782.00000","l6_sustain":"298.00000","l7_sustain":"123.00000","peak_percent":"70.79363","power_zone":"2.85570","power_fatigue_index":"99.78378","power_pacing_index":"19.43623","1s_critical_power":"925.00000","5s_critical_power":"731.60000","10s_critical_power":"575.50000","15s_critical_power":"467.13333","20s_critical_power":"455.05000","30s_critical_power":"427.16667","1m_critical_power":"351.36667","2m_critical_power":"316.74167","3m_critical_power":"308.15556","5m_critical_power":"295.37333","8m_critical_power":"287.22500","10m_critical_power":"281.01500","20m_critical_power":"273.88750","30m_critical_power":"248.19556","60m_critical_power":"218.93528","90m_critical_power":"207.03278","1m_critical_power_hr":"138.05000","5m_critical_power_hr":"153.56333","10m_critical_power_hr":"154.99167","20m_critical_power_hr":"154.51333","30m_critical_power_hr":"148.59833","60m_critical_power_hr":"141.42861","time_in_zone_H1":"1899.00000","time_in_zone_H2":"2719.00000","time_in_zone_H3":"4018.00000","time_in_zone_H4":"1285.00000","percent_in_zone_H1":"19.14122","percent_in_zone_H2":"27.40651","percent_in_zone_H3":"40.49995","percent_in_zone_H4":"12.95232","daniels_points":"154.36992","daniels_equivalent_power":"216.28054","a_coggan_np":"226.44079","a_coggan_if":"0.90576","a_coggan_tss":"226.09047","a_coggam_variability_index":"1.25951","a_friel_efficiency_factor":"1.73689","a_coggan_tssperhour":"82.04069","coggan_np":"222.89726","coggan_if":"0.89159","coggan_tss":"219.06974","coggam_variability_index":"1.23980","friel_efficiency_factor":"1.70971","coggan_tssperhour":"79.49310","a_skiba_xpower":"219.75992","a_skiba_relative_intensity":"0.87904","a_skiba_bike_score":"212.94620","a_skiba_variability_index":"1.22235","a_skiba_response_index":"1.68565","atiss_score":"216.58554","antiss_score":"8.66888","cp_setting":"250.00000","skiba_xpower":"216.28053","skiba_relative_intensity":"0.86512","skiba_bike_score":"206.25656","skiba_variability_index":"1.20299","tiss_delta":"96.15152","skiba_response_index":"1.65896","ride_count":"1.00000","workout_time":"9921.00000","time_riding":"9921.00000","time_carrying":"21.00000","total_distance":"85.99094","distance_swim":"85990.93889","climb_rating":"5.34573","athlete_weight":"75.00000","elevation_gain":"678.00000","elevation_loss":"678.00000","total_work":"1783.64800","average_speed":"31.20324","pace":"1.92288","pace_swim":"0.19229","average_power":"179.78510","average_apower":"182.33790","nonzero_power":"221.73645","average_hr":"130.37121","average_ct":"38.07369","heartbeats":"21552.53333","hrpw":"1.37902","wb":"384.42133","ap_percent_max":"17.97851","hrnp":"1.70971","average_cad":"80.57447","average_temp":"12.14676","max_power":"925.00000","max_heartrate":"165.00000","max_ct":"38.63611","max_speed":"70.70000","max_cadence":"117.00000","max_temp":"18.00000","ninety_five_percent_hr":"158.00000","vam":"246.02359","eoa":"1.40003","gradient":"0.78846","meanpowervariance":"61.43749","maxpowervariance":"843.00000","total_kcalories":"1773.95946","aerobic_decoupling":"19.19399"}
> | {"Athlete":"prova ","Calendar Text":"Bike\n ","Data":"TDSPHC-A-L-E---
> ","Device":"SRM ","Device Info":" ","File Format":"
> ","Filename":"2014_12_02_11_10_23.json ","Month":"dicembre ","Notes":"
> ","Source Filename":"2014_12_02_11_10_23_2014_12_02_11_10_23.json
> ","Sport":"Bike ","Weekday":"mar ","Year":"2014 "} |
> 2014_12_02_11_10_23.json | 2014/12/02 10:10:23 UTC | 104091 | 0 | 65172 |
> 1451321862 | 115 | #010101 | TDSPHC-A-L-E--- | 0 | 0 | 75 | 1 |
>
> +----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
>
>
>
> On Thu, Jan 14, 2016 at 3:22 AM, Paolo Spanevello <[email protected]>
> wrote:
>
>> Ciao to All,
>>
>> thanks for your answer, it works!
>>
>> Above the query and the result
>>
>> select tt.Athlete, tt.ride_date, tt.skiba_wprime_low from
>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.`date` as
>> ride_date,flatten(t.flat_rides.INTERVALS) as flat_intervals,
>> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_low as skiba_wprime_low from
>> (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider2`) as
>> t) as tt where tt.flat_intervals.name = 'Attivita'' intera '
>>
>>   prova 2014/12/02 10:10:23 UTC
>> 2.42418
>>   prova 2014/12/03 08:31:21 UTC 1.95597
>>   prova 2014/12/05 10:11:35 UTC 1.90420
>>   prova 2014/12/17 08:27:33 UTC 1.72623
>>
>>
>> Now i'm writing all of you to understand if there is a way to drill down
>> all the attributes, without specific each single attribute otherwise I have
>> to write down 145 attributes into the query, and just to be more complicate
>> to maintenance the drill, the drill is into the subquery, it means that I
>> have to write down in the query twice the list of attribute, so 145*2.
>>
>> The query it could look like this:
>>
>> select tt.Athlete, tt.ride_date,
>> tt.skiba_wprime_low,tt.skiba_wprime_max,.....**145 attributes*,
>> tt.aerobic_decoupling from
>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.`date` as
>> ride_date,flatten(t.flat_rides.INTERVALS) as flat_intervals,
>> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_low as skiba_wprime_low,
>> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_max as 
>> skiba_wprime_max,.....**145
>> attributes*, t.flat_Rides.INTERVALS.METRICS.aerobic_decoupling as
>> aerobic_decoupling from
>> (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider2`) as
>> t) as tt where tt.flat_intervals.name = 'Attivita'' intera '
>>
>>
>> Thanks again for your support.
>>
>> Best regards,
>> Paolo
>>
>>
>> 2016-01-14 0:28 GMT+01:00 Jason Altekruse <[email protected]>:
>>
>>> The issue you have now is that you are passing an array into KVGEN, which
>>> only works on maps. You can just flatten it again instead.
>>>
>>> Looking more closely at your dataset, I don't think you have any maps
>>> that
>>> would require KVGEN to analyze them, they all have keys that appear in
>>> all
>>> records, KVGEN is just for sparse maps. An example would be if you put
>>> user
>>> ID as the key, in which case you wouldn't want to create thousands of
>>> different SQL statements (using each user_id as a column name) to analyze
>>> different users.
>>>
>>> You should just use the dot notation to access nested members of one of
>>> the
>>> maps after you've flattened down to the correct level.
>>>
>>> Example (note the backticks around `start` as it is a SQL reserved word):
>>> select t2.flat_intervals.name, t2.flat_intervals.`start`,
>>> t2.flat_intervals.stop from (select flatten(t.flat_rides.INTERVALS) as
>>> flat_intervals, t.flat_rides.METRICS.skiba_wprime_low from (select
>>> flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t) as t2;
>>>
>>> I also noticed that Metrics appears at two different levels, so here are
>>> two examples for reaching both levels:
>>>
>>> select t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides)
>>> as
>>> flat_rides from dfs.tmp.`rideDB.json`) as t
>>> +------------+
>>> |   EXPR$0   |
>>> +------------+
>>> | 2.42418    |
>>> | -13.54943  |
>>> | 0.83858    |
>>> | null       |
>>> +------------+
>>>
>>> This query also shows how you can include a field next to the array when
>>> flattening and bring it up the the outer query if you still need this
>>> data
>>> associated with the flattened data (note `date` is also a reserved word).
>>>
>>> select t2.flat_intervals.METRICS.skiba_wprime_low as skiba_wprime_low,
>>> t2.ride_date from (select flatten(t.flat_rides.INTERVALS) as
>>> flat_intervals, t.flat_rides.METRICS.skiba_wprime_low,
>>> t.flat_rides.`date`
>>> as ride_date from (select flatten(rides) as flat_rides from
>>> dfs.tmp.`rideDB.json`) as t) as t2;
>>>
>>> +-------------------+--------------------------+
>>> | skiba_wprime_low  |        ride_date         |
>>> +-------------------+--------------------------+
>>> | 2.42418           | 2014/12/02 10:10:23 UTC  |
>>> | 1.95597           | 2014/12/02 10:10:23 UTC  |
>>> | 1.90420           | 2014/12/02 10:10:23 UTC  |
>>> | 1.72623           | 2014/12/02 10:10:23 UTC  |
>>> | 1.04275           | 2014/12/02 10:10:23 UTC  |
>>> | 1.55861           | 2014/12/02 10:10:23 UTC  |
>>> | 17.73633          | 2014/12/02 10:10:23 UTC  |
>>> | 17.31987          | 2014/12/02 10:10:23 UTC  |
>>> | 18.47823          | 2014/12/02 10:10:23 UTC  |
>>> ... data clipped
>>>
>>>
>>> On Wed, Jan 13, 2016 at 12:26 PM, Paolo Spanevello <[email protected]
>>> >
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > thanks to all for your support.
>>> >
>>> > I'm trying as u suggest, to apply the kvgen function. Below you can
>>> find
>>> > the query and the error.
>>> >
>>> > *select tt.flat_intervals from (select
>>> > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>>> > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as
>>> tt*
>>> >
>>> > I have this error message.
>>> >
>>> > Best regards,
>>> > Paolo
>>> >
>>> > ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>>> > select tt.flat_intervals from (select
>>> > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>>> > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as
>>> tt
>>> > [30027]Query execution error. Details:[
>>> > SYSTEM ERROR: DrillRuntimeException: kvgen function only supports
>>> Simple
>>> > maps as input
>>> >
>>> > Fragment 0:0
>>> >
>>> > [Error Id: 85f8e8ba-fb87-428a-ac2e-dea78498c222 on 10.1.0.74:31010]
>>> > ]
>>> >
>>> > 2016-01-13 20:28 GMT+01:00 Jason Altekruse <[email protected]>:
>>> >
>>> >> Wasn't thinking about that, I was seeing the images because I was
>>> included
>>> >> directly in the recipients lists.
>>> >>
>>> >> +1 on text for searchability.
>>> >>
>>> >> Can you please rerun the queries and post the errors here as text?
>>> >>
>>> >> On Wed, Jan 13, 2016 at 11:20 AM, Christopher Matta <[email protected]>
>>> >> wrote:
>>> >>
>>> >> > The apache mailing lists don't support image attachments, it would
>>> be
>>> >> > better to attach text anyway since it's searchable.
>>> >> >
>>> >> > Chris Matta
>>> >> > [email protected]
>>> >> > 215-701-3146
>>> >> >
>>> >> > On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <
>>> [email protected]
>>> >> >
>>> >> > wrote:
>>> >> >
>>> >> >> Hi,
>>> >> >>
>>> >> >> thanks again Jason for your support.
>>> >> >>
>>> >> >> I'm trying as u suggest, to apply the kvgen function. Below you can
>>> >> find
>>> >> >> the query and the error.
>>> >> >>
>>> >> >> *select tt.flat_intervals from (select
>>> >> >> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from
>>> (select
>>> >> >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t )
>>> as
>>> >> tt*
>>> >> >>
>>> >> >> [image: Immagine incorporata 1]
>>> >> >>
>>> >> >>
>>> >> >> Could u pls suggest how to fix this drill?
>>> >> >>
>>> >> >> Best regards,
>>> >> >> Paolo
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> 2016-01-12 16:31 GMT+01:00 Jason Altekruse <
>>> [email protected]>:
>>> >> >>
>>> >> >>> This is a poor error messages that is produced when you try to
>>> >> flatten a
>>> >> >>> field that is not an array, for these fields you can just use the
>>> dot
>>> >> >>> notation to access their inner members (i.e.
>>> >> >>> flattened_array_of_maps.member_field_in_map). If you have a field
>>> >> where the
>>> >> >>> keys in a map are "unknown" or you want to do analysis on the
>>> keys,
>>> >> please
>>> >> >>> refer to the KVGEN docs [1]. I have assigned the JIRA that
>>> reported
>>> >> this
>>> >> >>> issue a while ago to myself and will work to improve the message
>>> [2].
>>> >> >>>
>>> >> >>>
>>> >> >>> [1] - https://drill.apache.org/docs/kvgen/
>>> >> >>> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>>> >> >>>
>>> >> >>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <
>>> >> [email protected]>
>>> >> >>> wrote:
>>> >> >>>
>>> >> >>>> Hi  All,
>>> >> >>>>
>>> >> >>>> Jason,I used your suggests and it works, thanks a lot!
>>> >> >>>>
>>> >> >>>> As u wrote i used a subquery to have the all list of INTERVALS
>>> as I
>>> >> >>>> show below.
>>> >> >>>>
>>> >> >>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from
>>> (select
>>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>>> >> >>>>
>>> >> >>>> [image: Immagine incorporata 1]
>>> >> >>>>
>>> >> >>>> The attribute "flat_intervals" is full of data that I would like
>>> to
>>> >> >>>> have separate them in several attributes as it is showed in the
>>> link:
>>> >> >>>> https://drill.apache.org/docs/flatten/
>>> >> >>>> I used a new subquery to have them:
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat
>>> >> from
>>> >> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from
>>> (select
>>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as
>>> t) as
>>> >> tt*
>>> >>
>>> >> >>>>
>>> >> >>>> but I have this error:
>>> >> >>>>
>>> >> >>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the
>>> query:
>>> >> >>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat
>>> from
>>> >> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from
>>> (select
>>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as
>>> t) as
>>> >> tt
>>> >> >>>> [30027]Query execution error. Details:[
>>> >> >>>> SYSTEM ERROR: ClassCastException: Cannot cast
>>> >> >>>> org.apache.drill.exec.vector.complex.MapVector to
>>> >> >>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>>> >> >>>>
>>> >> >>>> Fragment 0:0
>>> >> >>>>
>>> >> >>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on
>>> >> 192.168.1.101:31010]
>>> >> >>>> ]
>>> >> >>>>
>>> >> >>>> [image: Immagine incorporata 2]
>>> >> >>>>
>>> >> >>>> Thanks in advance!
>>> >> >>>>
>>> >> >>>> Paolo
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <
>>> [email protected]
>>> >> >:
>>> >> >>>>
>>> >> >>>>> Paolo,
>>> >> >>>>>
>>> >> >>>>> Drill currently reads single JSON objects as single records. If
>>> you
>>> >> >>>>> look at
>>> >> >>>>> the top of your file you can see that the root of your document
>>> is a
>>> >> >>>>> single
>>> >> >>>>> JSON object.
>>> >> >>>>>
>>> >> >>>>> Drill accepts two formats for individual records:
>>> >> >>>>>
>>> >> >>>>> The Mongo import format, a series of JSON object one after the
>>> other
>>> >> >>>>> in a
>>> >> >>>>> file, whitespace is irrelevant, each one need not be followed
>>> by a
>>> >> >>>>> newline
>>> >> >>>>>
>>> >> >>>>> {"a" : 1, "b" : "hello" }
>>> >> >>>>> {"a": 5 : "b" : "guten tag" }
>>> >> >>>>>
>>> >> >>>>> A JSON array of objects
>>> >> >>>>>
>>> >> >>>>> [
>>> >> >>>>>     {"a" : 1, "b" : "hello" },
>>> >> >>>>>     {"a" : 2, "b", "guten tag"}
>>> >> >>>>> ]
>>> >> >>>>>
>>> >> >>>>> When you have a file like this you can read it using the FLATTEN
>>> >> >>>>> functionality of Drill to turn an array into a series of
>>> records.
>>> >> >>>>> https://drill.apache.org/docs/flatten/
>>> >> >>>>>
>>> >> >>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>>> >> >>>>>
>>> >> >>>>> To work with the data further, you can put the flatten call in a
>>> >> >>>>> subquery.
>>> >> >>>>> Here is how you can select the first element from each records
>>> list
>>> >> of
>>> >> >>>>> INTERVALS and select one of the nested fields inside of METRICS
>>> once
>>> >> >>>>> the
>>> >> >>>>> data has been flattened.
>>> >> >>>>> To analyze the array, you could flatten again to get an exploded
>>> >> >>>>> dataset
>>> >> >>>>> with one record per interval across all records
>>> >> >>>>>
>>> >> >>>>> select t.flat_rides.INTERVALS[0],
>>> >> t.flat_rides.METRICS.skiba_wprime_low
>>> >> >>>>> from (select flatten(rides) as flat_rides from
>>> >> dfs.tmp.`rideDB.json`)
>>> >> >>>>> as t;
>>> >> >>>>>
>>> >> >>>>> Here you can see that individual columns can be selected next
>>> to the
>>> >> >>>>> flatten call, this will copy the data into each new record:
>>> >> >>>>>
>>> >> >>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>>> >> >>>>> t.flat_rides.METRICS.skiba_wprime_low from (select
>>> flatten(rides) as
>>> >> >>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>>> >> >>>>>
>>> >> >>>>> Happy Drilling!
>>> >> >>>>>
>>> >> >>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <
>>> >> >>>>> [email protected]>
>>> >> >>>>> wrote:
>>> >> >>>>>
>>> >> >>>>> > Hi all,
>>> >> >>>>> >
>>> >> >>>>> > i'm trying to query the file that you can find in attach with
>>> >> drill
>>> >> >>>>> apache
>>> >> >>>>> > 1.4 . The result of this qurey is always 1 record.
>>> >> >>>>> >
>>> >> >>>>> > The query that i'm running is :
>>> >> >>>>> >
>>> >> >>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>>> >> >>>>> >
>>> >> >>>>> > If i run the similar query with the file donuts.json found on
>>> >> >>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query
>>> runs
>>> >> >>>>> properly.
>>> >> >>>>> >
>>> >> >>>>> > SELECT t.topping FROM donuts.json t
>>> >> >>>>> >
>>> >> >>>>> > Thanks in advance.
>>> >> >>>>> >
>>> >> >>>>> > Paolo
>>> >> >>>>> >
>>> >> >>>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>
>>> >> >>
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
>

Re: JSON File, Total numbers Record: 1

Reply via email to