In addition to what Hao suggests, I think it's worth noting that the insert
query plan created by Impala changed a bit over time.

It sounds like you upgraded from Impala 2.8 (in CDH 5.11), which used a
very straightforward insert plan - each node separately inserted rows in
whatever order the rows were consumed. This plan worked well for smaller
inserts but could cause timeouts with larger workloads.

In Impala 2.9, the plan was changed so that Impala performs some shuffling
and sorting before inserting into Kudu. This makes the Kudu insert pattern
more reliable and efficient, but could cause a degradation for some
workloads since Impala's sorts are single-threaded.

Impala 2.10 (which I guess you are running) improved a bit over 2.9 in
ensuring that the sorts can be "partial" which resolved some of the
performance degradation, but it's possible your workload is still affected
negatively.

To disable the new behavior you can use the insert hints 'noshuffle' and/or
'noclustered', such as:

upsert into my_table /* +noclustered,noshuffle */ select * from
my_other_table;


Hope that helps
-Todd

On Thu, Feb 22, 2018 at 11:02 AM, Hao Hao <hao....@cloudera.com> wrote:

> Did you happen to check the health of the cluster after the upgrade by 'kudu
> cluster ksck'?
>
> Best,
> Hao
>
> On Thu, Feb 22, 2018 at 6:31 AM, Boris Tyukin <bo...@boristyukin.com>
> wrote:
>
>> Hello,
>>
>> we just upgraded our dev cluster from Kudu 1.3 to kudu 1.5.0-cdh5.13.1
>> and noticed quite severe performance degradation. We did CTAS from Impala
>> parquet table which has not changed a bit since the upgrade (even the same
>> # of rows) to Kudu using the follow query below.
>>
>> It used to take 11-11.5 hours on Kudu 1.3 and now taking 50-55 hours.
>>
>> Of course Impala version was also bumped with CDH 5.13.
>>
>> Any clue why it takes so much time now?
>>
>> Table has 5.5B rows..
>>
>> create TABLE kudutest_ts.clinical_event_nots
>>
>> PRIMARY KEY (clinical_event_id)
>>
>> PARTITION BY HASH(clinical_event_id) PARTITIONS 120
>>
>> STORED AS KUDU
>>
>> AS
>>
>> SELECT
>>
>>   clinical_event_id,
>>
>>   encntr_id,
>>
>>   person_id,
>>
>>   encntr_financial_id,
>>
>>   event_id,
>>
>>   event_title_text,
>>
>>   CAST(view_level as string) as view_level,
>>
>>   order_id,
>>
>>   catalog_cd,
>>
>>   series_ref_nbr,
>>
>>   accession_nbr,
>>
>>   contributor_system_cd,
>>
>>   reference_nbr,
>>
>>   parent_event_id,
>>
>>   event_reltn_cd,
>>
>>   event_class_cd,
>>
>>   event_cd,
>>
>>   event_tag,
>>
>>   CAST(event_end_dt_tm_os as BIGINT) as event_end_dt_tm_os,
>>
>>   result_val,
>>
>>   result_units_cd,
>>
>>   result_time_units_cd,
>>
>>   task_assay_cd,
>>
>>   record_status_cd,
>>
>>   result_status_cd,
>>
>>   CAST(authentic_flag as STRING) authentic_flag,
>>
>>   CAST(publish_flag as STRING) publish_flag,
>>
>>   qc_review_cd,
>>
>>   normalcy_cd,
>>
>>   normalcy_method_cd,
>>
>>   inquire_security_cd,
>>
>>   resource_group_cd,
>>
>>   resource_cd,
>>
>>   CAST(subtable_bit_map as STRING) subtable_bit_map,
>>
>>   collating_seq,
>>
>>   verified_prsnl_id,
>>
>>   performed_prsnl_id,
>>
>>   updt_id,
>>
>>   CAST(updt_task as STRING) updt_task,
>>
>>   updt_cnt,
>>
>>   CAST(updt_applctx as STRING) updt_applctx,
>>
>>   normal_low,
>>
>>   normal_high,
>>
>>   critical_low,
>>
>>   critical_high,
>>
>>   CAST(event_tag_set_flag as STRING) event_tag_set_flag,
>>
>>   CAST(note_importance_bit_map as STRING) note_importance_bit_map,
>>
>>   CAST(order_action_sequence as STRING) order_action_sequence,
>>
>>   entry_mode_cd,
>>
>>   source_cd,
>>
>>   clinical_seq,
>>
>>   CAST(event_end_tz as STRING) event_end_tz,
>>
>>   CAST(event_start_tz as STRING) event_start_tz,
>>
>>   CAST(performed_tz as STRING) performed_tz,
>>
>>   CAST(verified_tz as STRING) verified_tz,
>>
>>   task_assay_version_nbr,
>>
>>   modifier_long_text_id,
>>
>>   ce_dynamic_label_id,
>>
>>   CAST(nomen_string_flag as STRING) nomen_string_flag,
>>
>>   src_event_id,
>>
>>   CAST(last_utc_ts as BIGINT) last_utc_ts,
>>
>>   device_free_txt,
>>
>>   CAST(trait_bit_map as STRING) trait_bit_map,
>>
>>   CAST(clu_subkey1_flag as STRING) clu_subkey1_flag,
>>
>>   CAST(clinsig_updt_dt_tm as BIGINT) clinsig_updt_dt_tm,
>>
>>   CAST(event_end_dt_tm as BIGINT) event_end_dt_tm,
>>
>>   CAST(event_start_dt_tm as BIGINT) event_start_dt_tm,
>>
>>   CAST(expiration_dt_tm as BIGINT) expiration_dt_tm,
>>
>>   CAST(verified_dt_tm as BIGINT) verified_dt_tm,
>>
>>   CAST(src_clinsig_updt_dt_tm as BIGINT) src_clinsig_updt_dt_tm,
>>
>>   CAST(updt_dt_tm as BIGINT) updt_dt_tm,
>>
>>   CAST(valid_from_dt_tm as BIGINT) valid_from_dt_tm,
>>
>>   CAST(valid_until_dt_tm as BIGINT) valid_until_dt_tm,
>>
>>   CAST(performed_dt_tm as BIGINT) performed_dt_tm,
>>
>>   txn_id_text,
>>
>>   CAST(ingest_dt_tm as BIGINT) ingest_dt_tm
>>
>> FROM v500.clinical_event
>>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Reply via email to