Point taken! Thanks Todd! You saved us from days of frustration, so we're really thankful for that!
Boris

On Wed, Feb 28, 2018 at 2:07 PM, Todd Lipcon <[email protected]> wrote:

> Hi Boris,
>
> Thanks for the feedback and for sharing your experience. Like you said,
> this is more of an issue with downstream documentation, so it's probably
> not appropriate to continue discussing it in the context of the Apache
> Kudu open source project. Feedback on CDH is best directed to the
> associated vendor (Cloudera). Even though I happen to be employed by
> them, I and other employees participate here in the Apache Kudu project
> as individuals, and it's important to keep that distinction clear. Kudu
> is a product of the ASF non-profit organization, not a product of any
> commercial vendor.
>
> -Todd
>
> On Wed, Feb 28, 2018 at 6:17 AM, Boris Tyukin <[email protected]> wrote:
>
>> First of all, thanks for the very quick response. It means a lot to
>> know you guys stand behind Kudu and work with users like myself. In
>> fact, we get better support here than through the official channels :)
>>
>> That said, I needed to vent a bit :) Granted, this should probably be
>> directed at the Impala devs, but I think Kudu has been impacted the
>> most.
>>
>> We ran a cluster health check command and saw some warnings in there,
>> but we proceeded with running the insert statement with the hints Todd
>> suggested, and now we are back to the times we used to get with Kudu
>> 1.3 / Impala 2.8.
>>
>> It is a bit frustrating, I must say, that these changes impacted
>> performance so dramatically. In my opinion, the Kudu and CDH release
>> notes should call out changes like this in BOLD RED so they don't catch
>> users by surprise. And not everyone is as organized as we are - we knew
>> exactly how much time it took with Kudu 1.3 / Impala 2.8.
>>
>> I've tracked down all the related JIRAs and did not see a word about a
>> dramatic performance degradation. I did see words like 'optimized' and
>> 'improved'.
>>
>> Since it is part of the official CDH distro, I would expect a little
>> more proactive warning.
>>
>> If you want me to open a JIRA on this, I would be happy to do it. We
>> see this degradation across all our tables, and as you can see from my
>> example query, it is a really straightforward select from a table - no
>> joins, no predicates, and no complex calculations.
>>
>> Thanks again,
>> Boris
>>
>> On Thu, Feb 22, 2018 at 2:44 PM, Todd Lipcon <[email protected]> wrote:
>>
>>> In addition to what Hao suggests, I think it's worth noting that the
>>> insert query plan created by Impala has changed a bit over time.
>>>
>>> It sounds like you upgraded from Impala 2.8 (in CDH 5.11), which used
>>> a very straightforward insert plan - each node separately inserted
>>> rows in whatever order the rows were consumed. This plan worked well
>>> for smaller inserts but could cause timeouts with larger workloads.
>>>
>>> In Impala 2.9, the plan was changed so that Impala performs some
>>> shuffling and sorting before inserting into Kudu. This makes the Kudu
>>> insert pattern more reliable and efficient, but it could cause a
>>> degradation for some workloads, since Impala's sorts are
>>> single-threaded.
>>>
>>> Impala 2.10 (which I guess you are running) improved a bit over 2.9 by
>>> ensuring that the sorts can be "partial", which resolved some of the
>>> performance degradation, but it's possible your workload is still
>>> affected negatively.
>>>
>>> To disable the new behavior, you can use the insert hints 'noshuffle'
>>> and/or 'noclustered', such as:
>>>
>>>   upsert into my_table /* +noclustered,noshuffle */
>>>   select * from my_other_table;
>>>
>>> Hope that helps
>>> -Todd
>>>
>>> On Thu, Feb 22, 2018 at 11:02 AM, Hao Hao <[email protected]> wrote:
>>>
>>>> Did you happen to check the health of the cluster after the upgrade
>>>> with 'kudu cluster ksck'?
>>>>
>>>> Best,
>>>> Hao
>>>>
>>>> On Thu, Feb 22, 2018 at 6:31 AM, Boris Tyukin <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We just upgraded our dev cluster from Kudu 1.3 to Kudu
>>>>> 1.5.0-cdh5.13.1 and noticed quite severe performance degradation. We
>>>>> did a CTAS to Kudu from an Impala parquet table, which has not
>>>>> changed a bit since the upgrade (even the same # of rows), using the
>>>>> query below.
>>>>>
>>>>> It used to take 11-11.5 hours on Kudu 1.3 and is now taking 50-55
>>>>> hours.
>>>>>
>>>>> Of course, the Impala version was also bumped with CDH 5.13.
>>>>>
>>>>> Any clue why it takes so much time now?
>>>>>
>>>>> The table has 5.5B rows.
>>>>>
>>>>> create TABLE kudutest_ts.clinical_event_nots
>>>>> PRIMARY KEY (clinical_event_id)
>>>>> PARTITION BY HASH(clinical_event_id) PARTITIONS 120
>>>>> STORED AS KUDU
>>>>> AS
>>>>> SELECT
>>>>>   clinical_event_id,
>>>>>   encntr_id,
>>>>>   person_id,
>>>>>   encntr_financial_id,
>>>>>   event_id,
>>>>>   event_title_text,
>>>>>   CAST(view_level as STRING) as view_level,
>>>>>   order_id,
>>>>>   catalog_cd,
>>>>>   series_ref_nbr,
>>>>>   accession_nbr,
>>>>>   contributor_system_cd,
>>>>>   reference_nbr,
>>>>>   parent_event_id,
>>>>>   event_reltn_cd,
>>>>>   event_class_cd,
>>>>>   event_cd,
>>>>>   event_tag,
>>>>>   CAST(event_end_dt_tm_os as BIGINT) as event_end_dt_tm_os,
>>>>>   result_val,
>>>>>   result_units_cd,
>>>>>   result_time_units_cd,
>>>>>   task_assay_cd,
>>>>>   record_status_cd,
>>>>>   result_status_cd,
>>>>>   CAST(authentic_flag as STRING) authentic_flag,
>>>>>   CAST(publish_flag as STRING) publish_flag,
>>>>>   qc_review_cd,
>>>>>   normalcy_cd,
>>>>>   normalcy_method_cd,
>>>>>   inquire_security_cd,
>>>>>   resource_group_cd,
>>>>>   resource_cd,
>>>>>   CAST(subtable_bit_map as STRING) subtable_bit_map,
>>>>>   collating_seq,
>>>>>   verified_prsnl_id,
>>>>>   performed_prsnl_id,
>>>>>   updt_id,
>>>>>   CAST(updt_task as STRING) updt_task,
>>>>>   updt_cnt,
>>>>>   CAST(updt_applctx as STRING) updt_applctx,
>>>>>   normal_low,
>>>>>   normal_high,
>>>>>   critical_low,
>>>>>   critical_high,
>>>>>   CAST(event_tag_set_flag as STRING) event_tag_set_flag,
>>>>>   CAST(note_importance_bit_map as STRING) note_importance_bit_map,
>>>>>   CAST(order_action_sequence as STRING) order_action_sequence,
>>>>>   entry_mode_cd,
>>>>>   source_cd,
>>>>>   clinical_seq,
>>>>>   CAST(event_end_tz as STRING) event_end_tz,
>>>>>   CAST(event_start_tz as STRING) event_start_tz,
>>>>>   CAST(performed_tz as STRING) performed_tz,
>>>>>   CAST(verified_tz as STRING) verified_tz,
>>>>>   task_assay_version_nbr,
>>>>>   modifier_long_text_id,
>>>>>   ce_dynamic_label_id,
>>>>>   CAST(nomen_string_flag as STRING) nomen_string_flag,
>>>>>   src_event_id,
>>>>>   CAST(last_utc_ts as BIGINT) last_utc_ts,
>>>>>   device_free_txt,
>>>>>   CAST(trait_bit_map as STRING) trait_bit_map,
>>>>>   CAST(clu_subkey1_flag as STRING) clu_subkey1_flag,
>>>>>   CAST(clinsig_updt_dt_tm as BIGINT) clinsig_updt_dt_tm,
>>>>>   CAST(event_end_dt_tm as BIGINT) event_end_dt_tm,
>>>>>   CAST(event_start_dt_tm as BIGINT) event_start_dt_tm,
>>>>>   CAST(expiration_dt_tm as BIGINT) expiration_dt_tm,
>>>>>   CAST(verified_dt_tm as BIGINT) verified_dt_tm,
>>>>>   CAST(src_clinsig_updt_dt_tm as BIGINT) src_clinsig_updt_dt_tm,
>>>>>   CAST(updt_dt_tm as BIGINT) updt_dt_tm,
>>>>>   CAST(valid_from_dt_tm as BIGINT) valid_from_dt_tm,
>>>>>   CAST(valid_until_dt_tm as BIGINT) valid_until_dt_tm,
>>>>>   CAST(performed_dt_tm as BIGINT) performed_dt_tm,
>>>>>   txn_id_text,
>>>>>   CAST(ingest_dt_tm as BIGINT) ingest_dt_tm
>>>>> FROM v500.clinical_event
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
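A note for readers who hit the same slowdown: one way to confirm which insert plan Impala is using is to EXPLAIN the statement with and without the hints Todd mentions. The sketch below is not from the thread: 'impalad-host' is a hypothetical coordinator hostname and the table names are the placeholders from Todd's example. If the clustered plan is active, the plan output should show a sort step ahead of the Kudu write; with the hints applied, it should not.

    # Minimal sketch; 'impalad-host', my_table, and my_other_table are
    # placeholders. Compare this plan against the same statement without
    # the hint comment.
    impala-shell -i impalad-host -q "
      explain upsert into my_table /* +noclustered,noshuffle */
      select * from my_other_table"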
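Similarly, the health check Hao mentioned takes the cluster's master addresses as a comma-separated argument. A minimal sketch, with hypothetical hostnames:

    # Hypothetical master hostnames; substitute your cluster's masters.
    kudu cluster ksck master-1.example.com,master-2.example.com,master-3.example.com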
