[jira] [Commented] (HIVE-17012) ACID Table: Number of reduce tasks should be computed correctly when sort.dynamic.partition is enabled
[ https://issues.apache.org/jira/browse/HIVE-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203527#comment-16203527 ]

Rajesh Balamohan commented on HIVE-17012:
-----------------------------------------

{{SemanticAnalyzer.genFileSinkPlan --> genBucketingSortingDest --> genReduceSinkPlan}} is setting to 2 reducers.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6704
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6714

Looking at this code path, it does not look like this is specific to ACID.

> ACID Table: Number of reduce tasks should be computed correctly when
> sort.dynamic.partition is enabled
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17012
>                 URL: https://issues.apache.org/jira/browse/HIVE-17012
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Rajesh Balamohan
>              Labels: performance
>         Attachments: plan.txt
[jira] [Commented] (HIVE-17012) ACID Table: Number of reduce tasks should be computed correctly when sort.dynamic.partition is enabled
[ https://issues.apache.org/jira/browse/HIVE-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127983#comment-16127983 ]

Eugene Koifman commented on HIVE-17012:
---------------------------------------

Not sure if this is related, but AbstractCorrelationProcCtx sets hive.optimize.reducededuplication.min.reducer=1 for ACID.

> ACID Table: Number of reduce tasks should be computed correctly when
> sort.dynamic.partition is enabled
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17012
>                 URL: https://issues.apache.org/jira/browse/HIVE-17012
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Rajesh Balamohan
>              Labels: performance
>         Attachments: plan.txt
>
>
> {code}
> Map 1: 446/446  Reducer 2: 2/2  Reducer 3: 2/2
> ------------------------------------------------------------------------------------
> Compile Query                   0.24s
> Prepare Plan                    0.35s
> Submit Plan                     0.18s
> Start DAG                       0.21s
> Run DAG                     32332.27s
> ------------------------------------------------------------------------------------
> Task Execution Summary
> ------------------------------------------------------------------------------------
>   VERTICES  DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> ------------------------------------------------------------------------------------
>      Map 1    1390343.00             0            0  2,879,987,999   2,879,987,999
>  Reducer 2   31281225.00             0            0  2,750,387,156               0
>  Reducer 3     751498.00             0            0    129,600,843               0
> ------------------------------------------------------------------------------------
> {code}
> Time taken: 32438.42 seconds to insert <3B rows with
> {code}
> create table store_sales
> (
>     ss_sold_time_sk           bigint,
>     ss_item_sk                bigint,
>     ss_customer_sk            bigint,
>     ss_cdemo_sk               bigint,
>     ss_hdemo_sk               bigint,
>     ss_addr_sk                bigint,
>     ss_store_sk               bigint,
>     ss_promo_sk               bigint,
>     ss_ticket_number          bigint,
>     ss_quantity               int,
>     ss_wholesale_cost         double,
>     ss_list_price             double,
>     ss_sales_price            double,
>     ss_ext_discount_amt       double,
>     ss_ext_sales_price        double,
>     ss_ext_wholesale_cost     double,
>     ss_ext_list_price         double,
>     ss_ext_tax                double,
>     ss_coupon_amt             double,
>     ss_net_paid               double,
>     ss_net_paid_inc_tax       double,
>     ss_net_profit             double
> )
> partitioned by (ss_sold_date_sk bigint)
> CLUSTERED BY (ss_ticket_number) INTO 2 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true', 'transactional_properties'='default')
> ;
> from tpcds_text_1000.store_sales ss
> insert into table store_sales partition (ss_sold_date_sk)
> select
>     ss.ss_sold_time_sk,
>     ss.ss_item_sk,
>     ss.ss_customer_sk,
>     ss.ss_cdemo_sk,
>     ss.ss_hdemo_sk,
>     ss.ss_addr_sk,
>     ss.ss_store_sk,
>     ss.ss_promo_sk,
>     ss.ss_ticket_number,
>     ss.ss_quantity,
>     ss.ss_wholesale_cost,
>     ss.ss_list_price,
>     ss.ss_sales_price,
>     ss.ss_ext_discount_amt,
>     ss.ss_ext_sales_price,
>     ss.ss_ext_wholesale_cost,
>     ss.ss_ext_list_price,
>     ss.ss_ext_tax,
>     ss.ss_coupon_amt,
>     ss.ss_net_paid,
>     ss.ss_net_paid_inc_tax,
>     ss.ss_net_profit,
>     ss.ss_sold_date_sk
> where ss.ss_sold_date_sk is not null
> insert into table store_sales partition (ss_sold_date_sk)
> select
>     ss.ss_sold_time_sk,
>     ss.ss_item_sk,
>     ss.ss_customer_sk,
>     ss.ss_cdemo_sk,
>     ss.ss_hdemo_sk,
>     ss.ss_addr_sk,
>     ss.ss_store_sk,
>     ss.ss_promo_sk,
>     ss.ss_ticket_number,
>     ss.ss_quantity,
>     ss.ss_wholesale_cost,
>     ss.ss_list_price,
>     ss.ss_sales_price,
>     ss.ss_ext_discount_amt,
>     ss.ss_ext_sales_price,
>     ss.ss_ext_wholesale_cost,
>     ss.ss_ext_list_price,
>     ss.ss_ext_tax,
>     ss.ss_coupon_amt,
>     ss.ss_net_paid,
>     ss.ss_net_paid_inc_tax,
>     ss.ss_net_profit,
>     ss.ss_sold_date_sk
> where ss.ss_sold_date_sk is null
> ;
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
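
The skew reported in the Task Execution Summary quoted in this thread can be checked with a little arithmetic. What follows is only a rough sketch in Python: the figures are copied from that summary, while the variable names are made up for illustration and are not part of Hive or Tez. Vertex durations can overlap, so the time share is an indicator rather than an exact accounting.

```python
# Figures copied from the Task Execution Summary quoted in this thread.
# Variable names are illustrative only; nothing here is Hive or Tez API.
run_dag_seconds = 32332.27
duration_ms = {"Map 1": 1390343.00, "Reducer 2": 31281225.00, "Reducer 3": 751498.00}
input_records = {"Reducer 2": 2_750_387_156, "Reducer 3": 129_600_843}
map_output_records = 2_879_987_999
tasks_per_reducer_vertex = 2  # from "Reducer 2: 2/2" and "Reducer 3: 2/2"

# The two reducer vertices together consume what Map 1 emitted.
total_reducer_input = sum(input_records.values())

# Reducer 2's vertex duration relative to the "Run DAG" time.
reducer2_share = duration_ms["Reducer 2"] / 1000.0 / run_dag_seconds

# Average rows per task if rows were spread evenly over the 2 tasks of each vertex.
rows_per_task = {v: n // tasks_per_reducer_vertex for v, n in input_records.items()}

print("reducer input equals map output:", total_reducer_input == map_output_records)
print(f"Reducer 2 duration / Run DAG time: {reducer2_share:.1%}")
print("average rows per task:", rows_per_task)
```

With only two tasks per reducer vertex, each Reducer 2 task has to handle well over a billion rows on average, which is consistent with that vertex dominating the run time.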
[jira] [Commented] (HIVE-17012) ACID Table: Number of reduce tasks should be computed correctly when sort.dynamic.partition is enabled
[ https://issues.apache.org/jira/browse/HIVE-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071990#comment-16071990 ]

Rajesh Balamohan commented on HIVE-17012:
-----------------------------------------

ReducerTraits would be FIXED for ACID tables with buckets. The check at https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java#L102 then prevents the number of reducer tasks from being computed for Reducer 3.

> ACID Table: Number of reduce tasks should be computed correctly when
> sort.dynamic.partition is enabled
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17012
>                 URL: https://issues.apache.org/jira/browse/HIVE-17012
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: plan.txt
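
The effect described in this comment can be illustrated with a toy model. This is not Hive's code: only the trait name FIXED comes from the comment, and the function `choose_reducer_count`, its parameters, the default values, and the 1 TB input size are all hypothetical, chosen purely for illustration. The sketch assumes that a reducer count marked FIXED is kept as configured, while an unmarked one would be sized from the input volume.

```python
import math

FIXED = "FIXED"  # trait name taken from the comment; everything else is hypothetical


def choose_reducer_count(traits, configured_reducers, input_bytes,
                         bytes_per_reducer=256 * 1024 * 1024, max_reducers=1009):
    """Toy model: keep a FIXED reducer count as-is, otherwise size it from the data."""
    if FIXED in traits:
        # A fixed count (for example one tied to the number of buckets) is never re-estimated.
        return configured_reducers
    estimated = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, estimated))


one_tb = 1024 ** 4  # hypothetical input size, not taken from the issue
print(choose_reducer_count({FIXED}, 2, one_tb))  # stays at the configured 2
print(choose_reducer_count(set(), 2, one_tb))    # capped at max_reducers
```

Under these assumptions, a table with 2 buckets keeps 2 reducers no matter how much data flows into the vertex, which matches the 2/2 task counts shown in the issue.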