[jira] [Commented] (HIVE-17012) ACID Table: Number of reduce tasks should be computed correctly when sort.dynamic.partition is enabled

2017-10-13 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203527#comment-16203527
 ] 

Rajesh Balamohan commented on HIVE-17012:
-

The code path {{SemanticAnalyzer.genFileSinkPlan --> genBucketingSortingDest --> 
genReduceSinkPlan}} is setting the number of reducers to 2.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6704

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6714

Looking at this code path, it does not look like this is specific to ACID.
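
As a rough illustration of why a bucketed destination can end up with so few reducers, here is a minimal Python sketch. It assumes the reducer count handed to genReduceSinkPlan is derived from the table's bucket count and capped by a configured maximum. The function name, the divisor fallback, and the cap of 1009 are illustrative assumptions, not Hive's actual code.

```python
def reducers_for_bucketed_dest(num_buckets: int, max_reducers: int) -> int:
    """Hypothetical model: choose a reducer count for a bucketed destination."""
    if num_buckets <= 0 or max_reducers <= 0:
        raise ValueError("num_buckets and max_reducers must be positive")
    if num_buckets <= max_reducers:
        # One reducer per bucket, regardless of how much data flows through.
        return num_buckets
    # Otherwise take the largest divisor of num_buckets that fits under the
    # cap, so each reducer writes a whole number of bucket files.
    for candidate in range(max_reducers, 0, -1):
        if num_buckets % candidate == 0:
            return candidate
    return 1


if __name__ == "__main__":
    # The table in this issue is declared INTO 2 BUCKETS; 1009 is an assumed cap.
    print(reducers_for_bucketed_dest(2, 1009))
```

Under this model the input size never enters the calculation, which would match the 2/2 reducer tasks in the quoted summary and would apply to any bucketed table, not only ACID ones.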



> ACID Table: Number of reduce tasks should be computed correctly when 
> sort.dynamic.partition is enabled
> --
>
>                 Key: HIVE-17012
>                 URL: https://issues.apache.org/jira/browse/HIVE-17012
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Rajesh Balamohan
>              Labels: performance
>         Attachments: plan.txt
>
>
> {code}
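> Map 1: 446/446  Reducer 2: 2/2  Reducer 3: 2/2
> ---------------------------------------------------------------------------------
> Compile Query   0.24s
> Prepare Plan    0.35s
> Submit Plan     0.18s
> Start DAG       0.21s
> Run DAG         32332.27s
> ---------------------------------------------------------------------------------
> Task Execution Summary
> ---------------------------------------------------------------------------------
>  VERTICES  DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> ---------------------------------------------------------------------------------
>     Map 1    1390343.00             0            0  2,879,987,999   2,879,987,999
> Reducer 2   31281225.00             0            0  2,750,387,156               0
> Reducer 3     751498.00             0            0    129,600,843               0
> ---------------------------------------------------------------------------------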
> {code}
>  Time taken: 32438.42 seconds to insert <3B rows with 
> {code}
> create table store_sales
> (
> ss_sold_time_sk       bigint,
> ss_item_sk            bigint,
> ss_customer_sk        bigint,
> ss_cdemo_sk           bigint,
> ss_hdemo_sk           bigint,
> ss_addr_sk            bigint,
> ss_store_sk           bigint,
> ss_promo_sk           bigint,
> ss_ticket_number      bigint,
> ss_quantity           int,
> ss_wholesale_cost     double,
> ss_list_price         double,
> ss_sales_price        double,
> ss_ext_discount_amt   double,
> ss_ext_sales_price    double,
> ss_ext_wholesale_cost double,
> ss_ext_list_price     double,
> ss_ext_tax            double,
> ss_coupon_amt         double,
> ss_net_paid           double,
> ss_net_paid_inc_tax   double,
> ss_net_profit         double
> )
> partitioned by (ss_sold_date_sk bigint)
> CLUSTERED BY (ss_ticket_number) INTO 2 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true', 'transactional_properties'='default')
> ;
> from tpcds_text_1000.store_sales ss
> insert into table store_sales partition (ss_sold_date_sk) 
> select
> ss.ss_sold_time_sk,
> ss.ss_item_sk,
> ss.ss_customer_sk,
> ss.ss_cdemo_sk,
> ss.ss_hdemo_sk,
> ss.ss_addr_sk,
> ss.ss_store_sk,
> ss.ss_promo_sk,
> ss.ss_ticket_number,
> ss.ss_quantity,
> ss.ss_wholesale_cost,
> ss.ss_list_price,
> ss.ss_sales_price,
> ss.ss_ext_discount_amt,
> ss.ss_ext_sales_price,
> ss.ss_ext_wholesale_cost,
> ss.ss_ext_list_price,
> ss.ss_ext_tax,
> ss.ss_coupon_amt,
> ss.ss_net_paid,
> ss.ss_net_paid_inc_tax,
> ss.ss_net_profit,
> ss.ss_sold_date_sk
> where ss.ss_sold_date_sk is not null
> insert into table store_sales partition (ss_sold_date_sk) 
> select
> ss.ss_sold_time_sk,
> ss.ss_item_sk,
> ss.ss_customer_sk,
> ss.ss_cdemo_sk,
> ss.ss_hdemo_sk,
> ss.ss_addr_sk,
> ss.ss_store_sk,
> ss.ss_promo_sk,
> ss.ss_ticket_number,
> ss.ss_quantity,
> ss.ss_wholesale_cost,
> ss.ss_list_price,
> ss.ss_sales_price,
> ss.ss_ext_discount_amt,
> ss.ss_ext_sales_price,
> ss.ss_ext_wholesale_cost,
> ss.ss_ext_list_price,
> ss.ss_ext_tax,

[jira] [Commented] (HIVE-17012) ACID Table: Number of reduce tasks should be computed correctly when sort.dynamic.partition is enabled

2017-08-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127983#comment-16127983
 ] 

Eugene Koifman commented on HIVE-17012:
---

Not sure if this is related, but AbstractCorrelationProcCtx sets
hive.optimize.reducededuplication.min.reducer=1 for ACID.
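
For context, here is a minimal Python sketch of how a threshold like the one mentioned above could gate reduce-sink deduplication. It assumes, going by the setting's documented purpose, that the merge is skipped when the merged stage would run with fewer reducers than the threshold. The function name and the default of 4 are assumptions for illustration, not Hive's actual code.

```python
DEFAULT_MIN_REDUCER = 4  # assumed non-ACID default, for illustration only


def dedup_allowed(merged_reducers: int,
                  min_reducer: int = DEFAULT_MIN_REDUCER) -> bool:
    """Hypothetical model: allow merging two reduce stages only if the merged
    stage keeps at least `min_reducer` reducers."""
    return merged_reducers >= min_reducer


if __name__ == "__main__":
    # With a 2-reducer stage, the assumed default blocks the merge, while a
    # threshold of 1 (the ACID override described above) lets it through.
    print(dedup_allowed(2))
    print(dedup_allowed(2, min_reducer=1))
```

If the model holds, a threshold of 1 never blocks the merge, so a low fixed reducer count could propagate to the merged stage. Whether that contributes to this issue is not established in this thread.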

> ACID Table: Number of reduce tasks should be computed correctly when 
> sort.dynamic.partition is enabled
> --
>
>                 Key: HIVE-17012
>                 URL: https://issues.apache.org/jira/browse/HIVE-17012
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Rajesh Balamohan
>              Labels: performance
>         Attachments: plan.txt
>
>
> {code}
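> Map 1: 446/446  Reducer 2: 2/2  Reducer 3: 2/2
> ---------------------------------------------------------------------------------
> Compile Query   0.24s
> Prepare Plan    0.35s
> Submit Plan     0.18s
> Start DAG       0.21s
> Run DAG         32332.27s
> ---------------------------------------------------------------------------------
> Task Execution Summary
> ---------------------------------------------------------------------------------
>  VERTICES  DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> ---------------------------------------------------------------------------------
>     Map 1    1390343.00             0            0  2,879,987,999   2,879,987,999
> Reducer 2   31281225.00             0            0  2,750,387,156               0
> Reducer 3     751498.00             0            0    129,600,843               0
> ---------------------------------------------------------------------------------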
> {code}
>  Time taken: 32438.42 seconds to insert <3B rows with 
> {code}
> create table store_sales
> (
> ss_sold_time_sk       bigint,
> ss_item_sk            bigint,
> ss_customer_sk        bigint,
> ss_cdemo_sk           bigint,
> ss_hdemo_sk           bigint,
> ss_addr_sk            bigint,
> ss_store_sk           bigint,
> ss_promo_sk           bigint,
> ss_ticket_number      bigint,
> ss_quantity           int,
> ss_wholesale_cost     double,
> ss_list_price         double,
> ss_sales_price        double,
> ss_ext_discount_amt   double,
> ss_ext_sales_price    double,
> ss_ext_wholesale_cost double,
> ss_ext_list_price     double,
> ss_ext_tax            double,
> ss_coupon_amt         double,
> ss_net_paid           double,
> ss_net_paid_inc_tax   double,
> ss_net_profit         double
> )
> partitioned by (ss_sold_date_sk bigint)
> CLUSTERED BY (ss_ticket_number) INTO 2 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true', 'transactional_properties'='default')
> ;
> from tpcds_text_1000.store_sales ss
> insert into table store_sales partition (ss_sold_date_sk) 
> select
> ss.ss_sold_time_sk,
> ss.ss_item_sk,
> ss.ss_customer_sk,
> ss.ss_cdemo_sk,
> ss.ss_hdemo_sk,
> ss.ss_addr_sk,
> ss.ss_store_sk,
> ss.ss_promo_sk,
> ss.ss_ticket_number,
> ss.ss_quantity,
> ss.ss_wholesale_cost,
> ss.ss_list_price,
> ss.ss_sales_price,
> ss.ss_ext_discount_amt,
> ss.ss_ext_sales_price,
> ss.ss_ext_wholesale_cost,
> ss.ss_ext_list_price,
> ss.ss_ext_tax,
> ss.ss_coupon_amt,
> ss.ss_net_paid,
> ss.ss_net_paid_inc_tax,
> ss.ss_net_profit,
> ss.ss_sold_date_sk
> where ss.ss_sold_date_sk is not null
> insert into table store_sales partition (ss_sold_date_sk) 
> select
> ss.ss_sold_time_sk,
> ss.ss_item_sk,
> ss.ss_customer_sk,
> ss.ss_cdemo_sk,
> ss.ss_hdemo_sk,
> ss.ss_addr_sk,
> ss.ss_store_sk,
> ss.ss_promo_sk,
> ss.ss_ticket_number,
> ss.ss_quantity,
> ss.ss_wholesale_cost,
> ss.ss_list_price,
> ss.ss_sales_price,
> ss.ss_ext_discount_amt,
> ss.ss_ext_sales_price,
> ss.ss_ext_wholesale_cost,
> ss.ss_ext_list_price,
> ss.ss_ext_tax,
> ss.ss_coupon_amt,
> ss.ss_net_paid,
> ss.ss_net_paid_inc_tax,
> ss.ss_net_profit,
> ss.ss_sold_date_sk
> where ss.ss_sold_date_sk is null
> ;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17012) ACID Table: Number of reduce tasks should be computed correctly when sort.dynamic.partition is enabled

2017-07-03 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071990#comment-16071990
 ] 

Rajesh Balamohan commented on HIVE-17012:
-

ReducerTraits would be FIXED for ACID tables with buckets. The code at
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java#L102
prevents the number of reducer tasks from being computed for Reducer 3.
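
The effect described above can be sketched with a small Python model. It assumes that a reduce stage is resized from a data-size estimate only when it carries an auto-parallel trait and is not marked FIXED. The enum, the function, and the sample numbers are hypothetical and simplified, not the actual GenTezUtils logic.

```python
from enum import Enum, auto


class ReducerTrait(Enum):
    """Hypothetical stand-in for the traits attached to a reduce stage."""
    FIXED = auto()
    AUTOPARALLEL = auto()


def effective_reducers(traits: set, planned: int, estimated: int,
                       auto_parallelism_enabled: bool = True) -> int:
    """Return the task count a stage would run with under this model."""
    resizable = (auto_parallelism_enabled
                 and ReducerTrait.AUTOPARALLEL in traits
                 and ReducerTrait.FIXED not in traits)
    # A FIXED stage keeps its planned count even if the estimate is far larger.
    return estimated if resizable else planned


if __name__ == "__main__":
    # 2 planned reducers; 500 is a made-up estimate for illustration.
    print(effective_reducers({ReducerTrait.FIXED}, planned=2, estimated=500))
    print(effective_reducers({ReducerTrait.AUTOPARALLEL}, planned=2, estimated=500))
```

In this model the FIXED stage stays at its planned count while the auto-parallel stage takes the estimate, which is the behaviour the comment attributes to Reducer 3.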

> ACID Table: Number of reduce tasks should be computed correctly when 
> sort.dynamic.partition is enabled
> --
>
>                 Key: HIVE-17012
>                 URL: https://issues.apache.org/jira/browse/HIVE-17012
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: plan.txt
>
>



