[ 
https://issues.apache.org/jira/browse/HIVE-25913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485554#comment-17485554
 ] 

Rajesh Balamohan commented on HIVE-25913:
-----------------------------------------

This turned out to be a migration issue from external table to iceberg table. 
During migration, it didn't create the partition specs correctly. After 
regenerating the data, it creates the partition specs correctly and is no 
longer an issue. I will close this ticket.

> Dynamic Partition Pruning Operator: Not working in iceberg tables
> -----------------------------------------------------------------
>
>                 Key: HIVE-25913
>                 URL: https://issues.apache.org/jira/browse/HIVE-25913
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Major
>
> Notice "Dynamic Partitioning Event Operator" missing in Map 3 in iceberg 
> tables. Partition keys are added as embedded fields in iceberg tables. This 
> causes issues in DynamicPartitionPruningOptimisation codepath. E.g 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java#L182]
>  
> This causes heavy IO in iceberg tables leading to perf degradation.
> {noformat}
> ACID table
> ==========
> explain select count(*) from store_sales, date_dim  where d_month_seq between 
> 1212 and 1212+11 and ss_store_sk is not null and ss_sold_date_sk=d_date_sk;
> Explain
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: hive_20220131032425_be2fab7f-7943-4aa1-bbdd-289139ea0f90:17
>       Edges:
>         Map 1 <- Map 3 (BROADCAST_EDGE)
>         Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
>       DagName: hive_20220131032425_be2fab7f-7943-4aa1-bbdd-289139ea0f90:17
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales
>                   filterExpr: ss_store_sk is not null (type: boolean)
>                   Statistics: Num rows: 27503885621 Data size: 434880571744 
> Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ss_store_sk is not null (type: boolean)
>                     Statistics: Num rows: 26856185846 Data size: 424639398832 
> Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ss_sold_date_sk (type: bigint)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 26856185846 Data size: 
> 214849486768 Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col0 (type: bigint)
>                           1 _col0 (type: bigint)
>                         input vertices:
>                           1 Map 3
>                         Statistics: Num rows: 5279977323 Data size: 
> 42239818584 Basic stats: COMPLETE Column stats: COMPLETE
>                         Group By Operator
>                           aggregations: count()
>                           minReductionHashAggr: 0.99
>                           mode: hash
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                           Reduce Output Operator
>                             null sort order:
>                             sort order:
>                             Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                             value expressions: _col0 (type: bigint)
>             Execution mode: vectorized, llap
>             LLAP IO: may be used (ACID table)
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   filterExpr: (d_month_seq BETWEEN 1212 AND 1223 and 
> d_date_sk is not null) (type: boolean)
>                   Statistics: Num rows: 73049 Data size: 876588 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (d_month_seq BETWEEN 1212 AND 1223 and 
> d_date_sk is not null) (type: boolean)
>                     Statistics: Num rows: 359 Data size: 4308 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: d_date_sk (type: bigint)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 359 Data size: 2872 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: bigint)
>                         null sort order: a
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: bigint)
>                         Statistics: Num rows: 359 Data size: 2872 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                       Select Operator
>                         expressions: _col0 (type: bigint)
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 359 Data size: 2872 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                         Group By Operator
>                           keys: _col0 (type: bigint)
>                           minReductionHashAggr: 0.5013927
>                           mode: hash
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 179 Data size: 1432 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                           Dynamic Partitioning Event Operator
>                             Target column: ss_sold_date_sk (bigint)
>                             Target Input: store_sales
>                             Partition key expr: ss_sold_date_sk
>                             Statistics: Num rows: 179 Data size: 1432 Basic 
> stats: COMPLETE Column stats: COMPLETE
>                             Target Vertex: Map 1
>             Execution mode: vectorized, llap
>             LLAP IO: may be used (ACID table)
>         Reducer 2
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                   table:
>                       input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  Stage: Stage-0
>  
> Iceberg table:
> ==============
> Explain
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: hive_20220131032448_d1948a30-39f4-48e5-8fb3-9b9a7220824f:18
>       Edges:
>         Map 1 <- Map 3 (BROADCAST_EDGE)
>         Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
>       DagName: hive_20220131032448_d1948a30-39f4-48e5-8fb3-9b9a7220824f:18
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales
>                   filterExpr: (ss_store_sk is not null and ss_sold_date_sk is 
> not null) (type: boolean)
>                   Statistics: Num rows: 28800426268 Data size: 437766480832 
> Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: (ss_store_sk is not null and ss_sold_date_sk 
> is not null) (type: boolean)
>                     Statistics: Num rows: 25920383834 Data size: 393989835679 
> Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: ss_sold_date_sk (type: bigint)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 25920383834 Data size: 
> 393989835679 Basic stats: COMPLETE Column stats: NONE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col0 (type: bigint)
>                           1 _col0 (type: bigint)
>                         input vertices:
>                           1 Map 3
>                         Statistics: Num rows: 28512422835 Data size: 
> 433388828640 Basic stats: COMPLETE Column stats: NONE
>                         Group By Operator
>                           aggregations: count()
>                           minReductionHashAggr: 0.99
>                           mode: hash
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: NONE
>                           Reduce Output Operator
>                             null sort order:
>                             sort order:
>                             Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: NONE
>                             value expressions: _col0 (type: bigint)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs (cache only)
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   filterExpr: (d_month_seq BETWEEN 1212 AND 1223 and 
> d_date_sk is not null) (type: boolean)
>                   Statistics: Num rows: 73049 Data size: 832776 Basic stats: 
> COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: (d_month_seq BETWEEN 1212 AND 1223 and 
> d_date_sk is not null) (type: boolean)
>                     Statistics: Num rows: 1 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: d_date_sk (type: bigint)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: bigint)
>                         null sort order: a
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: bigint)
>                         Statistics: Num rows: 1 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs (cache only)
>         Reducer 2
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: NONE
>                   table:
>                       input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  Stage: Stage-0{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to