Vineet Garg created HIVE-20757:
----------------------------------

             Summary: Autogather stats doesn't work when SDPO (sort dynamic partition optimization) is ON
                 Key: HIVE-20757
                 URL: https://issues.apache.org/jira/browse/HIVE-20757
             Project: Hive
          Issue Type: Bug
          Components: Statistics
    Affects Versions: 4.0.0
            Reporter: Vineet Garg


*Reproducer*
{code:sql}
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.stats.autogather=true;

create table t11(i int, j int) partitioned by (s string);
insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');

hive> desc formatted t11 j;
OK
col_name                j
data_type               int
min
max
num_nulls
distinct_count
avg_col_len
max_col_len
num_trues
num_falses
bitVector
comment                 from deserializer
COLUMN_STATS_ACCURATE   {}
{code}

{code:sql}
hive> explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: _dummy_table
                  Row Limit Per Split: 1
                  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: array(const struct(3,4,'p1'),const struct(4,5,'p2'),const struct(6,9,'p3')) (type: array<struct<col1:int,col2:int,col3:string>>)
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
                    UDTF Operator
                      Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
                      function name: inline
                      Select Operator
                        expressions: col1 (type: int), col2 (type: int), col3 (type: string)
                        outputColumnNames: _col0, _col1, _col2
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col2 (type: string)
                          sort order: +
                          Map-reduce partition columns: _col2 (type: string)
                          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col0 (type: int), _col1 (type: int)
        Reducer 2
            Execution mode: vectorized
            Reduce Operator Tree:
              Select Operator
                expressions: VALUE._col0 (type: int), VALUE._col1 (type: int), KEY._col2 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Dp Sort State: PARTITION_SORTED
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.t11

  Stage: Stage-2
    Dependency Collection

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            s
          replace: false
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.t11

  Stage: Stage-3
    Stats Work
      Basic Stats Work:
      Column Stats Desc:
          Columns: i, j
          Column Types: int, int
          Table: default.t11
{code}

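Notice that the explain plan is missing the autogather stats branch, even though Stage-3 still lists a Column Stats Desc for columns i and j. This is consistent with the empty column statistics shown by desc formatted above.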
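
*Comparison sketch (not part of the original reproducer)*
One way to confirm that SDPO is the trigger might be to re-run the same explain with the optimization switched off. This is only a sketch, assuming the same session and the table t11 created above. With autogather working, the plan would be expected to carry an additional branch that aggregates column statistics alongside the File Output Operator; the exact operator names in that branch can differ between Hive versions, so none are shown here.
{code:sql}
-- Assumption: same session and table t11 as in the reproducer.
-- Only the SDPO flag differs from the original settings.
set hive.optimize.sort.dynamic.partition=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.stats.autogather=true;

-- Compare this plan with the one above and look for an extra
-- stats-gathering branch in Stage-1.
explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
{code}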



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
