[ https://issues.apache.org/jira/browse/TRAFODION-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hans Zeller reopened TRAFODION-2965: ------------------------------------ Update fix version > Hash partial groupby does not report a row count in operator statistics > ----------------------------------------------------------------------- > > Key: TRAFODION-2965 > URL: https://issues.apache.org/jira/browse/TRAFODION-2965 > Project: Apache Trafodion > Issue Type: Bug > Components: sql-exe > Affects Versions: 2.0-incubating > Environment: any > Reporter: Hans Zeller > Assignee: Hans Zeller > Priority: Major > Fix For: 2.4 > > > Here is a test case that demonstrates this: > {noformat} > update statistics for table hive.hive.time_dim on every column; > control query shape groupby(exchange(groupby(exchange(groupby(scan))))); > prepare s from > select count(distinct t_time_id) from hive.hive.time_dim; > explain options 'f' s; > execute s; > get statistics for qid current default; > {noformat} > The actual statistics show a "0" as the row count (ActRowsUsed) for the > lowest EX_HASH_GRBY with id 2: > {noformat} > LC RC Id PaId ExId Frag TDB Name DOP Dispatches > OperCpuTime EstRowsUsed ActRowsUsed ActDataUsed > Details > 13 . 14 . 8 0 EX_ROOT 1 2 > 37 0 1 8 3658 > 12 . 13 14 7 0 EX_SORT_GRBY 1 5 > 61 1 1 8 > 11 . 12 13 6 0 EX_SPLIT_TOP 1 8 > 171 1 4 32 > 10 . 11 12 6 0 EX_SEND_TOP 4 16 > 3,389 1 4 64 > 9 . 10 11 6 2 EX_SEND_BOTTOM 4 12 > 683 1 4 64 > 8 . 9 10 6 2 EX_SPLIT_BOTTOM 4 19 > 597 1 4 32 > 833874 > 7 . 8 9 5 2 EX_SORT_GRBY 4 2,259 > 289,200 1 4 32 > 6 . 7 8 4 2 EX_HASH_GRBY 4 2,259 > 394,235 1451 86,400 2,765,059,200 0|0|0 > 5 . 6 7 3 2 EX_SPLIT_TOP 4 2,211 > 51,034 110007 86,400 2,765,059,200 > 4 . 5 6 3 2 EX_SEND_TOP 8 2,436 > 98,125 110007 86,400 2,766,528,000 > 3 . 4 5 3 3 EX_SEND_BOTTOM 8 10,658 > 196,714 110007 86,400 2,766,528,000 > 2 . 3 4 3 3 EX_SPLIT_BOTTOM 2 2,663 > 95,246 110007 86,400 2,765,059,200 > 1547521 > 1 . 2 3 2 3 EX_HASH_GRBY 2 4,521 > 303,319 55003.5 0 0 > . . 1 2 1 3 EX_HDFS_SCAN 2 2,650 > 952,242 116085 86,400 2,765,059,200 > HIVE.HIVE.TIME_DIM|86400|5288324 > {noformat} > The reason is that the hash groupby reports its row count in the BMO stats. > However, a partial hash groupby is not considered a BMO (Big Memory > Operator), so no rowcount gets reported. The fix is to increment the rowcount > in the generic stats entry that is present in both partial and full groupby > operators. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)