caofangkun created HIVE-4522:
--------------------------------

             Summary: Confusing result generated when use mulit aggregate 
functions with star columns
                 Key: HIVE-4522
                 URL: https://issues.apache.org/jira/browse/HIVE-4522
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.12.0
            Reporter: caofangkun
            Priority: Minor


hive (default)> set hive.cli.print.header=true;
hive (default)> select * from src;             
OK
key     value
        35
48      
100     100

Table src has two columns: key and value 
But guess how many columns the following query will generate ?
Three ? No, it's two .
hive (default)> select * , count(key) as cnt from src;
OK
(tok_function count (tok_table_or_col key))     cnt
3       3

And what about this query ?
hive (default)> select * , count(key), sum(value) as cnt from src group by key, 
value;
Four columns ? No, it's six!
hive (default)> select * , count(key) as cnt , sum(value) as sum_value from src 
group by key, value ;
OK
(tok_table_or_col key)  (tok_table_or_col value)        (tok_function count 
(tok_table_or_col key))     (tok_function sum (tok_table_or_col value))     cnt 
    sum_value
        35      1       35.0    1       35.0
100     100     1       100.0   1       100.0
48              1       0.0     1       0.0

The column names do not match and the result is Confusing。

Have a look at how such kind of queries work in MySQL :
mysql> select *, sum(id),count(data)  from example ;
+------+------+---------+-------------+
| id   | data | sum(id) | count(data) |
+------+------+---------+-------------+
|    1 | 2    |       6 |           3 |
+------+------+---------+-------------+
1 row in set (0.03 sec)

mysql> select *, sum(id) from example ;
+------+------+---------+
| id   | data | sum(id) |
+------+------+---------+
|    1 | 2    |       6 |
+------+------+---------+
1 row in set (0.09 sec)

mysql> select *, sum(id),count(data)  from example group by id, data ;
+------+------+---------+-------------+
| id   | data | sum(id) | count(data) |
+------+------+---------+-------------+
|    1 | 2    |       1 |           1 |
|    2 | 2    |       2 |           1 |
|    3 | 3    |       3 |           1 |
+------+------+---------+-------------+
3 rows in set (0.00 sec)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to