[ 
https://issues.apache.org/jira/browse/HIVE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739951#comment-13739951
 ] 

Micah Gutman commented on HIVE-5083:
------------------------------------

Finally found the bug: using "show extended <table> <partition spec>" showed 
that all partitions were pointing to a single file. My selects only looked 
like they were working; they were just reading the same data over and over.
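
A check along these lines should surface the same symptom. This is only a 
sketch: the table name X and the partition column date come from the issue 
description, the two date values are illustrative, and the command actually 
typed may have differed from the standard SHOW TABLE EXTENDED form used here.

```sql
-- Sketch only: X and date are taken from the issue description;
-- the two partition values are illustrative.
SHOW TABLE EXTENDED LIKE 'X' PARTITION (date='20130101');
SHOW TABLE EXTENDED LIKE 'X' PARTITION (date='20130102');
-- Compare the location reported for each partition. If both outputs
-- show the same location, both partitions read the same data.
```

DESCRIBE FORMATTED X PARTITION (date='20130101') should report the partition 
location as well, if that output is easier to read.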

Specifically, I created my partitions with "alter table" using multiple 
partition specs in the same command. Interestingly, the wiki help page says:

Note that it is proper syntax to have multiple partition_spec in a single ALTER 
TABLE, but if you do this in version 0.7, your partitioning scheme will fail. 
That is, every query specifying a partition will always use only the first 
partition.

I am using 0.11, not 0.7. Apparently, 0.11 (and perhaps everything after 0.7?) 
has this problem.
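
If the multi-spec form is indeed the trigger, one workaround to try is a 
separate ALTER TABLE per partition. The sketch below is not a confirmed fix: 
the LOCATION paths are hypothetical, and whether the original partitions were 
created with explicit locations is an assumption.

```sql
-- Form that appears to trigger the problem: several partition specs
-- in a single statement (paths are hypothetical).
-- ALTER TABLE X ADD
--   PARTITION (date='20130101') LOCATION '/data/X/20130101'
--   PARTITION (date='20130102') LOCATION '/data/X/20130102';

-- Workaround to try: one partition spec per statement.
ALTER TABLE X ADD PARTITION (date='20130101') LOCATION '/data/X/20130101';
ALTER TABLE X ADD PARTITION (date='20130102') LOCATION '/data/X/20130102';
```

Partitions already registered with the wrong location would presumably need 
to be dropped and re-added (or repointed) before the group by returns one row 
per date.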
                
> Group by ignored when group by column is a partition column
> -----------------------------------------------------------
>
>                 Key: HIVE-5083
>                 URL: https://issues.apache.org/jira/browse/HIVE-5083
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 0.11.0
>         Environment: linux
>            Reporter: Micah Gutman
>
> I have an external table X with partition date (a string YYYYMMDD):
> select X.date, count(*) from X group by X.date
> Rather than get a count breakdown by date, I get a single row returned with 
> the count for the entire table. The "date" column returned in my single row 
> appears to be the last partition in the table.
> Note results appear as expected if I select an arbitrary "real" column from 
> my table:
> select X.foo, count(*) from X group by X.foo 
> correctly gives me a single row per value of X.foo.
> Also, my query works fine when I use the date column in the "where" clause, 
> so the partition does seem to be working.
> select X.date, count(*) from X where X.date = "20130101"
> correctly gives me a single row with the count for the date 20130101.

