Implicitly CLUSTER BY when dynamically partitioning
---------------------------------------------------

                 Key: HIVE-2363
                 URL: https://issues.apache.org/jira/browse/HIVE-2363
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Adam Kramer
            Priority: Critical

Whenever a query creates partitions dynamically, the underlying implementation scans the output data and keeps writing to one file for as long as the partition column values stay contiguous. When the partition column value changes, it closes that file and opens a new one. This can generate far too many files.

The solution is to ensure that all rows sharing a partition value appear consecutively and on the same reducer, i.e., to cluster by the partitioning columns on the way out.

This improvement is to detect whether a query already clusters by the eventual partition columns and, if not, to do so as an additional step at the end of the query. This could save a lot of space.
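
To illustrate the file-count effect described in this issue, here is a minimal Python sketch. It is a toy model, not Hive code: it assumes a single writer that starts a new file every time the partition value differs from the previous row's. The function names, the `ds` partition column, and the sample rows are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def count_output_files(rows, partition_key):
    """Count files under the toy model: one new file per run of
    consecutive rows that share the same partition value."""
    return sum(1 for _ in groupby(rows, key=partition_key))


def needs_clustering(rows, partition_key):
    """Return True if some partition value appears in more than one run,
    i.e. the rows are not already clustered by the partition key."""
    distinct_values = len({partition_key(row) for row in rows})
    return count_output_files(rows, partition_key) > distinct_values


def cluster_by(rows, partition_key):
    """Make rows with equal partition values contiguous (here, by sorting)."""
    return sorted(rows, key=partition_key)


# Hypothetical output rows: (ds, payload), with ds as the partition column.
rows = [
    ("2011-08-01", "a"),
    ("2011-08-02", "b"),
    ("2011-08-01", "c"),
    ("2011-08-02", "d"),
    ("2011-08-01", "e"),
]
by_ds = itemgetter(0)

unclustered_files = count_output_files(rows, by_ds)
if needs_clustering(rows, by_ds):
    rows = cluster_by(rows, by_ds)
clustered_files = count_output_files(rows, by_ds)

print(unclustered_files, clustered_files)  # prints "5 2"
```

In this model the interleaved input yields one file per row, while the clustered input yields one file per distinct partition value. The `needs_clustering` check mirrors the proposed detection step: the extra clustering pass is only applied when the rows are not already contiguous by partition value.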