Hi, at our company we use a Hive table that is both partitioned and clustered, similar to the following snippet:
PARTITIONED BY (year INT, month INT, day INT, feed STRING)
CLUSTERED BY (key) INTO 1024 BUCKETS

Using this input table, we regularly run queries that group by key across multiple partitions. My questions are the following:

1. Does Hive take advantage of such a table layout so that the group by operation is executed more efficiently (compared to a similar table that is partitioned but not clustered)?
2. If yes, is this behaviour enabled by default, or do I have to specify certain options?
3. Would it help to sort the buckets?

Our Hive version is 1.1.0.

Thank you very much in advance.

Cheers
Jan