Hi,

at our company we use a Hive table that is both partitioned and
clustered, similar to the following snippet:

PARTITIONED BY (year INT, month INT, day INT, feed STRING)
CLUSTERED BY (key) INTO 1024 BUCKETS

Using this input table we regularly perform queries where we group by key
across multiple partitions.
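
For illustration, a typical query looks roughly like the sketch below. The
table name `events` and the partition values are placeholders, not our real
schema; only `key` and the partition columns come from the layout above.

```sql
-- Hypothetical table name and filter values, for illustration only.
-- Aggregates by the bucketing column across all day/feed partitions
-- of one month.
SELECT key, COUNT(*) AS cnt
FROM events
WHERE year = 2015 AND month = 6
GROUP BY key;
```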

Now, my questions are the following:

1. Does Hive take advantage of such a table layout, so that the
GROUP BY operation is executed more efficiently (compared with a similar
table that is partitioned but not clustered)?
2. If so, is this behaviour enabled by default, or do I have to
set certain options?
3. Would it help to sort the buckets?
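
To make question 3 concrete, the sorted variant I have in mind would look
roughly like this. The table name, the `value` column and the column types
are assumptions for the sake of the example.

```sql
-- Hypothetical table name and column types; only the PARTITIONED BY and
-- CLUSTERED BY clauses mirror our actual layout.
CREATE TABLE events_sorted (
  key   STRING,
  value STRING
)
PARTITIONED BY (year INT, month INT, day INT, feed STRING)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 1024 BUCKETS;
```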

Our Hive version is 1.1.0.

Thank you very much in advance.

Cheers
Jan
