sure, makes sense.

here's what I was trying to achieve - a sort merge bucket join.
as I understand, to get an SMB join, the bucket columns have to be the join key 
columns.
I am joining two tables- the key for each row in those tables is a composite 
key= acct + date.
now, does that mean, I cannot partition this table based on date?
If so, am I going to lose the benefit of partition pruning or does the SMB 
provides more benefits offsetting this situation?
 
Sincerely,


Ameet


________________________________
 
  
Hi Ameet,
That's the correct behaviour.

In Hive, clustering and sorting happens within a partition. Inside each 
partition, there is only one value associated with the partition column 
therefore it would have no impact on clustering and sorting. Therefore, putting 
the partition column in clustered by/sorted by in the create table statement 
doesn't make sense.

Also, your create table statement should be something like (note the removal of 
col3 from the column list):
create table abc ( col1 string, col2 string) 
partitioned by (col3 string) 
clustered by (col1) sorted by (col1) into 10 buckets; 

Mark

Reply via email to