Ravindra Pesala created CARBONDATA-466:
------------------------------------------

             Summary: Implement bucketing table in carbondata
                 Key: CARBONDATA-466
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-466
             Project: CarbonData
          Issue Type: New Feature
            Reporter: Ravindra Pesala


Bucketing is a useful feature when users want to join big tables. It is also
useful for driver-level partition pruning, which improves query performance.
Users can add buckets on any dimension column (except complex-type columns) as follows:
{code}
CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
CLUSTERED BY(user_id) INTO 32 BUCKETS
STORED BY 'carbondata';
{code}
In the above example, the user_id column is hash-partitioned, producing 32 bucket
files in CarbonData. When this table is joined with another table on the bucketed
column (provided the other table is bucketed on its join column into the same
number of buckets), matching buckets can be selected and joined without shuffling.
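To illustrate why no shuffle is needed, here is a minimal Python sketch (not CarbonData code). It assumes a hypothetical `bucket_id` built on Python's `hash()` modulo the bucket count. The issue does not say which hash CarbonData uses, only that both sides must use the same one. Because equal keys always land in the same bucket id, bucket i of one table only needs to be joined with bucket i of the other. The names `bucketize` and `bucketed_join` are hypothetical.

{code:python}
from collections import defaultdict

NUM_BUCKETS = 32  # matches "INTO 32 BUCKETS" in the example DDL

def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Hypothetical hash: built-in hash() reduced modulo the bucket count
    # (% with a positive divisor never returns a negative value).
    # CarbonData's real hash function is not specified in this issue.
    return hash(key) % num_buckets

def bucketize(rows, key_index, num_buckets=NUM_BUCKETS):
    """Split rows into num_buckets lists, one per bucket file."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return buckets

def bucketed_join(left_buckets, right_buckets, left_key, right_key):
    """Inner-join two tables bucketed on their join columns.

    Equal keys always share a bucket id, so bucket i on the left is only
    joined with bucket i on the right and no rows move between buckets.
    """
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both tables must have the same number of buckets")
    joined = []
    for left, right in zip(left_buckets, right_buckets):
        # Simple hash join within one pair of matching buckets.
        by_key = defaultdict(list)
        for rrow in right:
            by_key[rrow[right_key]].append(rrow)
        for lrow in left:
            for rrow in by_key.get(lrow[left_key], []):
                joined.append(lrow + rrow)
    return joined

# Usage: user_id 1 and 33 both fall into bucket 1 (33 % 32 == 1).
users = [(1, "Ann", "Lee"), (33, "Bob", "Kim"), (2, "Cy", "Ng")]
orders = [(1, 9.5), (33, 4.0), (7, 1.0)]
result = bucketed_join(bucketize(users, 0), bucketize(orders, 0), 0, 0)
{code}

In a distributed engine, each pair of matching buckets would be handled by one task. This is the step that replaces the shuffle.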

Carbon format changes:
1. Bucketing information needs to be stored inside the schema thrift file.
2. The bucket id can be stored inside every carbondata index file.
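The two format changes together make driver-level pruning possible. The following is a minimal Python sketch under stated assumptions, not the actual thrift structures. `BucketingInfo` stands in for the bucketing information in the schema, and `IndexFileInfo` stands in for an index file that records its bucket id. Both classes, `prune_by_bucket`, and the `hash()`-based `bucket_id` are hypothetical. For an equality filter on the bucketed column, the driver computes the value's bucket id and keeps only the index files with that id.

{code:python}
from dataclasses import dataclass

@dataclass(frozen=True)
class BucketingInfo:
    # Hypothetical stand-in for the bucketing info kept in the schema thrift file.
    column: str
    num_buckets: int

@dataclass(frozen=True)
class IndexFileInfo:
    # Hypothetical stand-in for a carbondata index file that records its bucket id.
    path: str
    bucket_id: int

def bucket_id(value, num_buckets):
    # Hypothetical hash; it must be the same function used when data was loaded.
    return hash(value) % num_buckets

def prune_by_bucket(index_files, info, column, value):
    """Driver-side pruning for an equality filter `column = value`."""
    if column != info.column:
        # The filter is not on the bucketed column, so no bucket can be skipped.
        return list(index_files)
    target = bucket_id(value, info.num_buckets)
    return [f for f in index_files if f.bucket_id == target]

info = BucketingInfo(column="user_id", num_buckets=32)
files = [IndexFileInfo(path=f"bucket-{b}.index", bucket_id=b) for b in range(32)]
selected = prune_by_bucket(files, info, "user_id", 33)  # only bucket 1 is kept
{code}

With 32 buckets, a point lookup on user_id reads roughly 1/32 of the index files. Filters on other columns fall back to scanning all of them.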



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
