Balazs Varga created BAHIR-237:
----------------------------------

             Summary: Hash and range partitioning support in Flink-Kudu 
connector SQL DDL
                 Key: BAHIR-237
                 URL: https://issues.apache.org/jira/browse/BAHIR-237
             Project: Bahir
          Issue Type: Improvement
          Components: Flink Streaming Connectors
            Reporter: Balazs Varga


The current version of the Flink-Kudu connector's SQL DDL supports only a 
limited set of properties. With regard to partitioning, only the 
'kudu.hash-columns' option is available, and it does not allow setting the 
number of hash partitions. Range partitioning is currently not supported.

Since partitioning cannot be altered later, it should be possible to set the 
number of hash buckets for each hash column in the DDL. A simple way to achieve 
this is to use additional properties. Here are some ways I can think of to 
specify it:
 * 'kudu.hash-columns'='col1,col2', 'kudu.hash-buckets'='4,8'
 * 'kudu.hash-partitioning'='col1,4;col2,8'
 * 'kudu.hash-buckets.col1' = '4', 'kudu.hash-buckets.col2' = '8'

I'd appreciate your input regarding which approach would be the best.
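
To make the comparison more concrete, here is a rough sketch of how each of 
the three syntaxes might be parsed into a column-to-bucket-count map. The class 
and method names are hypothetical and not part of the connector, and the 
minimum of two buckets is an assumption about what Kudu accepts for a hash 
partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not connector code): parses the three proposed syntaxes. */
class HashPartitionOptions {

    /** Option 1: 'kudu.hash-columns'='col1,col2', 'kudu.hash-buckets'='4,8' */
    public static Map<String, Integer> fromParallelLists(String columns, String buckets) {
        String[] cols = columns.split(",");
        String[] counts = buckets.split(",");
        if (cols.length != counts.length) {
            throw new IllegalArgumentException(
                "'kudu.hash-columns' and 'kudu.hash-buckets' must have the same length");
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < cols.length; i++) {
            result.put(cols[i].trim(), parseBuckets(counts[i]));
        }
        return result;
    }

    /** Option 2: 'kudu.hash-partitioning'='col1,4;col2,8' */
    public static Map<String, Integer> fromCombined(String value) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String entry : value.split(";")) {
            String[] parts = entry.split(",");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected 'column,buckets' but got: " + entry);
            }
            result.put(parts[0].trim(), parseBuckets(parts[1]));
        }
        return result;
    }

    /** Option 3: 'kudu.hash-buckets.col1'='4', 'kudu.hash-buckets.col2'='8' */
    public static Map<String, Integer> fromPrefixedKeys(Map<String, String> properties) {
        String prefix = "kudu.hash-buckets.";
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                result.put(e.getKey().substring(prefix.length()), parseBuckets(e.getValue()));
            }
        }
        return result;
    }

    // Assumption: a hash partition needs at least two buckets.
    private static int parseBuckets(String raw) {
        int n = Integer.parseInt(raw.trim());
        if (n < 2) {
            throw new IllegalArgumentException("hash bucket count must be at least 2, got " + n);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(fromParallelLists("col1,col2", "4,8"));
        System.out.println(fromCombined("col1,4;col2,8"));
    }
}
```

In this sketch the first option needs a length check between two separate 
properties, the second keeps each column next to its bucket count, and the 
third needs no string splitting but yields the columns in whatever order the 
property map iterates.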

For range partitioning, I recommend adding a property to set the range 
partitioning columns: 'kudu.range-columns'='col1,col2'

If this is correctly set for a table, the partitions themselves can be added 
later using ALTER TABLE. Specifying the ranges in the DDL itself would add a 
lot of complex parsing logic.
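
As a rough illustration of how little parsing the column-only property would 
need, here is a minimal sketch. The class name and the primary-key check are 
assumptions for illustration (Kudu requires partition columns to be part of 
the primary key); the resulting list is the kind of value that could be handed 
to the Kudu client's CreateTableOptions.setRangePartitionColumns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not connector code): parses 'kudu.range-columns'. */
class RangeColumnsOption {

    public static List<String> parse(String value, List<String> primaryKeyColumns) {
        List<String> columns = new ArrayList<>();
        for (String raw : value.split(",")) {
            String column = raw.trim();
            if (column.isEmpty()) {
                throw new IllegalArgumentException("empty column name in 'kudu.range-columns'");
            }
            // Range partition columns have to be primary key columns.
            if (!primaryKeyColumns.contains(column)) {
                throw new IllegalArgumentException(
                    "range column is not part of the primary key: " + column);
            }
            columns.add(column);
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(parse("col1,col2", Arrays.asList("col1", "col2", "col3")));
    }
}
```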

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
