Balazs Varga created BAHIR-237:
----------------------------------

             Summary: Hash and range partitioning support in Flink-Kudu connector SQL DDL
                 Key: BAHIR-237
                 URL: https://issues.apache.org/jira/browse/BAHIR-237
             Project: Bahir
          Issue Type: Improvement
          Components: Flink Streaming Connectors
            Reporter: Balazs Varga

The current version of the Flink-Kudu connector's SQL DDL supports only a limited set of properties. For partitioning, only the 'kudu.hash-columns' option is available, and it does not allow setting the number of hash partitions. Range partitioning is not supported at all.

Since hash partitioning cannot be changed after a table is created, it should be possible to set the number of hash buckets for each hash column in the DDL. A simple way to achieve this is with additional properties. Here are some ways I can think of to specify it:

* 'kudu.hash-columns'='col1,col2', 'kudu.hash-buckets'='4,8'
* 'kudu.hash-partitioning'='col1,4;col2,8'
* 'kudu.hash-buckets.col1' = '4', 'kudu.hash-buckets.col2' = '8'

I'd appreciate your input on which approach would be best.

For range partitioning, I recommend adding a property to set the range partitioning columns:

'kudu.range-columns'='col1,col2'

If this is set correctly for a table, the range partitions themselves can be added later using ALTER TABLE. Specifying the ranges in the DDL would add a lot of complex parsing logic.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
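
To help compare the three hash-bucket syntaxes proposed above, here is a minimal sketch of how each could be parsed into the same column-to-bucket-count mapping. It is only an illustration: the function names are hypothetical, the properties other than 'kudu.hash-columns' do not exist in the connector yet, and the connector itself is written in Java, so this Python version only shows the relative parsing effort of each option.

```python
from typing import Dict


def parse_parallel_lists(props: Dict[str, str]) -> Dict[str, int]:
    """Option 1: 'kudu.hash-columns'='col1,col2', 'kudu.hash-buckets'='4,8'."""
    columns = [c.strip() for c in props["kudu.hash-columns"].split(",")]
    buckets = [int(b) for b in props["kudu.hash-buckets"].split(",")]
    # The two lists are matched by position, so their lengths must agree.
    if len(columns) != len(buckets):
        raise ValueError("hash-columns and hash-buckets must have the same length")
    return dict(zip(columns, buckets))


def parse_combined(props: Dict[str, str]) -> Dict[str, int]:
    """Option 2: 'kudu.hash-partitioning'='col1,4;col2,8'."""
    result = {}
    for entry in props["kudu.hash-partitioning"].split(";"):
        # Each entry is a 'column,bucket-count' pair.
        column, count = entry.split(",")
        result[column.strip()] = int(count)
    return result


def parse_per_column(props: Dict[str, str]) -> Dict[str, int]:
    """Option 3: 'kudu.hash-buckets.col1'='4', 'kudu.hash-buckets.col2'='8'."""
    prefix = "kudu.hash-buckets."
    # The column name is whatever follows the prefix in the property key.
    return {
        key[len(prefix):]: int(value)
        for key, value in props.items()
        if key.startswith(prefix)
    }
```

In this sketch, option 1 needs a length check between two separate properties, option 2 keeps each column and its count together at the cost of a nested delimiter, and option 3 needs no value parsing beyond the integer but spreads the setting over several keys.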