I think this a very reasonable feature request. I have recently started working with Kudu and the "default" behavior has already tripped me up a couple times.
Thanks, Abhi On Thu, May 19, 2016 at 4:03 PM, Dan Burkert <danburk...@apache.org> wrote: > Hi all, > > One of the issues that trips up new Kudu users is the uncertainty about > how partitioning works, and how to use partitioning effectively. Much of > this can be addressed with better documentation and explanatory materials, > and that should be an area of focus leading up to our 1.0 release. However, > the default partitioning behavior is suboptimal, and changing the default > could lead to significantly less user confusion and frustration. Currently, > when creating a new table, Kudu defaults to using only a single tablet, > which is a known anti-pattern. This can be painful for users who create a > table assuming Kudu will have good defaults, and begin loading data only to > find out later that they will need to recreate the table with partitioning > to achieve good results. > > A better default partitioning strategy might be hash partitioning over the > primary key columns, with a number of hash buckets based on the number of > tablet servers (perhaps something like 3x the number of tablet servers). > This would alleviate the worst scalability issues with the current default, > however it has a few downsides of its own. Hash partitioning is not > appropriate for every use case, and any rule-of-thumb number of tablets we > could come up with will not always be optimal. > > Given that there is no bullet-proof default, and that changing > partitioning strategy after table creation is impossible, and changing the > default partitioning strategy is a backwards incompatible change, I propose > we remove the default altogether. Users would be required to explicitly > specify the table partitioning during creation, and failing to do so would > result in an illegal argument error. Users who really do want only a > single tablet will still be able to do so by explicitly configuring range > partitioning with no split rows. > > I'd like to get community feedback on whether this seems like a good > direction to take. I have put together a patch, you can check out the > changes to test files to see what it looks like to add partitioning > explicitly in cases where the default was being relied on. > http://gerrit.cloudera.org:8080/#/c/3131/ > > - Dan > -- Abhi Basu