I think this a very reasonable feature request. I have recently started
working with Kudu and the "default" behavior has already tripped me up a
couple times.

Thanks,

Abhi

On Thu, May 19, 2016 at 4:03 PM, Dan Burkert <danburk...@apache.org> wrote:

> Hi all,
>
> One of the issues that trips up new Kudu users is the uncertainty about
> how partitioning works, and how to use partitioning effectively.  Much of
> this can be addressed with better documentation and explanatory materials,
> and that should be an area of focus leading up to our 1.0 release. However,
> the default partitioning behavior is suboptimal, and changing the default
> could lead to significantly less user confusion and frustration. Currently,
> when creating a new table, Kudu defaults to using only a single tablet,
> which is a known anti-pattern.  This can be painful for users who create a
> table assuming Kudu will have good defaults, and begin loading data only to
> find out later that they will need to recreate the table with partitioning
> to achieve good results.
>
> A better default partitioning strategy might be hash partitioning over the
> primary key columns, with a number of hash buckets based on the number of
> tablet servers (perhaps something like 3x the number of tablet servers).
> This would alleviate the worst scalability issues with the current default,
> however it has a few downsides of its own. Hash partitioning is not
> appropriate for every use case, and any rule-of-thumb number of tablets we
> could come up with will not always be optimal.
>
> Given that there is no bullet-proof default, and that changing
> partitioning strategy after table creation is impossible, and changing the
> default partitioning strategy is a backwards incompatible change, I propose
> we remove the default altogether.  Users would be required to explicitly
> specify the table partitioning during creation, and failing to do so would
> result in an illegal argument error.  Users who really do want only a
> single tablet will still be able to do so by explicitly configuring range
> partitioning with no split rows.
>
> I'd like to get community feedback on whether this seems like a good
> direction to take.  I have put together a patch, you can check out the
> changes to test files to see what it looks like to add partitioning
> explicitly in cases where the default was being relied on.
> http://gerrit.cloudera.org:8080/#/c/3131/
>
> - Dan
>



-- 
Abhi Basu

Reply via email to