Hi! I can't find any generic recommendations on how to choose the number of buckets for single-level hash partitioning.
All that I found:

* "For large tables, prefer to use roughly 10 partitions per server in the cluster": https://impala.incubator.apache.org/docs/build/html/topics/impala_kudu.html#kudu_partitioning__kudu_hash_partitioning. BTW, why 10? It looks like a magic number to me :).
* Some recommendations here: https://kudu.apache.org/docs/known_issues.html#_scale

My use case: accumulate up to 500 GB - 1 TB of data per day and run some aggregations with Spark over that data at the end of the day.

What should the number of buckets depend on? The number of servers, the number of disks (I use HDDs without any RAID), the number of CPU cores? Any suggestions?

--
with best regards, Pavel Martynov
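
P.S. For concreteness, here is a rough sketch of the kind of DDL I mean. The table name, columns, and hash column are made up for illustration; the 30 assumes 3 tablet servers times the "roughly 10 partitions per server" rule of thumb from the Impala/Kudu docs:

CREATE TABLE daily_events (
  event_id BIGINT,
  event_time BIGINT,
  payload STRING,
  PRIMARY KEY (event_id)
)
-- 30 buckets = 3 tablet servers x ~10 partitions per server (rule of thumb, not verified)
PARTITION BY HASH (event_id) PARTITIONS 30
STORED AS KUDU;

Is a number derived this way reasonable for my workload, or should it be driven by disks or CPU cores instead?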
