Hi Ralph,
What kind of workload do you expect on your cluster? Will there be
many users accessing many different parts of your table(s)
simultaneously? Have you considered not salting your tables? Or do you
have hot spotting issues at write time due to the layout of your PK
that salting is preventing? With the advent of table stats
(http://phoenix.apache.org/update_statistics.html), Phoenix is able to
parallelize queries along equal chunks of data, similar to what
occurs with salting.
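
To illustrate the idea of chunking by stats, here is a toy sketch in Python. It is not Phoenix code: the function name split_by_guideposts and the sample keys are made up for illustration, and it only assumes that stats collection yields a sorted list of row keys marking roughly equal amounts of data.

```python
def split_by_guideposts(start, stop, guideposts):
    """Cut the key range [start, stop) into sub-ranges at each marker key.

    Hypothetical helper: 'guideposts' stands in for the row keys that
    stats collection would record at roughly equal data intervals.
    """
    # Keep only the markers that fall strictly inside the requested range.
    inner = [g for g in sorted(guideposts) if start < g < stop]
    bounds = [start] + inner + [stop]
    # Each adjacent pair of bounds is one chunk that could be scanned in parallel.
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    for chunk in split_by_guideposts("a", "z", ["h", "p"]):
        print(chunk)
```

Each chunk can be scanned in parallel without a salt byte in the row key, which is why stats can stand in for salting on the read side.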

The downside of salting is for queries that are only accessing a
handful of rows. Because Phoenix doesn't know which salt bucket
contains which of these rows, a scan always needs to be run for
every salt bucket. If you have 100 salt buckets, this is 100 scans
(worst case loading 100 blocks) versus a single scan for the unsalted
case (loading a single block). This will impact the throughput you
see.
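
As a rough illustration of that cost, here is a small Python model. It is a sketch under assumptions, not Phoenix's implementation: salt_bucket uses CRC32 as a stand-in hash, and scans_for_small_range is a hypothetical helper that just counts scans.

```python
import zlib


def salt_bucket(row_key: bytes, num_buckets: int) -> int:
    """Assign a row key to a bucket (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(row_key) % num_buckets


def scans_for_small_range(num_buckets: int) -> int:
    """Scans needed to read a handful of adjacent rows.

    Unsalted (0 buckets): the rows are contiguous, so one scan suffices.
    Salted: the rows may sit in any bucket, so every bucket is scanned.
    """
    return max(1, num_buckets)


if __name__ == "__main__":
    keys = [b"user42|2015-06-0%d" % d for d in range(1, 6)]
    for n in (0, 8, 100):
        touched = sorted({salt_bucket(k, n) for k in keys}) if n else []
        print(n, "buckets:", scans_for_small_range(n), "scans; rows land in", touched)
```

Five adjacent keys end up scattered over the buckets, so the reader cannot skip any of them, while the unsalted layout keeps the same rows next to each other.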

I'd encourage you to use Pherf (http://phoenix.apache.org/pherf.html)
to test salting (over multiple salt bucket sizes) versus unsalted for
realistic scenarios to get an accurate assessment for your workload.

Thanks,
James

On Mon, Jun 8, 2015 at 9:34 AM, Perko, Ralph J <[email protected]> wrote:
> Hi – following up on this.
>
> Is it generally recommended to roughly match the salt bucket count to region
> server count?  Or is it more arbitrary?  Should I use something like 255
> because the regions are going to split anyway?
>
> Thanks,
> Ralph
>
>
> From: "Perko, Ralph J"
> Reply-To: "[email protected]"
> Date: Friday, June 5, 2015 at 11:39 AM
> To: "[email protected]"
> Subject: Salt bucket count recommendation
>
> Hi,
>
> We have a 40 node cluster with 8 core tables and around 35 secondary index
> tables.  The tables get very large – billions of records and terabytes of
> data.  What salt bucket count do you recommend?
>
> Thanks,
> Ralph
>
