I assume you primarily access your data using a time range query then,
so salting on the data tables makes sense. Is this the case for your
secondary indexes as well? Do they lead their PK with a date/time
column as well? Did you know you can turn salting off for an index
over a salted table (by specifying SALT_BUCKETS=0 when you create the
index)?
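
For example, here's a rough sketch of what that looks like. The table,
column, and index names are made up for illustration, and 80 is only a
placeholder bucket count, not a recommendation:

```sql
-- Hypothetical time-series table: the PK leads with a time column, so the
-- table is salted to spread writes across regions.
CREATE TABLE IF NOT EXISTS EVENTS (
    EVENT_TIME TIMESTAMP NOT NULL,
    HOSTNAME VARCHAR NOT NULL,
    EVENT_TYPE VARCHAR,
    PAYLOAD_BYTES BIGINT
    CONSTRAINT PK PRIMARY KEY (EVENT_TIME, HOSTNAME)
) SALT_BUCKETS = 80;

-- Hypothetical index that does not lead with a time column. Setting
-- SALT_BUCKETS = 0 turns salting off for the index even though the
-- data table is salted.
CREATE INDEX IF NOT EXISTS EVENTS_BY_TYPE
    ON EVENTS (EVENT_TYPE)
    INCLUDE (PAYLOAD_BYTES)
    SALT_BUCKETS = 0;
```

Whether an unsalted index helps depends on whether the index's leading
column is monotonically increasing, so it's worth checking each index.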

I'd recommend using Pherf to find the optimal number of salt buckets.
Perhaps you can start with 2x the number of region servers (+ some
expected future growth, as you can't change the number of salt buckets
on a table without re-writing it).
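
As a worked starting point for your 40 node cluster, assuming one region
server per node and, purely for illustration, growth to 48 nodes (the
table and columns below are hypothetical):

```sql
-- Starting-point arithmetic (an assumption to validate with Pherf):
--   today:        2 x 40 region servers = 80 buckets
--   with growth:  2 x 48 region servers = 96 buckets
-- The bucket count is set at creation time, so the larger value is used.
CREATE TABLE IF NOT EXISTS METRICS (
    SAMPLE_TIME TIMESTAMP NOT NULL,
    SENSOR_ID VARCHAR NOT NULL,
    READING DOUBLE
    CONSTRAINT PK PRIMARY KEY (SAMPLE_TIME, SENSOR_ID)
) SALT_BUCKETS = 96;
```

I'd treat 96 as one of several values to compare in Pherf runs, along
with the unsalted case.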

Thanks,
James

On Mon, Jun 8, 2015 at 10:08 AM, Perko, Ralph J <[email protected]> wrote:
> James,
>
> Thanks for the response.
>
> There could be a dozen or so users accessing the system and the same portions 
> of the tables.  The motive for salting has been to eliminate hot spotting - 
> our data is time-series based and that is what the PK is based on.
>
> Thanks,
> Ralph
>
>
>
>
> On 6/8/15, 10:00 AM, "James Taylor" <[email protected]> wrote:
>
>>Hi Ralph,
>>What kind of workload do you expect on your cluster? Will there be
>>many users accessing many different parts of your table(s)
>>simultaneously? Have you considered not salting your tables? Or do you
>>have hot spotting issues at write time due to the layout of your PK
>>that salting is preventing? With the advent of table stats
>>(http://phoenix.apache.org/update_statistics.html), Phoenix is able to
>>parallelize queries along equal chunks of data, similar to what
>>occurs with salting.
>>
>>The downside of salting is for queries that are only accessing a
>>handful of rows. Because Phoenix doesn't know which salt bucket
>>contains which of these rows, a scan always needs to be run for
>>every salt bucket. If you have 100 salt buckets, this is 100 scans
>>(worst case loading 100 blocks) versus a single scan for the unsalted
>>case (loading a single block). This will impact the throughput you
>>see.
>>
>>I'd encourage you to use Pherf (http://phoenix.apache.org/pherf.html)
>>to test salting (over multiple salt bucket sizes) versus unsalted for
>>realistic scenarios to get an accurate assessment for your workload.
>>
>>Thanks,
>>James
>>
>>On Mon, Jun 8, 2015 at 9:34 AM, Perko, Ralph J <[email protected]> wrote:
>>> Hi – following up on this.
>>>
>>> Is it generally recommended to roughly match the salt bucket count to region
>>> server count?  Or is it more arbitrary?  Should I use something like 255
>>> because the regions are going to split anyway?
>>>
>>> Thanks,
>>> Ralph
>>>
>>>
>>> From: "Perko, Ralph J"
>>> Reply-To: "[email protected]"
>>> Date: Friday, June 5, 2015 at 11:39 AM
>>> To: "[email protected]"
>>> Subject: Salt bucket count recommendation
>>>
>>> Hi,
>>>
>>> We have a 40 node cluster with 8 core tables and around 35 secondary index
>>> tables.  The tables get very large – billions of records and terabytes of
>>> data.  What salt bucket count do you recommend?
>>>
>>> Thanks,
>>> Ralph
>>>
