Hi Ralph,

What kind of workload do you expect on your cluster? Will there be many users accessing many different parts of your table(s) simultaneously? Have you considered not salting your tables? Or do you have hot-spotting issues at write time, due to the layout of your PK, that salting is preventing? With the advent of table stats (http://phoenix.apache.org/update_statistics.html), Phoenix is able to parallelize queries along equal chunks of data, similar to what occurs with salting.
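For reference, a minimal sketch of the two alternatives discussed here (the table and column names are hypothetical; SALT_BUCKETS and UPDATE STATISTICS are standard Phoenix syntax):

```sql
-- Salted variant: rows are spread across N buckets by a hashed prefix byte,
-- which avoids write hot-spotting on a monotonically increasing PK.
CREATE TABLE events (
    host    VARCHAR NOT NULL,
    ts      DATE NOT NULL,
    payload VARCHAR
    CONSTRAINT pk PRIMARY KEY (host, ts)
) SALT_BUCKETS = 20;

-- Unsalted variant: rely on table stats instead. Collecting stats builds
-- guideposts that let Phoenix parallelize scans along equal chunks of data.
CREATE TABLE events_unsalted (
    host    VARCHAR NOT NULL,
    ts      DATE NOT NULL,
    payload VARCHAR
    CONSTRAINT pk PRIMARY KEY (host, ts)
);
UPDATE STATISTICS events_unsalted;
```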
The downside of salting shows up in queries that access only a handful of rows. Because Phoenix doesn't know which salt bucket contains which of these rows, a scan must always be run for every salt bucket. If you have 100 salt buckets, that's 100 scans (worst case loading 100 blocks) versus a single scan for the unsalted case (loading a single block). This will impact the throughput you see.

I'd encourage you to use Pherf (http://phoenix.apache.org/pherf.html) to benchmark salted (over a range of salt bucket counts) versus unsalted tables under realistic scenarios to get an accurate assessment for your workload.

Thanks,
James

On Mon, Jun 8, 2015 at 9:34 AM, Perko, Ralph J <[email protected]> wrote:
> Hi – following up on this.
>
> Is it generally recommended to roughly match the salt bucket count to region
> server count? Or is it more arbitrary? Should I use something like 255
> because the regions are going to split anyway?
>
> Thanks,
> Ralph
>
>
> From: "Perko, Ralph J"
> Reply-To: "[email protected]"
> Date: Friday, June 5, 2015 at 11:39 AM
> To: "[email protected]"
> Subject: Salt bucket count recommendation
>
> Hi,
>
> We have a 40 node cluster with 8 core tables and around 35 secondary index
> tables. The tables get very large – billions of records and terabytes of
> data. What salt bucket count do you recommend?
>
> Thanks,
> Ralph
