James,

Thanks for the response.

There could be a dozen or so users accessing the system and the same portions of the tables. The motive for salting has been to eliminate hot spotting - our data is time-series based and that is what the PK is based on.

Thanks,
Ralph

On 6/8/15, 10:00 AM, "James Taylor" <[email protected]> wrote:

>Hi Ralph,
>What kind of workload do you expect on your cluster? Will there be
>many users accessing many different parts of your table(s)
>simultaneously? Have you considered not salting your tables? Or do you
>have hot-spotting issues at write time, due to the layout of your PK,
>that salting is preventing? With the advent of table stats
>(http://phoenix.apache.org/update_statistics.html), Phoenix is able to
>parallelize queries along equal chunks of data, similar to what
>occurs with salting.
>
>The downside of salting is for queries that access only a
>handful of rows. Because Phoenix doesn't know which salt bucket
>contains which of these rows, a scan always needs to be run for
>every salt bucket. If you have 100 salt buckets, this is 100 scans
>(worst case loading 100 blocks) versus a single scan for the unsalted
>case (loading a single block). This will impact the throughput you
>see.
>
>I'd encourage you to use Pherf (http://phoenix.apache.org/pherf.html)
>to test salting (over multiple salt bucket sizes) versus unsalted for
>realistic scenarios to get an accurate assessment for your workload.
>
>Thanks,
>James
>
>On Mon, Jun 8, 2015 at 9:34 AM, Perko, Ralph J <[email protected]> wrote:
>> Hi – following up on this.
>>
>> Is it generally recommended to roughly match the salt bucket count to
>> region server count? Or is it more arbitrary? Should I use something
>> like 255 because the regions are going to split anyway?
>>
>> Thanks,
>> Ralph
>>
>>
>> From: "Perko, Ralph J"
>> Reply-To: "[email protected]"
>> Date: Friday, June 5, 2015 at 11:39 AM
>> To: "[email protected]"
>> Subject: Salt bucket count recommendation
>>
>> Hi,
>>
>> We have a 40 node cluster with 8 core tables and around 35 secondary
>> index tables. The tables get very large – billions of records and
>> terabytes of data. What salt bucket count do you recommend?
>>
>> Thanks,
>> Ralph
>>
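[Editor's note: for readers skimming this thread, the two alternatives James contrasts look roughly like this in Phoenix DDL. This is a hedged sketch, not the poster's actual schema - the table name, columns, and the bucket count of 40 (matching the 40-node cluster mentioned below) are all illustrative.]

```sql
-- Salted variant: a leading salt byte spreads sequential time-series
-- writes across 40 buckets instead of hammering one hot region.
CREATE TABLE metrics (
    event_time  TIMESTAMP NOT NULL,
    host        VARCHAR NOT NULL,
    value       DOUBLE,
    CONSTRAINT pk PRIMARY KEY (event_time, host)
) SALT_BUCKETS = 40;

-- Unsalted alternative: rely on table stats to parallelize scans
-- into equal chunks; stats are refreshed with:
UPDATE STATISTICS metrics;
```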

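[Editor's note: a minimal Python sketch of the trade-off James describes. This is not Phoenix internals - it only assumes the documented scheme where a salt byte is derived from a hash of the row key modulo the bucket count, and the bucket count of 8 is arbitrary.]

```python
# Sketch: salting scatters sequential row keys across buckets (fixing the
# write hot spot), but a range scan must then visit every bucket.
import hashlib

SALT_BUCKETS = 8  # hypothetical bucket count for illustration


def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Derive a salt byte from a hash of the row key, mod the bucket count."""
    return hashlib.md5(row_key).digest()[0] % buckets


def salted_key(row_key: bytes) -> bytes:
    """Prepend the salt byte, as a salted table's physical row key does."""
    return bytes([salt_byte(row_key)]) + row_key


# Consecutive timestamps (a time-series PK) no longer sort adjacently...
keys = [f"2015-06-08T10:00:{s:02d}".encode() for s in range(5)]
buckets_hit = {salt_byte(k) for k in keys}

# ...but a scan over that same time range cannot know which buckets hold
# matching rows, so it fans out to one scan per bucket - the 100-scans-vs-1
# cost in the message above.
scans_needed = SALT_BUCKETS
```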