Re: Add automatic/default SALT

2017-12-27 Thread James Taylor
Our Tuning Guide[1] has recommendations on when to use salted tables and
when not to. We don't recommend salting unless your table has a
monotonically increasing primary key. The reason is best explained with an
example. Say you have a table with SALT_BUCKETS=20. When you execute a
simple query against that table that might return 10 contiguous rows,
you'll be executing 20 scans instead of just one. Each scan will open a
block on the region server - that's 20 block fetches versus what would
otherwise be a single block fetch (assuming that the 10 rows being returned
are in the same block since they're contiguous). The only time you don't
pay this 20x block fetch cost is when you're doing a point lookup (as the
client can precompute the salt byte in that case).

[1] https://phoenix.apache.org/tuning_guide.html
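
The scan arithmetic in the example above can be sketched in a few lines of
Python. This is only a rough model under stated assumptions: the MD5-based
salt_byte helper and the scans_needed function are hypothetical names made
up for illustration, and Phoenix uses its own hash internally. The sketch
only shows why a range query fans out to every bucket while a point lookup
does not.

```python
import hashlib


def salt_byte(row_key: bytes, buckets: int) -> int:
    """Hypothetical salt: a hash of the row key modulo the bucket count.

    Assumption for illustration only; this is not Phoenix's hash function.
    """
    digest = hashlib.md5(row_key).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def scans_needed(buckets: int, point_lookup: bool) -> int:
    """Number of scans the client must issue for one query."""
    if buckets <= 1 or point_lookup:
        # Unsalted table, or the full key is known so the salt byte
        # can be computed up front: a single scan suffices.
        return 1
    # Range scan on a salted table: matching rows may sit in any bucket,
    # so every bucket has to be scanned.
    return buckets


if __name__ == "__main__":
    buckets = 20  # SALT_BUCKETS=20, as in the example above
    # Ten contiguous row keys, as in the example above.
    keys = [f"row{i:05d}".encode() for i in range(100, 110)]
    touched = sorted({salt_byte(k, buckets) for k in keys})
    print("buckets holding the 10 rows:", touched)
    print("scans for the range query:", scans_needed(buckets, point_lookup=False))
    print("scans for a point lookup:", scans_needed(buckets, point_lookup=True))
```

Contiguous keys hash to scattered buckets, which is what makes salting
useful for monotonically increasing keys and costly for short range scans.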

On Wed, Dec 27, 2017 at 3:26 PM, Flavio Pompermaier wrote:

> Hi Josh,
> Thanks for the feedback. Do you have a concrete example where salted
> tables are 'evil'? That said, I really like the idea of enabling salting
> via some predefined variable (like the number of region servers or
> something similar).
> An example could be:
>
> SALT_BUCKETS = $REGION_SERVERS_COUNT
>
> Best,
> Flavio
>
>
> On 12 Dec 2017 01:45, "Josh Elser" wrote:
>
> I'm a little hesitant about this because of a few things I've noticed
> across lots of different installations:
>
> * Salted tables are *not* always more efficient. In fact, I've found
> myself advising against salted tables a bit more often than expected.
> Certain kinds of queries require much more work on a salted table than
> on an unsalted one.
>
> * Considering salt buckets as a measure of parallelism for a table, it's
> impossible for the system to correctly judge what the parallelism of the
> cluster should be. For example, with 10 RS and 1 Phoenix table, you would
> want to start with 10 salt buckets. However, with 10 RS and 100 Phoenix
> tables, you'd *maybe* want to do 3 salt buckets. It's hard to make system
> wide decisions correctly without a global view of the entire system.
>
> I think James was trying to capture some of this in his use of "relatively
> conservative defaults", but I'd take that even a bit further and say I
> consider it harmful for Phoenix to do that out of the box.
>
> However, I would flip the question around instead: what kind of
> suggestions can Phoenix, as a database, make to users to _recommend_ that
> they enable salting on a table given its schema and important queries?
>
>
> On 12/8/17 12:34 PM, James Taylor wrote:
>
>> Hi Flavio,
>> I like the idea of “adaptable configuration” where you specify a config
>> value as a % of some cluster resource (with relatively conservative
>> defaults). Salting is somewhat of a gray area though as it’s not config
>> based, but driven by your DDL. One solution you could implement on top of
>> Phoenix is scripting for DDL that fills in the salt bucket parameter based
>> on cluster size.
>> Thanks,
>> James
>>
>> On Tue, Dec 5, 2017 at 12:50 AM Flavio Pompermaier wrote:
>>
>> Hi all,
>> as stated in the documentation[1], "for optimal performance,
>> number of salt buckets should match number of region servers".
>> So why not add an AUTO/DEFAULT option for salting that defaults
>> this parameter to the number of region servers?
>> Otherwise I have to manually connect to HBase, retrieve that number
>> and pass it to Phoenix...
>> What do you think?
>>
>> [1] https://phoenix.apache.org/performance.html#Salting
>>
>> Best,
>> Flavio
>>
>>
>


Re: Add automatic/default SALT

2017-12-11 Thread Josh Elser
I'm a little hesitant about this because of a few things I've noticed
across lots of different installations:

* Salted tables are *not* always more efficient. In fact, I've found
myself advising against salted tables a bit more often than expected.
Certain kinds of queries require much more work on a salted table than
on an unsalted one.


* Considering salt buckets as a measure of parallelism for a table, it's 
impossible for the system to correctly judge what the parallelism of the 
cluster should be. For example, with 10 RS and 1 Phoenix table, you 
would want to start with 10 salt buckets. However, with 10 RS and 100 
Phoenix tables, you'd *maybe* want to do 3 salt buckets. It's hard to 
make system wide decisions correctly without a global view of the entire 
system.


I think James was trying to capture some of this in his use of "relatively
conservative defaults", but I'd take that even a bit further and say I
consider it harmful for Phoenix to do that out of the box.

However, I would flip the question around instead: what kind of
suggestions can Phoenix, as a database, make to users to _recommend_ that
they enable salting on a table given its schema and important queries?


On 12/8/17 12:34 PM, James Taylor wrote:

> Hi Flavio,
> I like the idea of “adaptable configuration” where you specify a config
> value as a % of some cluster resource (with relatively conservative
> defaults). Salting is somewhat of a gray area though as it’s not config
> based, but driven by your DDL. One solution you could implement on top
> of Phoenix is scripting for DDL that fills in the salt bucket parameter
> based on cluster size.
>
> Thanks,
> James
>
> On Tue, Dec 5, 2017 at 12:50 AM Flavio Pompermaier wrote:
>
>> Hi all,
>> as stated in the documentation[1], "for optimal performance,
>> number of salt buckets should match number of region servers".
>> So why not add an AUTO/DEFAULT option for salting that defaults
>> this parameter to the number of region servers?
>> Otherwise I have to manually connect to HBase, retrieve that number
>> and pass it to Phoenix...
>> What do you think?
>>
>> [1] https://phoenix.apache.org/performance.html#Salting
>>
>> Best,
>> Flavio



Re: Add automatic/default SALT

2017-12-08 Thread James Taylor
Hi Flavio,
I like the idea of “adaptable configuration” where you specify a config
value as a % of some cluster resource (with relatively conservative
defaults). Salting is somewhat of a gray area though as it’s not config
based, but driven by your DDL. One solution you could implement on top of
Phoenix is scripting for DDL that fills in the salt bucket parameter based
on cluster size.
Thanks,
James
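
The DDL scripting idea above could look roughly like the following Python
sketch. It is a minimal illustration, not an existing Phoenix feature: the
salted_ddl helper and its parameters are hypothetical, the region server
count is supplied by the caller (how it is retrieved from HBase is left
out), and the clamp to 1..256 assumes the bucket range given in the Phoenix
documentation.

```python
def salted_ddl(table: str, columns_sql: str, region_servers: int,
               max_buckets: int = 256) -> str:
    """Build a CREATE TABLE statement with SALT_BUCKETS filled in.

    Hypothetical helper: region_servers is whatever count the caller
    obtained from the cluster; it is clamped to 1..max_buckets
    (256 is assumed to be the upper limit Phoenix accepts).
    """
    buckets = max(1, min(region_servers, max_buckets))
    return f"CREATE TABLE {table} ({columns_sql}) SALT_BUCKETS = {buckets}"


if __name__ == "__main__":
    # Example: a cluster with 10 region servers and a made-up table.
    print(salted_ddl("EVENTS", "ID BIGINT PRIMARY KEY, PAYLOAD VARCHAR", 10))
```

Since the bucket count is baked into the DDL, a script like this only
captures the cluster size at table creation time.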

On Tue, Dec 5, 2017 at 12:50 AM Flavio Pompermaier wrote:

> Hi all,
> as stated in the documentation[1], "for optimal performance, number of
> salt buckets should match number of region servers".
> So why not add an AUTO/DEFAULT option for salting that defaults this
> parameter to the number of region servers?
> Otherwise I have to manually connect to HBase, retrieve that number and
> pass it to Phoenix...
> What do you think?
>
> [1] https://phoenix.apache.org/performance.html#Salting
>
> Best,
> Flavio
>


Add automatic/default SALT

2017-12-05 Thread Flavio Pompermaier
Hi all,
as stated in the documentation[1], "for optimal performance, number of
salt buckets should match number of region servers".
So why not add an AUTO/DEFAULT option for salting that defaults this
parameter to the number of region servers?
Otherwise I have to manually connect to HBase, retrieve that number and
pass it to Phoenix...
What do you think?

[1] https://phoenix.apache.org/performance.html#Salting

Best,
Flavio