On Thu, Jun 4, 2020 at 4:32 PM Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Wed, Jun 3, 2020 at 7:55 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> With hash partitioning you are not expected, in general, to end up with a
>> small number of partitions being accessed more heavily than the rest.  So
>> your indexes will also not fit into memory.
>>
>> I have the feeling that using a hash function to distribute rows simply
>> contradicts the basic assumption of when you would think of partitioning
>> your table at all: that is to make sure the most active part of the table
>> and indexes is small enough to be cached in memory.
>>
>
> While hash partitioning doesn't appeal to me, I think this may be overly
> pessimistic.  It would not be all that unusual for your customers to take
> turns being highly active and less active.  Especially if you do occasional
> bulk loads all with the same customer_id for any given load, for example.
>

For a bulk load you'd likely want to go with an empty partition w/o
indexes and build them later, after loading the tuples.  While that might
not be feasible with every partitioning scheme, hash partitioning most
certainly precludes it.

> So while you might not have a permanently hot partition, you could have
> partitions which are hot in turn.  Of course you could get the same benefit
> (and probably better) with list or range partitioning rather than hash, but
> then you have to maintain those lists or ranges when you add new customers.
>

Why is LRU eviction from the shared buffers and the OS disk cache not good
enough to handle this?  This actually applies to any partitioning scheme:
the hot dataset could be recognized by these caching layers.  Does it not
happen in practice?

--
Alex
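
P.S. A minimal sketch of the load-then-index approach mentioned above,
assuming a list-partitioned table; the names (events, events_cust42,
customer_id 42, the file path) are all made up for illustration:

```sql
-- Hypothetical parent table, list-partitioned by customer.
CREATE TABLE events (customer_id int, payload text)
  PARTITION BY LIST (customer_id);

-- Bulk-load into a standalone, index-free table first...
CREATE TABLE events_cust42 (LIKE events);
COPY events_cust42 FROM '/path/to/load.csv' (FORMAT csv);

-- ...then build the index once and attach it as a partition.
CREATE INDEX ON events_cust42 (customer_id);
ALTER TABLE events ATTACH PARTITION events_cust42
  FOR VALUES IN (42);
```

With hash partitioning there is no equivalent move: a given customer's
rows hash into whichever partition the modulus/remainder dictates, so you
cannot designate a fresh, empty partition to receive one load.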