On Thu, Jun 4, 2020 at 4:32 PM Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Wed, Jun 3, 2020 at 7:55 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> With hash partitioning you are not expected, in general, to end up with a
>> small number of partitions being accessed more heavily than the rest.  So
>> your indexes will also not fit into memory.
>>
>> I have the feeling that using a hash function to distribute rows simply
>> contradicts the basic assumption of when you would think of partitioning
>> your table at all: that is to make sure the most active part of the table
>> and indexes is small enough to be cached in memory.
>>
>
> While hash partitioning doesn't appeal to me, I think this may be overly
> pessimistic.  It would not be all that unusual for your customers to take
> turns being highly active and less active.  Especially if you do occasional
> bulk loads all with the same customer_id for any given load, for example.
>

For a bulk load you'd likely want to go with an empty partition w/o
indexes and build them later, after loading the tuples.  While that might
not be feasible with every partitioning scheme, hash partitioning most
certainly precludes it.

> So while you might not have a permanently hot partition, you could have
> partitions which are hot in turn.  Of course you could get the same benefit
> (and probably better) with list or range partitioning rather than hash, but
> then you have to maintain those lists or ranges when you add new customers.
>

Why is LRU eviction from the shared buffers and the OS disk cache not good
enough to handle this?  This actually applies to any partitioning scheme:
the hot dataset could be recognized by these caching layers.  Does it not
happen in practice?

--
Alex
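
P.S. A minimal sketch of the load-then-index approach mentioned above,
assuming a list-partitioned table; the names (events, events_cust42,
customer_id 42, the file path) are all made up for illustration:

```sql
-- Hypothetical parent table, list-partitioned by customer.
CREATE TABLE events (customer_id int, payload text)
  PARTITION BY LIST (customer_id);

-- Bulk-load into a standalone, index-free table first...
CREATE TABLE events_cust42 (LIKE events);
COPY events_cust42 FROM '/path/to/load.csv' (FORMAT csv);

-- ...then build the index once and attach it as a partition.
CREATE INDEX ON events_cust42 (customer_id);
ALTER TABLE events ATTACH PARTITION events_cust42
  FOR VALUES IN (42);
```

With hash partitioning there is no equivalent move: a given customer's
rows hash into whichever partition the modulus/remainder dictates, so you
cannot designate a fresh, empty partition to receive one load.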