Re: When to use PARTITION BY HASH?

Oleksandr Shulgin Mon, 08 Jun 2020 01:41:13 -0700

On Sat, Jun 6, 2020 at 6:14 PM Michel Pelletier <pelletier.mic...@gmail.com>
wrote:


>
> Well lets take a step back here and look at the question, hash
> partitioning exists in Postgres, is it useful?  While I appreciate the need
> to see a fact demonstrated, and generally avoiding argument by authority,
> it is true that many of the very smartest database people in the world
> conceived of, discussed, implemented and documented this feature for us.
> It stands to reason that it is useful, or it wouldn't exist.  So maybe this
> is more about finding or needing better partitioning documentation.
>

Fair point.

I've found the original commit adding this feature in version 11:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e
It says:

"Hash partitioning is useful when you want to partition a growing data
set evenly.  This can be useful to keep table sizes reasonable, which
makes maintenance operations such as VACUUM faster, or to enable
partition-wise join."

It also includes a link to discussion, though that starts in the middle of
a long thread.
The original thread is here:
https://www.postgresql.org/message-id/flat/20170228233313.fc14d8b6.nagata%40sraoss.co.jp

However, these threads only argue about implementation details and it's not
easy to find a discussion of motivation for this particular partitioning
scheme support.
I guess it was quite obvious to the participants at that point already.

With hash partitioning you are not expected, in general, to end up with a
>> small number of partitions being accessed more heavily than the rest.  So
>> your indexes will also not fit into memory.
>>
>
> Indexes are not (usually) constant time structures, they take more time
> the bigger they get.  So partitioned indexes will be smaller, quicker to
> insert into, and quicker to vacuum, and also gain possible pruning
> advantages on query when you split them up.  If the planner can, knowing
> the key, exclude all but one partition, it won't even look at the other
> tables, so if you hash partition by primary key, you reduce the search
> space to 1/N immediately.
>
> Indexes with high update activity also suffer from a problem called "index
> bloat" where spares "holes" get punched in the buckets of btree indexes
> from updates and delete deletes.  These holes are minimized by vacuuming
> but the bigger the index gets, the harder that process is to maintain.
> Smaller indexes suffer less from index bloat, and remedying the situation
> is easier because you can reindex partitions independently of each other.
> Your not just reducing the query load to an Nth, you're reducing the
> maintenance load.
>

Thanks for taking your time to explain it in detail.  Though I do not tend
to believe the insert/scan performance benefit is measurable without trying
it, I do see the benefits for maintenance.

I have the feeling that using a hash function to distribute rows simply
>> contradicts the basic assumption of when you would think of partitioning
>> your table at all: that is to make sure the most active part of the table
>> and indexes is small enough to be cached in memory.
>>
>
> I think you might be framing this with a specific data pattern in mind,
> not all data distributions have a "most active" or power law distribution
> of data.
>

I'm just referring to the first bullet-point in the docs:

"Query performance can be improved dramatically in certain situations,
particularly when most of the heavily accessed rows of the table are in a
single partition or a small number of partitions. The partitioning
substitutes for leading columns of indexes, reducing index size and making
it more likely that the heavily-used parts of the indexes fit in memory."

I think it does not apply to hash partitioning in the general case.

--
Alex

Re: When to use PARTITION BY HASH?

Reply via email to