On Sat, Jun 6, 2020 at 6:14 PM Michel Pelletier <pelletier.mic...@gmail.com> wrote:
> > Well lets take a step back here and look at the question, hash > partitioning exists in Postgres, is it useful? While I appreciate the need > to see a fact demonstrated, and generally avoiding argument by authority, > it is true that many of the very smartest database people in the world > conceived of, discussed, implemented and documented this feature for us. > It stands to reason that it is useful, or it wouldn't exist. So maybe this > is more about finding or needing better partitioning documentation. > Fair point. I've found the original commit adding this feature in version 11: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1aba8e651ac3e37e1d2d875842de1e0ed22a651e It says: "Hash partitioning is useful when you want to partition a growing data set evenly. This can be useful to keep table sizes reasonable, which makes maintenance operations such as VACUUM faster, or to enable partition-wise join." It also includes a link to discussion, though that starts in the middle of a long thread. The original thread is here: https://www.postgresql.org/message-id/flat/20170228233313.fc14d8b6.nagata%40sraoss.co.jp However, these threads only argue about implementation details and it's not easy to find a discussion of motivation for this particular partitioning scheme support. I guess it was quite obvious to the participants at that point already. With hash partitioning you are not expected, in general, to end up with a >> small number of partitions being accessed more heavily than the rest. So >> your indexes will also not fit into memory. >> > > Indexes are not (usually) constant time structures, they take more time > the bigger they get. So partitioned indexes will be smaller, quicker to > insert into, and quicker to vacuum, and also gain possible pruning > advantages on query when you split them up. If the planner can, knowing > the key, exclude all but one partition, it won't even look at the other > tables, so if you hash partition by primary key, you reduce the search > space to 1/N immediately. > > Indexes with high update activity also suffer from a problem called "index > bloat" where spares "holes" get punched in the buckets of btree indexes > from updates and delete deletes. These holes are minimized by vacuuming > but the bigger the index gets, the harder that process is to maintain. > Smaller indexes suffer less from index bloat, and remedying the situation > is easier because you can reindex partitions independently of each other. > Your not just reducing the query load to an Nth, you're reducing the > maintenance load. > Thanks for taking your time to explain it in detail. Though I do not tend to believe the insert/scan performance benefit is measurable without trying it, I do see the benefits for maintenance. I have the feeling that using a hash function to distribute rows simply >> contradicts the basic assumption of when you would think of partitioning >> your table at all: that is to make sure the most active part of the table >> and indexes is small enough to be cached in memory. >> > > I think you might be framing this with a specific data pattern in mind, > not all data distributions have a "most active" or power law distribution > of data. > I'm just referring to the first bullet-point in the docs: "Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory." I think it does not apply to hash partitioning in the general case. -- Alex