On Wed, Dec 7, 2016 at 11:42 PM, Michael Paquier <michael.paqu...@gmail.com> wrote: >> Congrats to everyone working on this! This is a large step forward. > > Congratulations to all! It was a long way to this result.
Yes. The last effort in this area which I can remember was by Itagaki Takahiro in 2010, so we've been waiting for this for more than 6 years. It's really good that Amit was able to put in the effort to produce a committable patch, and I think he deserves all of our thanks for getting that done - and NTT deserves our thanks for paying him to do it. Even though I know he put in a lot more work than I did, let me just say: phew, even reviewing that was a ton of work. Of course, this is the beginning, not the end. I've been thinking about next steps -- here's an expanded list: - more efficient plan-time partition pruning (constraint exclusion is too slow) - run-time partition pruning - partition-wise join (Ashutosh Bapat is already working on this) - try to reduce lock levels - hash partitioning - the ability to create an index on the parent and have all of the children inherit it; this should work something like constraint inheritance. you could argue that this doesn't add any real new capability but it's a huge usability feature. - teaching autovacuum enough about inheritance hierarchies for it to update the parent statistics when they get stale despite the lack of any actual inserts/updates/deletes to the parent. this has been pending for a long time, but it's only going to get more important - row movement (aka avoiding the need for an ON UPDATE trigger on each partition) - insert (and eventually update) tuple routing for foreign partitions - not scanning the parent - fixing the insert routing so that we can skip tuple conversion where possible - fleshing out the documentation One thing I'm wondering is whether we can optimize away some of the heavyweight locks. For example, if somebody does SELECT * FROM ptab WHERE id = 1, they really shouldn't need to lock the entire partitioning hierarchy, but right now they do. If the root knows based on its own partitioning key that only one child is relevant, it would be good to lock *only that child*. For this feature to be competitive, it needs to scale to at least a few thousand partitions, and locking thousands of objects instead of one or two is bound to be slow. Similarly, you can imagine teaching COPY to lock partitions only on demand; if no tuples are routed to a particular partition, we don't need to lock it. There's a manageability component here, too: not locking partitions unnecessarily makes ti easier to get DDL on other partitions through. Alternatively, maybe we could rewrite the lock manager to be hierarchical, so that you can take a single lock that represents an AccessShareLock on all partitions and only need to make one entry in the lock table to do it. That means that attempts to lock individual partitions need to check not only for a lock on that partition but also on anything further up in the hierarchy, but that might be a good trade if it gives us O(1) locking on the parent. And maybe we could also have a level of the hierarchy that represents every-table-in-the-database, for the benefit of pg_dump. Of course, rewriting the lock manager is a big project not for the faint of heart, but I think if we don't it's going to be a scaling bottleneck. We also need to consider other parts of the system that may not scale, like pg_dump. For a long time, we've been sorta-kinda willing to fix the worst of the scalability problems with pg_dump, but that's really no longer an adequate response. People want 1000 partitions. Heck, people want 1,000,000 partitions, but getting to where 1000 partitions works well would help PostgreSQL a lot. Our oft-repeated line that inheritance isn't designed for large numbers of inheritance children is basically just telling people who have the use case where they need that to go use some other product. Partitioning, like replication, is not an optional feature for a world-class database. And, from a technical point of view, I think we've now got an infrastructure that really should be able to be scaled up considerably higher than what we've been able to do in the past. When we were stuck with inheritance + constraint exclusion, we could say "well, there's not really any point because you'll hit these other limits anyway". But I think now that's not really true. This patch eliminates one of the core scalability problems in this area, and provides infrastructure for attacking some of the others. I hope that people will step up and do that. There's a huge opportunity here for PostgreSQL to become relevant in use cases where it currently falters badly, and we should try to take advantage of it. This patch is a big step by itself, but if we ignore the potential to do more with this as the base we will be leaving a lot of "win" on the table. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers