On Fri, May 19, 2017 at 2:36 AM, Jeff Davis <pg...@j-davis.com> wrote: > I could agree to something like that. Let's explore some of the challenges > there and potential solutions: > > 1. Dump/reload of hash partitioned data. > > Falling back to restore-through-the-root seems like a reasonable answer > here. Moving to a different encoding is not an edge case, but it's not > common either, so a performance penalty seems acceptable. I'm not > immediately sure how we'd implement this in pg_dump/restore, so I'd feel a > little more comfortable if I saw a sketch.
Right, I think this needs some investigation. I can't whip up a sketch on short notice, but I'll see if someone else at EnterpriseDB can work on it unless somebody else wants to take a crack at it. > 2. Having a lot of hash partitions would be cumbersome > > The user would need to create and manage each partition, and try to do > global operations in a sane way. The normal case would probably involve > scripts to do things like add an index to all partitions, or a column. Many > partitions would also just pollute the namespace unless you remember to put > them in a separate schema (yes, it's easy, but most people will still > forget). Some syntax sugar would go a long way here. I agree. Adding a column already cascades to all children, and there's a proposal to make the same thing true for indexes. See discussion beginning at http://postgr.es/m/c8fe4f6b-ff46-aae0-89e3-e936a35f0...@postgrespro.ru I haven't had time to review the code posted there yet, but I would like to see something along the lines discussed there committed to v11, and hopefully also something around foreign keys. It should be possible to create an outbound foreign key on a foreign table and have that cascade to all children. Conversely, it should also be possible to create a foreign key referencing a partitioned table provided that the foreign key references the partitioning key, and that there's a unique index on those same columns on every partition. (Referencing a foreign key that is not the partitioning key will have to wait for global indexes, I think.) These things are useful not just for hash partitioning, but also for list and range partitioning, and we'll save a lot of work if we can use the same infrastructure for both cases. > 3. The user would need to specify details they really don't care about for > each partition. > > Things like "modulus 16, remainder 0", "modulus 16, remainder 1" are tedious > boilerplate. And if the user makes a mistake, then 1/16 of inserts start > failing. Probably would be caught during testing, but not exactly a good > user experience. I'm not thrilled about this, considering that all the user > really wants is 16 partitions, but it's not the end of the world. As I said on the hash partitioning thread, I think the way to handle this is to get the basic feature in first, then add some convenience syntax to auto-create the partitions. That shouldn't be very difficult to implement; I just didn't want to complicate things more than necessary for the first version. The same issue arose when discussing list and range partitioning: Oracle has syntax like ours, but with the ability (requirement?) to create the partitions in the initial CREATE TABLE statement. However, I wanted to make sure we had the "inconvenient" syntax fully working and fully tested before we added that, because I believe pg_dump needs to dump everything out the long way. > 4. Detach is a foot-gun > > If you detach a partition, random inserts will start failing. Not thrilled > about this, but a hapless user would accept most of the blame if they > stumble over it. Another way of saying this is with hash partitioning you > really need the whole set for the table to be online at all. But we can't > really enforce that, because it would limit some of the flexibility that you > have in mind. Yes, I agree with all of that. I don't think it's really going to be a problem in practice. The only reason to detach a hash partition is if you want to split it, and we may eventually have convenience syntax to do that in an automated (i.e. less error-prone) way. If somebody does it manually and without a plan for putting back a replacement partition, they may be sad, but if somebody puts a CHECK (false) constraint on their table, they may be sad about that, too. It's more important to allow for flexibility than to prohibit every stupid thing somebody might try to do. Also, documentation helps. We've got a chapter on partitioning and it can be expanded to discuss these kinds of issues. > Stepping back, your approach might be closer to the general postgres > philosophy of allowing the user to assemble from spare parts first, then a > few releases later we offer some pre-built subassemblies, and a few releases > later we make the typical cases work out of the box. I'm fine with it as > long as we don't paint ourselves into a corner. That's basically my thinking here. Right now, our partitioning is primitive in numerous ways, and so the rough edges are pretty visible. However, I believe that with careful design we can file down many of those rough edges over time. Now, it's probably never going to be quite as smooth as if the system had been designed for partitions from the ground up, or at least not any time in the foreseeable future, but I think it can still be very good. If we add the above-referenced logic for index and foreign key handling, convenience syntax for creating partitions along with tables and for data-movement operations, better partition pruning in the planner, run-time partition-pruning in the executor, partitionwise join and aggregate, MergeAppend -> Append strength reduction when the required sortorder matches the partition order, UPDATE tuple routing, default partitions, and so on, I believe that partitioning will go from "eh, that's better than inheritance" to "hey, that's actually really good". There is a ton of work to do there, but I think it is all doable within the current infrastructure, and all of those things can (and in a number of cases already are) being worked on as separate patches, so we can get into each release what is ready and push off to the next release what isn't. As we go, we'll have fewer and fewer cases where partitioning a table regresses performance and more and more cases where the stuff you want to do just works. Probably the toughest nut to crack is global indexes. An argument was made on this very mailing list a number of years ago that nobody should want global indexes because $REASONS, but I was a little skeptical of that argument at the time and it's been clear to me that EnterpriseDB's customers, at least, do not in any way accept those arguments. It works on Oracle or other systems they use, and they find it useful enough that they're unwilling to be without it. If PostgreSQL provides the same functionality, they'll use it for more things than if it doesn't. I find myself more than a bit intimidated at the prospect of actually trying to make this work, but I've been convinced that people won't stop asking until it does. > Of course we still have work to do on the hash functions. We should solve at > least the most glaring portability problems, and try to harmonize the hash > opfamilies. If you agree, I can put together a patch or two. I definitely agree with solving the portability problems to the extent that we can reasonably do so. I think adding more cross-type hash opfamilies is a mildly good thing: I don't object to it, it probably makes sense to do at the same time as any other hashing changes we want to make, and it's better than not doing it. At the same time, I wouldn't walk over hot coals for it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers