On Fri, May 12, 2017 at 1:35 PM, Joe Conway <m...@joeconway.com> wrote:
>> That's a good point, but the flip side is that, if we don't have
>> such a rule, a pg_dump of a hash-partitioned table on one
>> architecture might fail to restore on another architecture. Today, I
>> believe that, while the actual database cluster is
>> architecture-dependent, a pg_dump is architecture-independent. Is it
>> OK to lose that property?
>
> Not from where I sit.
It was pointed out at PGCon that we've actually already crossed this Rubicon. If you create a range-partitioned table, put a bunch of data into it, and then try to reload it on another system with a different set of collation definitions, the proper partition for some row might be different. That would result in the dump failing to reload with a complaint about the partition key being violated. And, in fact, you could have the exact same issue on earlier releases which don't have partitioning, because a CHECK constraint of the form (a >= 'something' AND a < 'something else') could be true under one encoding and false under another, and you could define such a constraint on any release (from this millennium, anyway). I'm not actually aware of an instance where this has bitten anyone, even though it seems like it certainly could have and maybe should've gotten somebody at some point. Has anyone else seen one?

I think it's a reasonable guess that such problems will become more common with the advent of partitioning, and more common still as we continue to improve partitioning, because people who otherwise would have given up on PostgreSQL -- or at least on partitioning -- will actually try to use it in earnest and then hit this problem. However, my guess is that it will still be pretty rare, and that having an optional --dump-partition-data-with-parent flag that can be used when it crops up will be an adequate workaround for most people. Of course, that is just my opinion.

So now I think that the right way to think about the issues around hash partitioning is as a new variant of a problem that we already have, rather than as an altogether new problem. IMHO, the right questions are:

1. Are the new problems worse than the old ones?

2. What could we do about it?

On #1, I'd say tentatively yes. The problem of portability across encodings is probably not new. Suppose you have a table partitioned by range, either using the new range partitioning or using the old table-inheritance method with CHECK constraints. If you move that table to a different encoding, will the collation behavior you get under the new encoding match the collation behavior you got under the old encoding? The documentation says: "Also, a collation is tied to a character set encoding (see Section 22.3). The same collation name may exist for different encodings", which makes it sound like it is possible but not guaranteed. Even if the same collation name happens to exist, there's no guarantee it behaves the same way under the new encoding, and given our experiences with glibc so far, I'd bet against it. If glibc doesn't even think that strcoll() and strxfrm() need to agree with each other for the same collation, or that minor releases shouldn't whack the behavior around, there doesn't seem to be room for optimism about the possibility that they carefully preserve behavior across similarly-named collations on different encodings.

On the other hand, collation rules probably tend to vary only around the edges, so there's a reasonably good chance that even if the collation rules change when you switch encodings, every row will still get put into the same partition as before. If we implement hashing for hash partitioning in some trivial way, like hashing the raw bytes, that will most certainly not be true -- *most* rows will move to a different partition when you switch encodings. Furthermore, neither range nor list partitioning depends on properties of the hardware, like how wide integers are or whether they are stored big-endian. A naive approach to hash partitioning would depend on those things. That's clearly worse.
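To make that concrete, here's a minimal sketch in plain C -- the FNV-1a hash here is just a stand-in, not anything PostgreSQL actually uses -- showing why hashing raw bytes is encoding-dependent: the same character occupies one byte in LATIN1 and two in UTF-8, so it hashes, and therefore routes, differently.

#include <stdint.h>
#include <stdio.h>

/* Stand-in hash: 32-bit FNV-1a over raw bytes (not what PostgreSQL uses). */
static uint32_t
fnv1a(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

int
main(void)
{
    /* U+00E9 ('e' with acute accent) in two encodings. */
    const unsigned char latin1[] = {0xE9};
    const unsigned char utf8[] = {0xC3, 0xA9};

    /*
     * Same character, different bytes, different hash value -- so after
     * an encoding conversion, the row would route to a different
     * partition.
     */
    printf("LATIN1: %u\n", fnv1a(latin1, sizeof(latin1)));
    printf("UTF-8:  %u\n", fnv1a(utf8, sizeof(utf8)));
    return 0;
}

The same trap exists for integers hashed via their in-memory bytes, since big-endian and little-endian machines would feed the hash different byte sequences for the same value.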
On #2, I'll attempt to list the approaches that have been proposed so far:

1. Don't implement hash partitioning (Tom Lane). I don't think this proposal will attract many votes.

2. Add an option like --dump-partition-data-with-parent. I'm not sure who originally proposed this, but it seems that everybody likes it. What we disagree about is the degree to which it's sufficient. Jeff Davis thinks it doesn't go far enough: what if you have an old plain-format dump that you don't want to hand-edit, and the source database is no longer available? Most people involved in the unconference discussion of partitioning at PGCon seemed to feel that wasn't really something we should worry about too much. I had been taking that position also, more or less because I don't see that there are better alternatives. For instance, Jeff proposed having the COPY command specify both the parent and the child and providing a run-time option of some kind to decide which table name gets used, but I think that would be a fairly unpleasant syntax wart with some risk of breaking other cases (e.g., what if you want to restore a single child on a system where the parent doesn't exist?). Someone else in the room had a different take on why we shouldn't worry about this problem, which I'll try to summarize: "Well, encoding conversions are already so painful that if you're laboring under any illusion that it's all just going to work, you're wrong, and this isn't going to make anything materially worse."

3. Implement portable hash functions (Jeff Davis or me, not sure which). Andres scoffed at this idea, but I still think it might have legs. Coming up with a hashing algorithm for integers that produces the same results on big-endian and little-endian systems seems pretty feasible, even with the additional constraint that it should still be fast (there's a sketch of the idea below). Coming up with a hashing algorithm that produces the same results for the same characters across different encodings is definitely feasible in any case where we know how to do encoding conversion among all the relevant encodings, but it is probably hard to make fast. Those two things also solve different parts of the problem: one insulates the user from a difference in hardware architecture, while the other insulates the user from a difference in user-selected settings. I think that the first of those things is more important than the second, because it's easier to change your settings than it is to change your hardware. Also, I think that there may be room for allowing some user choice on this point; if people are willing to write multiple hash opfamilies with different portability-performance trade-offs, we can offer them all, either in core or contrib, and users can pick the ones that have the properties they want. My personal guess is that most people will prefer the fast hash functions over the ones that solve their potential future migration problems, but, hey, options are good.

I think that some combination of #2 and #3 is probably as well as we're going to do here, and personally I think we can make that pretty good.
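To flesh out the endianness half of #3, here's a minimal, non-authoritative sketch reusing the stand-in FNV-1a from above; hash_int64_portable() is a hypothetical helper, not an existing PostgreSQL function. The trick is just to serialize the value into one agreed-upon byte order before hashing, so both architectures feed the hash identical bytes.

#include <stdint.h>
#include <stdio.h>

/* Stand-in hash: 32-bit FNV-1a over raw bytes (not what PostgreSQL uses). */
static uint32_t
fnv1a(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Hypothetical endianness-independent integer hash: serialize the value
 * into a fixed (little-endian) byte order before hashing, so big-endian
 * and little-endian machines hash the same value identically.
 */
static uint32_t
hash_int64_portable(int64_t v)
{
    unsigned char buf[8];
    uint64_t u = (uint64_t) v;

    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char) (u >> (8 * i));

    return fnv1a(buf, sizeof(buf));
}

int
main(void)
{
    /* Prints the same value on any architecture. */
    printf("%u\n", hash_int64_portable(42));
    return 0;
}

The byte shuffle costs a few instructions on one of the two architectures, which is exactly the sort of portability-versus-performance trade-off that separate opfamilies could let users choose between.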
Users who use PostgreSQL 11 to do hash partitioning with a composite key consisting of one float8 column and one SJIS-encoded text column on a VAX, who attach to each partition a CHECK constraint that happens to pass only because of the vagaries of which rows end up in which partitions, and who then try to migrate to Linux64 with a numeric column and a UTF-8 encoded text column and the same CHECK constraints, will have some problems to work out. However, for the vast majority of people, reloading the data through the top-parent is going to be good enough, and the more portable we can make the hash functions, the fewer people will need to do even that much.

Of course, if there are other things we can do for a reasonable investment of effort to make things better still, great! I'm open to suggestions. But I don't think we need to panic about this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company