On Fri, May 12, 2017 at 1:35 PM, Joe Conway <m...@joeconway.com> wrote:
>> That's a good point, but the flip side is that, if we don't have
>> such a rule, a pg_dump of a hash-partitioned table on one
>> architecture might fail to restore on another architecture.  Today, I
>> believe that, while the actual database cluster is
>> architecture-dependent, a pg_dump is architecture-independent.  Is it
>> OK to lose that property?
> Not from where I sit.

It was pointed out at PGCon that we've actually already crossed this
Rubicon.  If you create a range-partitioned table, put a bunch of data
into it, and then try to reload it on another system with a different
set of encoding definitions, the proper partition for some row might
be different.  That would result in the dump failing to reload with a
complaint about the partition key being violated.  And, in fact, you
could have the exact same issue on earlier releases which don't have
partitioning, because a CHECK constraint of the form (a >= 'something'
AND b < 'something else') could be true under one encoding and false
under another, and you could define such a constraint on any release
(from this millienium, anyway).

I'm not actually aware of an instance where this has bitten anyone,
even though it seems like it certainly could have and maybe should've
gotten somebody at some point.  Has anyone else?  I think it's a
reasonable guess that such problems will become more common with the
advent of partitioning and more common still as we continue to improve
partitioning, because people who otherwise would have given up on
PostgreSQL -- or at least on partitioning -- will actually try to use
it in earnest and then hit this problem.  However, my guess is that it
will still be pretty rare, and that having an optional
--dump-partition-data-with-parent flag that can be used when it crops
up will be an adequate workaround for most people.  Of course, that is
just my opinion.

So now I think that the right way to think about the issues around
hash partitioning is as a new variant of a problem that we already
have rather than an altogether new problem.  IMHO, the right questions

1. Are the new problems worse than the old ones?

2. What could we do about it?

On #1, I'd say tentatively yes.  The problem of portability across
encodings is probably not new.  Suppose you have a table partitioned
by range, either using the new range partitioning or using the old
table inheritance method and CHECK constraints.  If you move that
table to a different encoding, will the collation behavior you get
under the new encoding match the collation behavior you got under the
old encoding?  The documentation says: "Also, a collation is tied to a
character set encoding (see Section 22.3). The same collation name may
exist for different encodings", which makes it sound like it is
possible but not guaranteed.  Even if the same collation name happens
to exist, there's no guarantee it behaves the same way under the new
encoding, and given our experiences with glibc so far, I'd bet against
it.  If glibc doesn't even think strcoll() and strxfrm() need to agree
with each other for the same collation, or that minor releases
shouldn't whack the behavior around, there doesn't seem to be room for
optimism about the possibility that they carefully preserve behavior
across similarly-named collations on different encodings.  On the
other hand, collation rules probably tend to vary only around the
edges, so there's a reasonably good chance that even if the collation
rules change when you switch encodings, every row will still get put
into the same partition as before.  If we implement hashing for hash
partitioning in some trivial way like hashing the raw bytes, that will
most certainly not be true -- *most* rows will move to a different
partition when you switch encodings.

Furthermore, neither range nor list partitioning depends on properties
of the hardware, like how wide integers are, or whether they are
stored big-endian.  A naive approach to hash partitioning would depend
on those things.  That's clearly worse.

On #2, I'll attempt to list the approaches that have been proposed so far:

1. Don't implement hash partitioning (Tom Lane).  I don't think this
proposal will attract many votes.

2. Add an option like --dump-partition-data-with-parent.  I'm not sure
who originally proposed this, but it seems that everybody likes it.
What we disagree about is the degree to which it's sufficient.  Jeff
Davis thinks it doesn't go far enough: what if you have an old
plain-format dump that you don't want to hand-edit, and the source
database is no longer available?  Most people involved in the
unconference discussion of partitioning at PGCon seemed to feel that
wasn't really something we should be worry about too much.  I had been
taking that position also, more or less because I don't see that there
are better alternatives.  For instance, Jeff proposed having the COPY
command specify both the parent and the child and providing a run-time
option of some kind to decide which table name gets used, but I think
that would be a fairly unpleasant syntax wart with some risk of
breaking other cases (e.g. what if you want to restore a single child
on a system where the parent doesn't exist?).  Someone else in the
room had a different take on why we shouldn't worry about this
problem, which I'll try to summarize: "Well, encoding conversions are
already so painful that if you're laboring under any illusion that
it's all just going to work, you're wrong, and this isn't going to
make anything materially worse."

3. Implement portable hash functions (Jeff Davis or me, not sure
which).  Andres scoffed at this idea, but I still think it might have
legs.  Coming up with a hashing algorithm for integers that produces
the same results on big-endian and little-endian systems seems pretty
feasible, even with the additional constraint that it should still be
fast.   Coming up with a hashing algorithm that produces the same
results on every various encodings of the same characters is
definitely feasible in any case where we know how to do encoding
conversion among all relevant encodings, but it is probably hard to
make fast.  Those two things also solve different parts of the
problem; one is insulating the user from a difference in hardware
architecture, while the other is insulating the user from a difference
in user-selected settings.  I think that the first of those things is
more important than the second, because it's easier to change your
settings than it is to change your hardware.  Also, I think that there
may be room for allowing for some user choice on this point; if people
are willing to write multiple hash opfamilies with different
portability-performance trade-offs, we can offer them all, either in
core or contrib, and users can pick the ones that have the properties
they want.  My personal guess is that most people will prefer the fast
hash functions over the ones that solve their potential future
migration problems, but, hey, options are good.

I think that some combination of #2 and #3 is probably as well as
we're going to do here, and personally I think we can make that pretty
good.  Users who use PostgreSQL 11 to do hash partitioning using a
composite key consisting of one float8 column and one SJIS-encoded
text column on a VAX and attach to each partition a CHECK constraint
that happens to pass only because of the vagaries of which rows end up
in which partitions, and who then try to migrate to  Linux64 with a
numeric column and a UTF-8 encoded text column with the same CHECK
constraints will have some problems to work out.  However, for the
vast majority of people, reloading the data through the top-parent is
going to be good enough, and the more portable we can make the hash
functions, the fewer people will need to do even that much.  Of
course, if there are other things we can do for a reasonable
investment of effort to make things better still, great!  I'm open to
suggestions.  But I don't think we need to panic about this.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to