On Fri, May 12, 2017 at 2:45 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Yeah, that isn't really appetizing at all.  If we were doing physical
> partitioning below the user-visible level, we could make it fly.
> But the existing design makes the partition boundaries user-visible
> which means we have to insist that the partitioning rule is immutable
> (in the strongest sense of being invariant across all installations
> you could possibly wish to transport data between).

I think you're right.

> Now, we already have some issues of that sort with range partitioning;
> if say you try to transport data to another system with a different
> sort ordering, you have probably got problems.  But that issue already
> existed with the old manual partitioning approach, ie if you have a
> CHECK constraint like "(col >= 'foo' && col < 'gob')" then a transfer
> to such a system could fail already.  So I'm okay with that.  But it
> seems like hash partitioning is going to introduce new, exciting, and
> hard-to-document-or-avoid portability gotchas.  Do we really need
> to go there?

That is a good question.  I think it basically amounts to this
question: is hash partitioning useful, and if so, for what?

Range and list partitioning inherently provide management benefits.
For example, if you range-partition your data by year, then when you
want to get rid of the oldest year worth of data, you can drop the
entire partition at once, which is likely to be much faster than a
DELETE statement.  But hash partitioning provides no such benefits
because, by design, the distribution of the data among the partitions
is essentially random.  Dropping one partition will not usually be a
useful thing to do, because the rows in that partition are logically
unconnected.  So, if you have a natural way of partitioning a table by
range or list, that's probably going to be superior to hash
partitioning.  Sometimes, though, there isn't an obviously useful way
of breaking up the data, yet you still want to partition it in order
to reduce the size of the individual tables.
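
To make the management benefit concrete, here is a minimal sketch
using the declarative range syntax that's already committed (table,
column, and partition names are made up for illustration):

CREATE TABLE measurements (
    sensor_id  int,
    taken_at   date,
    reading    numeric
) PARTITION BY RANGE (taken_at);

CREATE TABLE measurements_2016 PARTITION OF measurements
    FOR VALUES FROM ('2016-01-01') TO ('2017-01-01');
CREATE TABLE measurements_2017 PARTITION OF measurements
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

-- Getting rid of the oldest year is a quick DROP rather than a
-- long-running DELETE:
DROP TABLE measurements_2016;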

Smaller tables, in turn, allow maintenance operations to be performed on each
partition separately.  For example, suppose your table is big enough
that running CLUSTER or VACUUM FULL on it takes eight hours, but you
can't really afford an eight-hour maintenance window when the table
gets bloated.  However, you can afford (on Sunday nights when activity
is lowest) a window of, say, one hour.  Well, if you hash-partition
the table, you can CLUSTER or VACUUM FULL one partition a week for N
weeks until you get to them all.  Similarly, if you need to create an
index, you can build it on one partition at a time.  You can even add
the index to one partition to see how well it works -- for example,
does it turn too many HOT updates into non-HOT updates, causing bloat?
-- and try it out before you go do it across the board.  And an
unpartitioned table can only accommodate one VACUUM process at a time,
but a partitioned table can have one per partition.  Partitioning a
table also means that you don't need a single disk volume large enough
to hold the whole thing; instead, you can put each partition in a
separate tablespace or (in some exciting future world where PostgreSQL
looks more like a distributed system) perhaps even on another server.
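
For instance, reusing the hypothetical measurements partitions from
the sketch above (the same commands apply whether the table is
partitioned by range or by hash; the column and tablespace names are
made up):

-- Week 1's one-hour window; next week the next partition, and so on:
VACUUM FULL measurements_2017;

-- Try a new index on a single partition before building it everywhere:
CREATE INDEX ON measurements_2017 (sensor_id);

-- Partitions are ordinary tables, so each can live in its own
-- tablespace (assumes a tablespace named big_disk_2 already exists):
ALTER TABLE measurements_2017 SET TABLESPACE big_disk_2;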

For a table that is a few tens of gigabytes, none of this amounts to a
hill of beans, but for a table that is a few tens of terabytes, the
time it takes to perform administrative operations can become a really
big problem.  A good example is, say, a table full of orders.  Imagine
a high-velocity business like Amazon or UPS that has a constant stream
of new orders, or a mobile app that has a constant stream of new user
profiles.  If that data grows fast enough that the table needs to be
partitioned, how should it be done?  It's often desirable to create
partitions of about equal size and about equal hotness, but
range-partitioning on the creation date or order ID number means
continually creating new partitions and concentrating all of the hot
data in the same partition.
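
For the user-profile case, a sketch of what hash partitioning could
look like, using one possible modulus/remainder spelling of the syntax
(the exact grammar is of course up to whatever patch gets in; the
names are made up):

CREATE TABLE user_profiles (
    user_id    bigint,
    created_at timestamptz,
    profile    jsonb
) PARTITION BY HASH (user_id);

CREATE TABLE user_profiles_p0 PARTITION OF user_profiles
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_profiles_p1 PARTITION OF user_profiles
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE user_profiles_p2 PARTITION OF user_profiles
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE user_profiles_p3 PARTITION OF user_profiles
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Each partition ends up with roughly a quarter of the rows and a
quarter of the insert traffic, and nobody has to keep creating new
partitions as the data grows.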

In my experience, it is *definitely* the case that users with very
large tables are experiencing significant pain right now.  The freeze
map changes were a good step towards ameliorating some of that pain,
but I think hash partitioning is another step in that direction, and I
think there will be other applications as well.  Even for users who
don't have data that large, there are also interesting
query-performance implications of hash partitioning.  Joins between
very large tables tend to get implemented as merge joins, which often
means sorting all the data, which is O(n lg n) and not easy to
parallelize.  But if those very large tables happened to be compatibly
partitioned on the join key, you could do a partitionwise join per the
patch Ashutosh proposed, which would be cheaper and easier to do in
parallel.
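
As a sketch, with the same hypothetical hash syntax as above (the
pairwise planning itself is what the proposed patch would add, not
something the planner does today):

CREATE TABLE orders (order_id bigint, total numeric)
    PARTITION BY HASH (order_id);
CREATE TABLE orders_h0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE orders_h1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

CREATE TABLE order_lines (order_id bigint, item text)
    PARTITION BY HASH (order_id);
CREATE TABLE order_lines_h0 PARTITION OF order_lines
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE order_lines_h1 PARTITION OF order_lines
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- A partitionwise join could execute this as two smaller joins,
-- orders_h0 with order_lines_h0 and orders_h1 with order_lines_h1,
-- instead of one big sort-and-merge over both whole tables:
SELECT o.order_id, l.item
  FROM orders o
  JOIN order_lines l USING (order_id);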

Maybe a shorter argument for hash partitioning is that not one but two
different people proposed patches for it within months of the initial
partitioning patch going in.  When multiple people are thinking about
implementing the same feature almost immediately after the
prerequisite patches land, that's a good clue that it's a desirable
feature.  So I think we should try to solve the problems, rather than
giving up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

