On 12/07/2016 07:20 PM, Robert Haas wrote:
> On Wed, Dec 7, 2016 at 11:53 AM, Erik Rijkers <e...@xs4all.nl> wrote:
>>> My bad.  The fix I sent last night for one of the cache flush issues
>>> wasn't quite right.  The attached seems to fix it.
>> Yes, fixed here too.  Thanks.
> Thanks for the report - that was a good catch.
> I've committed 0001 - 0006 with that correction and a few other
> adjustments.  There's plenty of room for improvement here, and almost
> certainly some straight-up bugs too, but I think we're at a point
> where it will be easier and less error-prone to commit follow on
> changes incrementally rather than by continuously re-reviewing a very
> large patch set for increasingly smaller changes.

I've been working on a review / testing of the partitioning patch, but have been unable to submit it before the commit due to a lot of travel. However, at least some of the points seem to be still valid, so let me share it as an after-commit review. Most of the issues are fairly minor (possibly even nitpicking).


1) src/include/catalog/pg_partitioned_table.h contains this bit:

 * $PostgreSQL: pgsql/src/include/catalog/pg_partitioned_table.h $

2) I'm wondering whether having 'table' in the catalog name (and also in the new relkind) is too limiting. I assume we'll have partitioned indexes one day, for example - do we expect to use the same catalogs?

3) A comment within BeginCopy (in copy.c) says:

    * Initialize state for CopyFrom tuple routing.  Watch out for
    * any foreign partitions.

But the code does not seem to be doing that (at least I don't see any obvious checks for foreign partitions). Also, the comment should probably at least mention why foreign partitions need extra care.

To nitpick, the 'pd' variable in that block seems rather pointless - we can assign directly to cstate->partition_dispatch_info.

4) I see GetIndexOpClass() got renamed to ResolveOpClass(). I find the new name rather strange - all other similar functions start with "Get", so I guess "GetOpClass()" would be better. But I wonder if the rename was even necessary, considering that it still deals with index operator classes (although now also in context of partition keys). If the rename really is needed, isn't that a sign that the function does not belong to indexcmds.c anymore?

5) Half the error messages use 'child table' while the other half uses 'partition'. I think we should be using 'partition' as child tables really have little meaning outside inheritance (which is kinda hidden behind the new partitioning stuff).

6) The psql tab-completion seems a bit broken, because it only offers the partitions, not the parent table. Which is usually exactly the opposite of what the user wants.


I've also done quite a bit of testing with different partition layouts (single-level list/range partitioning, multi-level partitioning etc.), with fairly large number (~10k) of partitions. The scripts I used are available here: https://bitbucket.org/tvondra/partitioning-tests

1) There seems to be an issue when a partition is created and then accessed within the same transaction, i.e. for example

... create parent ...
... create partition ....
... insert into parent ...

which simply fails with an error like this:

ERROR:  no partition of relation "range_test_single" found for row
DETAIL:  Failing row contains (99000, 99000).

Of course, the partition is there. And interestingly enough, this works perfectly fine when executed without the explicit transaction, so I assume it's some sort of cache invalidation mix-up.

2) When doing a SELECT COUNT(*) from the partitioned table, I get a plan like this:

                                  QUERY PLAN
 Finalize Aggregate  (cost=124523.64..124523.65 rows=1 width=8)
   ->  Gather  (cost=124523.53..124523.64 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=123523.53..123523.54 rows=1 ...)
               ->  Append  (cost=0.00..108823.53 rows=5880001 width=0)
                     ->  Parallel Seq Scan on parent ...
                     ->  Parallel Seq Scan on partition_1 ...
                     ->  Parallel Seq Scan on partition_2 ...
                     ->  Parallel Seq Scan on partition_3 ...
                     ->  Parallel Seq Scan on partition_4 ...
                     ->  Parallel Seq Scan on partition_5 ...
                     ->  Parallel Seq Scan on partition_6 ...
                     ... and the rest of the 10k partitions

So if I understand the plan correctly, we first do a parallel scan of the parent, then partition_1, partition_2 etc. But AFAIK we scan the tables in Append sequentially, and each partition only has 1000 rows each, making the parallel execution rather inefficient. Unless we scan the partitions concurrently.

In any case, as this happens even with plain inheritance, it's probably more about the parallel query than about the new partitioning patch.

3) The last issue I noticed is that while

    EXPLAIN SELECT * FROM partitioned_table WHERE id = 1292323;

works just fine (it takes fair amount of time to plan with 10k partitions, but that's expected), this

    EXPLAIN UPDATE partitioned_table SET x = 1 WHERE id = 1292323;

allocates a lot of memory (~8GB on my laptop, before it gets killed by OOM killer). Again, the same thing happens with plain inheritance-based partitioning, so it's probably not a bug in the partitioning patch.

I'm mentioning it here because I think the new partitioning will hopefully get more efficient and handle large partition counts more efficiently (the inheritance only really works for ~100 partitions, which is probably why no one complained about OOM during UPDATEs). Of course, 10k partitions is a bit extreme (good for testing, though).


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to