Re: [HACKERS] Declarative partitioning - another take

Tomas Vondra Sun, 11 Dec 2016 17:03:05 -0800

Hi,

On 12/07/2016 07:20 PM, Robert Haas wrote:
> On Wed, Dec 7, 2016 at 11:53 AM, Erik Rijkers <[email protected]> wrote:
>>> My bad.  The fix I sent last night for one of the cache flush issues
>>> wasn't quite right.  The attached seems to fix it.
>> Yes, fixed here too.  Thanks.
>
> Thanks for the report - that was a good catch.
>
> I've committed 0001 - 0006 with that correction and a few other
> adjustments.  There's plenty of room for improvement here, and almost
> certainly some straight-up bugs too, but I think we're at a point
> where it will be easier and less error-prone to commit follow on
> changes incrementally rather than by continuously re-reviewing a very
> large patch set for increasingly smaller changes.
>

I've been working on a review / testing of the partitioning patch, buthave been unable to submit it before the commit due to a lot of travel.However, at least some of the points seem to be still valid, so let meshare it as an after-commit review. Most of the issues are fairly minor(possibly even nitpicking).



review
------

1) src/include/catalog/pg_partitioned_table.h contains this bit:

 * $PostgreSQL: pgsql/src/include/catalog/pg_partitioned_table.h $

2) I'm wondering whether having 'table' in the catalog name (and also inthe new relkind) is too limiting. I assume we'll have partitionedindexes one day, for example - do we expect to use the same catalogs?


3) A comment within BeginCopy (in copy.c) says:

    * Initialize state for CopyFrom tuple routing.  Watch out for
    * any foreign partitions.

But the code does not seem to be doing that (at least I don't see anyobvious checks for foreign partitions). Also, the comment shouldprobably at least mention why foreign partitions need extra care.

To nitpick, the 'pd' variable in that block seems rather pointless - wecan assign directly to cstate->partition_dispatch_info.

4) I see GetIndexOpClass() got renamed to ResolveOpClass(). I find thenew name rather strange - all other similar functions start with "Get",so I guess "GetOpClass()" would be better. But I wonder if the renamewas even necessary, considering that it still deals with index operatorclasses (although now also in context of partition keys). If the renamereally is needed, isn't that a sign that the function does not belong toindexcmds.c anymore?

5) Half the error messages use 'child table' while the other half uses'partition'. I think we should be using 'partition' as child tablesreally have little meaning outside inheritance (which is kinda hiddenbehind the new partitioning stuff).

6) The psql tab-completion seems a bit broken, because it only offersthe partitions, not the parent table. Which is usually exactly theopposite of what the user wants.



testing
-------

I've also done quite a bit of testing with different partition layouts(single-level list/range partitioning, multi-level partitioning etc.),with fairly large number (~10k) of partitions. The scripts I used areavailable here: https://bitbucket.org/tvondra/partitioning-tests

1) There seems to be an issue when a partition is created and thenaccessed within the same transaction, i.e. for example


BEGIN;
... create parent ...
... create partition ....
... insert into parent ...
COMMIT;

which simply fails with an error like this:

ERROR:  no partition of relation "range_test_single" found for row
DETAIL:  Failing row contains (99000, 99000).

Of course, the partition is there. And interestingly enough, this worksperfectly fine when executed without the explicit transaction, so Iassume it's some sort of cache invalidation mix-up.

2) When doing a SELECT COUNT(*) from the partitioned table, I get a planlike this:


                                  QUERY PLAN
-------------------------------------------------------------------
 Finalize Aggregate  (cost=124523.64..124523.65 rows=1 width=8)
   ->  Gather  (cost=124523.53..124523.64 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=123523.53..123523.54 rows=1 ...)
               ->  Append  (cost=0.00..108823.53 rows=5880001 width=0)
                     ->  Parallel Seq Scan on parent ...
                     ->  Parallel Seq Scan on partition_1 ...
                     ->  Parallel Seq Scan on partition_2 ...
                     ->  Parallel Seq Scan on partition_3 ...
                     ->  Parallel Seq Scan on partition_4 ...
                     ->  Parallel Seq Scan on partition_5 ...
                     ->  Parallel Seq Scan on partition_6 ...
                     ... and the rest of the 10k partitions

So if I understand the plan correctly, we first do a parallel scan ofthe parent, then partition_1, partition_2 etc. But AFAIK we scan thetables in Append sequentially, and each partition only has 1000 rowseach, making the parallel execution rather inefficient. Unless we scanthe partitions concurrently.

In any case, as this happens even with plain inheritance, it's probablymore about the parallel query than about the new partitioning patch.


3) The last issue I noticed is that while

    EXPLAIN SELECT * FROM partitioned_table WHERE id = 1292323;

works just fine (it takes fair amount of time to plan with 10kpartitions, but that's expected), this


    EXPLAIN UPDATE partitioned_table SET x = 1 WHERE id = 1292323;

allocates a lot of memory (~8GB on my laptop, before it gets killed byOOM killer). Again, the same thing happens with plain inheritance-basedpartitioning, so it's probably not a bug in the partitioning patch.

I'm mentioning it here because I think the new partitioning willhopefully get more efficient and handle large partition counts moreefficiently (the inheritance only really works for ~100 partitions,which is probably why no one complained about OOM during UPDATEs). Ofcourse, 10k partitions is a bit extreme (good for testing, though).


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Declarative partitioning - another take

Reply via email to