On 12/12/2016 07:37 AM, Amit Langote wrote:
On 2016/12/12 10:02, Tomas Vondra wrote:
2) I'm wondering whether having 'table' in the catalog name (and also in
the new relkind) is too limiting. I assume we'll have partitioned indexes
one day, for example - do we expect to use the same catalogs?
I am not sure I understand your idea of partitioned indexes, but I doubt
it would require entries in the catalog under consideration. Could you
perhaps elaborate more?
OK, let me elaborate. Let's say we have a partitioned table, and I want
to create an index. The index may be either "global" i.e. creating a
single relation for data from all the partitions, or "local" (i.e.
partitioned the same way as the table).
Local indexes are easier to implement (it's essentially what we have
now, except that we need to create the indexes manually for each
partition), and don't work particularly well for some use cases (e.g.
unique constraints). This is what I mean by "partitioned indexes".
If the index is partitioned just like the table, we probably won't need
to copy the partition key info (so, nothing in pg_partitioned_table).
I'm not sure it makes sense to partition the index differently than the
table - I don't see a case where that would be useful.
The global indexes would work better for the unique constraint use case,
but it clearly contradicts our idea of TID (no information about which
partition that references).
So maybe the catalog really only needs to track info about tables? Not
sure. I'm just saying it'd be unfortunate to have _table in the name,
and end up using it for indexes too.
4) I see GetIndexOpClass() got renamed to ResolveOpClass(). I find the new
name rather strange - all other similar functions start with "Get", so I
guess "GetOpClass()" would be better. But I wonder if the rename was even
necessary, considering that it still deals with index operator classes
(although now also in context of partition keys). If the rename really is
needed, isn't that a sign that the function does not belong to indexcmds.c
The fact that both index keys and partition keys have very similar
specification syntax means there would be some common code handling the
same, of which this is one (maybe, only) example. I didn't see much point
in finding a new place for the function, although I can see why it may be
a bit confusing to future readers of the code.
About the new name - seeing the top line in the old header comment, what
the function does is resolve a possibly explicitly specified operator
class name to an operator class OID. I hadn't changed the name in my
original patch, but Robert suggested the rename, which I thought made sense.
OK, I don't particularly like the name but I can live with that.
5) Half the error messages use 'child table' while the other half uses
'partition'. I think we should be using 'partition' as child tables really
have little meaning outside inheritance (which is kinda hidden behind the
new partitioning stuff).
One way to go about it may be to check all sites that can possibly report
an error involving child tables (aka "partitions") whether they know from
the context which name to use. I think it's possible, because we have
access to the parent relation in all such sites and looking at the relkind
can tell whether to call child tables "partitions".
Clearly, this is a consequence of building the partitioning on top of
inheritance (not objecting to that approach, merely stating a fact).
I'm fine with whatever makes the error messages more consistent, if it
does not make the code significantly more complex. It's a bit confusing
when some use 'child tables' and others 'partitions'. I suspect even a
single DML command may return a mix of those, depending on where exactly
it fails (old vs. new code).
6) The psql tab-completion seems a bit broken, because it only offers the
partitions, not the parent table. Which is usually exactly the opposite of
what the user wants.
Actually, I have not implemented any support for tab-completion for the
new DDL. I will work on that.
Aha! So the problem probably is that psql does not recognize the new
'partitioned table' relkind, and only looks at regular tables.
2) When doing a SELECT COUNT(*) from the partitioned table, I get a plan
Finalize Aggregate (cost=124523.64..124523.65 rows=1 width=8)
-> Gather (cost=124523.53..124523.64 rows=1 width=8)
Workers Planned: 1
-> Partial Aggregate (cost=123523.53..123523.54 rows=1 ...)
-> Append (cost=0.00..108823.53 rows=5880001 width=0)
-> Parallel Seq Scan on parent ...
-> Parallel Seq Scan on partition_1 ...
-> Parallel Seq Scan on partition_2 ...
-> Parallel Seq Scan on partition_3 ...
-> Parallel Seq Scan on partition_4 ...
-> Parallel Seq Scan on partition_5 ...
-> Parallel Seq Scan on partition_6 ...
... and the rest of the 10k partitions
So if I understand the plan correctly, we first do a parallel scan of the
parent, then partition_1, partition_2 etc. But AFAIK we scan the tables in
Append sequentially, and each partition only has 1000 rows each, making
the parallel execution rather inefficient. Unless we scan the partitions
In any case, as this happens even with plain inheritance, it's probably
more about the parallel query than about the new partitioning patch.
Yes, I have seen some discussion  about a Parallel Append, which would
result in plans you're probably thinking of.
Actually, I think that thread is about allowing partial paths even if
only some appendrel members support it. My point is that in this case
it's a bit silly to even build the partial paths, when the partitions
only have 1000 rows each.
I kinda suspect we only look at the total appendrel rowcount estimate,
but haven't checked.
3) The last issue I noticed is that while
EXPLAIN SELECT * FROM partitioned_table WHERE id = 1292323;
works just fine (it takes fair amount of time to plan with 10k partitions,
but that's expected), this
EXPLAIN UPDATE partitioned_table SET x = 1 WHERE id = 1292323;
allocates a lot of memory (~8GB on my laptop, before it gets killed by OOM
killer). Again, the same thing happens with plain inheritance-based
partitioning, so it's probably not a bug in the partitioning patch.
I'm mentioning it here because I think the new partitioning will hopefully
get more efficient and handle large partition counts more efficiently (the
inheritance only really works for ~100 partitions, which is probably why
no one complained about OOM during UPDATEs). Of course, 10k partitions is
a bit extreme (good for testing, though).
Plans with the inheritance parents as target tables (update/delete)
go through inheritance_planner() in the optimizer, which currently
has a design that is based on certain assumptions about traditional
inheritance. It's possible hopefully to make it less expensive for
the partitioned tables.
Yes, I know. I wasn't really reporting it as a bug in the partitioning
patch, but more as a rather surprising difference between plain SELECT
Am I right that one of the ambitions of the new partitioning is to
improve behavior with large number of partitions?
At first I thought it's somewhat related to the FDW sharding (each node
being a partition and having local subpartitions), but I realize the
planner will only deal with the node partitions I guess.
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: