Re: [HACKERS] Aggregates push-down to partitions
Hi Konstantin!

09.11.17 20:14, Konstantin Knizhnik wrote:

It is still far from the ideal plan because each worker works with all partitions, instead of splitting partitions between workers and calculating partial aggregates for each partition. But if we add an FDW as a child of the parent table, then a parallel scan cannot be used and we get the worst possible plan:

postgres=# create foreign table derived_fdw() inherits(base) server pg_fdw options (table_name 'derived1');
CREATE FOREIGN TABLE
postgres=# explain select sum(x) from base;
                                   QUERY PLAN
--
 Aggregate  (cost=34055.07..34055.08 rows=1 width=8)
   ->  Append  (cost=0.00..29047.75 rows=2002926 width=4)
         ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=4)
         ->  Seq Scan on derived1  (cost=0.00..14425.00 rows=100 width=4)
         ->  Seq Scan on derived2  (cost=0.00..14425.00 rows=100 width=4)
         ->  Foreign Scan on derived_fdw  (cost=100.00..197.75 rows=2925 width=4)
(6 rows)

So we sequentially pull all the data to this node and compute the aggregates locally. The ideal plan would calculate partial aggregates at all nodes in parallel and then combine the partial results. That requires two changes:
1. Replace Aggregate->Append with Finalize_Aggregate->Append->Partial_Aggregate
2. Concurrent execution of Append.
The latter can be done in two different ways: we can try to use the existing parallel-worker infrastructure and replace Append with Gather, which seems to be the best approach for local partitioning. In the case of remote (FDW) partitions, it is enough to separate the start of execution (PQsendQuery in postgres_fdw) from the fetching of results, which requires some changes in the FDW protocol. I wonder if somebody has already investigated this problem or is working in this direction. Maybe there are already some patches proposed? I have searched the hackers archive but didn't find anything relevant... Are there any suggestions about the best approach to implement this feature?

Maybe in this thread [1] the problem you describe is solved by introducing a Parallel Append node?

1. https://www.postgresql.org/message-id/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1%2BS%2BvRuUQ%40mail.gmail.com

--
Regards, Maksim Milyutin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
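For illustration, a hedged sketch of the plan shape Konstantin is asking for, reusing the table names from his example; the plan text below is illustrative only and is not actual output from any existing server version:

explain select sum(x) from base;
-- Finalize Aggregate
--   ->  Append                              -- ideally executed concurrently
--         ->  Partial Aggregate
--               ->  Seq Scan on derived1
--         ->  Partial Aggregate
--               ->  Seq Scan on derived2
--         ->  Foreign Scan on derived_fdw   -- partial aggregate computed on the remote side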
Re: [HACKERS] Proposal: Local indexes for partitioned table
07.10.17 16:34, Robert Haas wrote:

On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera wrote: One thing I'm a bit worried about is how to name these subordinate indexes. They have to have names because that's how pg_class works, and those names can't all be the same, again because that's how pg_class works. There's no problem right away when you first create the partitioned index, because you can just pick names out of a hat using whatever name-generating algorithm seems best. However, when you dump-and-restore (including but not limited to the pg_upgrade case) you've got to preserve those names. If you just generate a new name that may or may not be the same as the old one, then it may collide with a user-specified name that only occurs later in the dump. Also, you'll have trouble if the user has applied a COMMENT or a SECURITY LABEL to the index because that command works by name, or if the user has a reference to the index name inside a function or whatever. These are pretty annoying corner-case bugs because they're not likely to come up very often. Most people won't notice or care if the index name changes. But I don't think it's acceptable to just ignore the problem.

An idea I had was to treat the abstract index - to use your term - sort of the way we treat an extension. Normally, when you create an index on a partitioned table, it cascades, but for dump and restore purposes, we tag on some syntax that says "well, don't actually create the subordinate indexes, I'll tell you about those later". Then for each subordinate index we issue a separate CREATE INDEX command followed by ALTER INDEX abstract_index ATTACH PARTITION concrete_index or something of that sort. That means you can't absolutely count on the parent index to have all of the children it's supposed to have, but maybe that's OK.

AFAICS, the main problem with naming is generating new unique names for subordinate indexes at the stage of migrating the data schema (pg_dump, pg_upgrade, etc). Since we cannot specify these names in the 'CREATE INDEX' statement for the partitioned index, we would have to regenerate them. In this case I propose to restore the index name hierarchy *bottom-up*, i.e. first create the indexes for the leaf partitions and then create the ones for the parents up to the root, explicitly specifying names. When creating an index on a parent table we would have to check whether there exists an index on a child table that could serve as the child index (identical definition). If so, we do not generate a new index but implicitly attach that existing index to the parent one. If we have an incomplete index hierarchy, e.g. we previously dropped some partitions' indexes, then recreating the parent's index would regenerate (not attach) indexes for those partitions. We could drop those extra generated indexes after building the parent's index. This decision is not straightforward, but it lets us keep treating the 'CREATE INDEX' statement on a partitioned table as a cascading operation. As a result, we can specify a name for each concrete index while recreating the whole hierarchy.

--
Regards, Maksim Milyutin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
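A hedged sketch of the dump/restore sequence discussed above, combining the bottom-up naming idea with the ATTACH-style syntax Robert mentions; all object names are illustrative and the exact spelling of the commands is only one possibility:

-- leaf indexes are restored first, preserving their original names
CREATE INDEX part1_a_idx ON part1 (a);
CREATE INDEX part2_a_idx ON part2 (a);
-- the parent (abstract) index is created without cascading
-- (hypothetically with a "don't cascade" option in the dump)
CREATE INDEX parted_a_idx ON parted_table (a);
-- the existing leaf indexes are then attached explicitly
ALTER INDEX parted_a_idx ATTACH PARTITION part1_a_idx;
ALTER INDEX parted_a_idx ATTACH PARTITION part2_a_idx;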
Re: [HACKERS] Proposal: Local indexes for partitioned table
Hi!

On 06.10.2017 19:37, Alvaro Herrera wrote:

As you propose, IMO this new feature would use the standard index creation syntax: CREATE [UNIQUE] INDEX someindex ON parted_table (a, b); This command currently throws an error. We'd have it do two things: 1. create catalog rows in pg_class and pg_index for the main table, indicating the existence of this partitioned index. These would not point to an actual index (since the main table itself is empty), but instead to an "abstract" index. This abstract index can be used by various operations; see below.

Robert Haas proposed [1] to use the name "partitioned index" (instead of abstract), which would have to be reflected in the 'relkind' field of pg_class as a RELKIND_PARTITIONED_INDEX value.

I propose that an index declared UNIQUE throws an error for now. We can implement uniqueness (for the case where the indexed columns match the partitioning key) later, once we sort out all the issues here first. I think unique indexes would be very useful even with that limitation, but let's have it as a separate project.

Yes, global uniqueness through local unique indexes requires further work related to foreign keys on partitioned tables, full support of INSERT ON CONFLICT, etc. It makes sense to implement it after the current stage of work.

I think using pg_depend as the mechanism to link partition indexes to parent is a bad idea. How about pg_inherits instead? Seems more appropriate.

Greg Stark proposed [2] to use a new pg_index field of oid type that refers to the parent pg_index entry. AFAIC pg_inherits also makes sense, but semantically it deals with inheriting tables. IMHO using this catalog table to define the relationship between a partitioned table and its partitions looks like a hack to reuse the constraint-exclusion logic for partition pruning.

Creating hierarchy-wide indexes for existing partitioned tables is likely to take a long time, so we must include CONCURRENTLY as an option. This will need some transaction machinery support in order to ensure that each partition gets its index created at some point in a long chain of transactions, and that the whole thing is marked valid only at the end. Also, if the creation is interrupted (because of a crash or a regular shutdown), it'll be useful to be able to continue rather than being forced to start from scratch.

This option looks very difficult to me. I would be interested to see the implementation.

During ALTER TABLE ... ATTACH PARTITION, we check that an index satisfying the abstract index exists (and we create a pg_inherits link). If not, the command is aborted.

We could instead create the necessary index for the partition rather than abort the ALTER TABLE ... ATTACH PARTITION statement.

We need to come up with some way to generate names for each partition index.

I think calling 'ChooseIndexName(RelationGetRelationName(childrel), namespaceId, indexColNames, ...)' resolves this problem.

I am going to work on this now.

That would be great! I think this project is difficult for me as far as integrating the functionality described above with the existing postgres code is concerned. Also, IMO this project is very important because it opens the way to features such as global uniqueness of fields of partitioned tables, so any delay in the implementation is bad. I would like to review and test your intermediate results.

1. https://www.postgresql.org/message-id/CA%2BTgmoY5UOUnW%3DMcwT7xUB_2W5dAkvOg5kD20Spx5gF-Ad47cA%40mail.gmail.com
2.
https://www.postgresql.org/message-id/CAM-w4HOVftuv5RVi3a%2BsRV6nBpg204w7%3DL8MwPXVvYBFo1uM1Q%40mail.gmail.com -- Regards, Maksim Milyutin
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
Hi HORIGUCHI-san! Thanks for the valuable comments.

03.10.17 4:30, Kyotaro HORIGUCHI wrote:

The first thought that patch gave me is that the problem is not limited to constants. Actually the following sequence also reproduces a similar failure even with this patch.

create table t2 (x int , y int);
create type pair as (x int, y int);
prepare test as select row(x, y)::pair from t2;
drop type pair;
execute test;
| ERROR:  cache lookup failed for type 16410

In this case the causal expression is in the following form.

TargetEntry (
  expr = (
    RowExpr: typeid = 16410,
    row_format = COERCE_EXPLICIT_CAST,
    args = List (Var(t2.x), Var(t2.y))
  )
)

Yeah, a RowExpr carries no dependency on the type relation with oid=16410. I think the routine 'fix_expr_common' needs to be reworked to take into account any possible dependency of an expression on a composite type. For the November commitfest I'll post a patch that covers your case.

--
Regards, Maksim Milyutin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
Hi, Alexander! Thanks for the comments.

02.10.17 20:02, Alexander Korotkov wrote:

Please, register this patch at the upcoming commitfest to ensure it wouldn't be forgotten. Regression test changes (both .sql and .out) are essential parts of the patch. I see no point in posting them separately. Please, incorporate them into your patch.

OK, I'll take your advice.

Did you check this patch with a manually created composite type (made by CREATE TYPE typname AS ...)?

I have tested the following case:

create type pair as (x int, y int);
prepare test as select json_populate_record(null::pair, '{"x": 1, "y": 2}'::json);
drop type pair cascade;
execute test;
-- The following output is obtained before the patch
ERROR:  cache lookup failed for type 16419
-- After applying the patch
ERROR:  type "pair" does not exist

But after recreating the 'pair' type I get the following message:

ERROR:  cached plan must not change result type

I don't know whether that's the right behavior. Anyhow, your point is a good motivation to experiment with and investigate different scenarios of working with a cached plan that depends on a non-stable type. Thanks for that.

--
Regards, Maksim Milyutin
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
On 26.09.2017 23:25, Maksim Milyutin wrote:

25.09.17 20:50, Maksim Milyutin wrote:

I have found a problem when trying to sequentially call a function that casts a constant to the composite type of a temporary table that is deleted at each transaction termination (i.e. at each function call completion). For example, we have the following function 'test':

CREATE OR REPLACE FUNCTION test() RETURNS void LANGUAGE plpgsql AS $function$
begin
  create temp table tbl (id int) on commit drop;
  perform json_populate_record(null::tbl, '{ "id": 11 }'::json) as tt;
end;
$function$

On the second and subsequent calls we'll observe the following error:

ERROR:  cache lookup failed for type 16392

I investigated the problem and realized that the result type of the function json_populate_record (the funcresulttype field of the FuncExpr struct), as well as the type of the first null argument (the consttype field of the Const struct), refer to the invalid composite type related to the temporary table 'tbl'. Namely, they take the value of the oid obtained from the first initialization of 'tbl'. The plan of the query 'perform json_populate_record(null::tbl, '{ "id": 11 }'::json) as tt' is cached and is not invalidated at each function call. This is because the statement of this query doesn't have any dependency on the 'tbl' relation (the relationOids list of the CachedPlanSource struct). The attached patch resolves this problem by adding a dependency on the relation when a Const expression of that relation's composite type is detected.

The updated patchset contains a more transparent definition of the composite type for the constant node and a regression test for the buggy case described above.

Is there any interest in this problem in this thread?

--
Regards, Maksim Milyutin
Re: [HACKERS] Add support for tuple routing to foreign partitions
Hi Fujita-san!

On 11.09.2017 16:01, Etsuro Fujita wrote:

Here is an updated version of the patch. * Query planning: the patch creates copies of Query/Plan with a foreign partition as target from the original Query/Plan for each foreign partition and invokes PlanForeignModify with those copies, to allow the FDW to do query planning for remote INSERT with the existing API. To make such Queries the similar way inheritance_planner does, I modified transformInsertStmt so that the inh flag for the partitioned table's RTE is set to true, which allows (1) expand_inherited_rtentry to build an RTE and AppendRelInfo for each partition in the partitioned table and (2) make_modifytable to build such Queries using adjust_appendrel_attrs and those AppendRelInfos. * explain.c: I modified show_modifytable_info so that we can show remote queries for foreign partitions in EXPLAIN for INSERT into a partitioned table the same way as for inherited UPDATE/DELETE cases. Here is an example:

postgres=# explain verbose insert into pt values (1), (2);
                           QUERY PLAN
---
 Insert on public.pt  (cost=0.00..0.03 rows=2 width=4)
   Foreign Insert on public.fp1
     Remote SQL: INSERT INTO public.t1(a) VALUES ($1)
   Foreign Insert on public.fp2
     Remote SQL: INSERT INTO public.t2(a) VALUES ($1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
         Output: "*VALUES*".column1
(7 rows)

I think I should add more explanation about the patch, but I don't have time today, so I'll write additional explanation in the next email. Sorry about that.

Could you update your patch? It no longer applies to the current state of master; namely, there is a conflict in src/backend/executor/execMain.c.

--
Regards, Maksim Milyutin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitions: \d vs \d+
On 29.09.2017 04:33, Amit Langote wrote:

So, we should be looking at partconstraintdef only when verbose is true, because that's only when we set it to a valid value. Now, if partconstraintdef is NULL even after verbose is true, that means the backend returned that there exists no constraint for that partition, which I thought would be true for a default partition (because the commit that introduced default partitions also introduced "No partition constraint"), but it's really not. For example, \d and \d+ show contradictory outputs for a default partition.

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table pd partition of p default;

\d pd
                Table "public.pd"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
Partition of: p DEFAULT
No partition constraint

\d+ pd
                Table "public.pd"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |
Partition of: p DEFAULT
Partition constraint: (NOT ((a IS NOT NULL) AND (a = ANY (ARRAY[1]

Perhaps there is no case when "No partition constraint" should be output, but I may be missing something.

Anyhow, we have to protect ourselves from empty output from pg_get_partition_constraintdef, and printing "No partition constraint" would be a good starting point for examining why we didn't get any constraint definition.

--
Regards, Maksim Milyutin
Re: [HACKERS] Partitions: \d vs \d+
On 28.09.2017 16:29, Jesper Pedersen wrote:

On 09/28/2017 09:19 AM, Maksim Milyutin wrote: E.g. "No partition constraint" vs. "Partition constraint: satisfies_hash_partition(...)". I also noticed ambiguity in printing "No partition constraint" in non-verbose mode and "Partition constraint:..." in verbose mode for partition tables regardless of the type of partition. The attached small patch removes any output about the partition constraint in non-verbose mode.

Yeah, that could be one way. It should likely be backported to REL_10_STABLE, so the question is if we are too late in the release cycle to change that output.

I want to prepare a more complete patch for the "Partition constraint" output. For example, I encountered rather crude output with repetitive conjuncts for a subpartition whose parent is partitioned by the same key (note the duplicated (i IS NOT NULL)):

Partition constraint: ((i IS NOT NULL) AND (i >= 30) AND (i < 40) AND (i IS NOT NULL) AND (i = ANY (ARRAY[30, 31])))

--
Regards, Maksim Milyutin
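For reference, a hedged sketch of a setup that can produce such a constraint; the table names and bounds below are invented, chosen to match the range 30..40 and the list (30, 31) shown above:

create table p (i int) partition by range (i);
create table p_30_40 partition of p for values from (30) to (40)
  partition by list (i);
create table p_30_31 partition of p_30_40 for values in (30, 31);
-- \d+ p_30_31 then shows a partition constraint in which (i IS NOT NULL)
-- appears twice: once from the parent's range bound and once from the list bound.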
Re: [HACKERS] Partitions: \d vs \d+
Hi! On 28.09.2017 16:02, Jesper Pedersen wrote: Hi, Using hash partitions I noticed that \d gives D=# \d T_p63 Table "public.T_p63" Column | Type | Collation | Nullable | Default ---+---+---+--+- Partition of: T FOR VALUES WITH (modulus 64, remainder 63) No partition constraint Indexes: "T_p63" btree (X, Y) where as \d+ gives D=# \d+ T_p63 Table "public.T_p63" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ---+---+---+--+-+--+--+- Partition of: T FOR VALUES WITH (modulus 64, remainder 63) Partition constraint: satisfies_hash_partition(64, 63, hashint4extended(X, '8816678312871386367'::bigint)) Indexes: "T_p63" btree (X, Y) E.g. "No partition constraint" vs. "Partition constraint: satisfies_hash_partition(...)". I also noticed ambiguity in printing "No partition constraint" in non-verbose mode and "Partition constraint:..." in verbose one for partition tables regardless of the type of partition. Attached small patch removes any output about partition constraint in non-verbose mode. -- Regards, Maksim Milyutin diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index d22ec68..b301219 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -1900,13 +1900,16 @@ describeOneTableDetails(const char *schemaname, partdef); printTableAddFooter(&cont, tmpbuf.data); - /* If there isn't any constraint, show that explicitly */ - if (partconstraintdef == NULL || partconstraintdef[0] == '\0') -printfPQExpBuffer(&tmpbuf, _("No partition constraint")); - else -printfPQExpBuffer(&tmpbuf, _("Partition constraint: %s"), - partconstraintdef); - printTableAddFooter(&cont, tmpbuf.data); + if (verbose) + { +/* If there isn't any constraint, show that explicitly */ +if (partconstraintdef == NULL || partconstraintdef[0] == '\0') + printfPQExpBuffer(&tmpbuf, _("No partition constraint")); +else + printfPQExpBuffer(&tmpbuf, _("Partition constraint: %s"), + partconstraintdef); +printTableAddFooter(&cont, tmpbuf.data); + } PQclear(result); } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
25.09.17 20:50, Maksim Milyutin wrote: I have found out the problem when try to sequentially call the function that casts constant to composite type of temporary table that is deleted ateach transaction termination (i.e. at each function call completion). For example, we have the following function 'test': CREATE OR REPLACE FUNCTION test() RETURNS void LANGUAGE plpgsql AS $function$ begin create temp table tbl (id int) on commit drop; perform json_populate_record(null::tbl, '{ "id": 11 }'::json) as tt; end; $function$ Оn the second and subsequent calls we'll observe the following error: ERROR: cache lookup failed for type 16392 I investigated the problem and realized that result type of function *json_populate_record* (/funcresulttype/ field of /FuncExpr/ struct) as well as type of the first null argument (/consttype/ field of /Const/ struct) refer to the invalid composite type related with temporary table'tbl'. Namely they take a value of oid gotten from the first 'tbl' initialization. The plan of query *'perform json_populate_record(null::tbl, '{ "id": 11 }'::json) as tt'*is cached and is not invalidated at each function call. This is because** the statement of this query doesn't have any dependency from the 'tbl' relation (/relationOids/ list of /CachedPlanSource/ struct). Attached patch resolves this problem by adding dependency from relation upon detection Const expression of composite type of that relation Updated patchset contains more transparent definition of composite type for constant node and regression test for described above buggy case. -- Regards, Maksim Milyutin diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index b0c9e94459..482b0d227c 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -78,6 +78,12 @@ typedef struct (((con)->consttype == REGCLASSOID || (con)->consttype == OIDOID) && \ !(con)->constisnull) +/* + * Check if a Const node has composite type + */ +#define ISTABLETYPECONST(con) \ + (get_typtype((con)->consttype) == TYPTYPE_COMPOSITE) + #define fix_scan_list(root, lst, rtoffset) \ ((List *) fix_scan_expr(root, (Node *) (lst), rtoffset)) @@ -1410,6 +1416,12 @@ fix_expr_common(PlannerInfo *root, Node *node) root->glob->relationOids = lappend_oid(root->glob->relationOids, DatumGetObjectId(con->constvalue)); + + /* Check whether const has composite type */ + if (ISTABLETYPECONST(con)) + root->glob->relationOids = + lappend_oid(root->glob->relationOids, + get_typ_typrelid(con->consttype)); } else if (IsA(node, GroupingFunc)) { diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out index c2eeff1614..5c0be47778 100644 --- a/src/test/regress/expected/plancache.out +++ b/src/test/regress/expected/plancache.out @@ -220,6 +220,28 @@ execute p2; 1 (1 row) +-- Check that invalidation deals with casting const value to temporary +-- composite type reinitialized on each new transaction +create function cache_query_with_composite_const() returns void as $$ +begin +create temp table tbl(id int) on commit drop; + +-- Plan of the next query has to be rebuilt on each new call of function +-- due to casting first argument 'null' to recreated temprary table 'tbl' +perform json_populate_record(null::tbl, '{"id": 0}'::json); +end$$ language plpgsql; +select cache_query_with_composite_const(); + cache_query_with_composite_const +-- + +(1 row) + +select cache_query_with_composite_const(); + cache_query_with_composite_const +-- + +(1 row) + -- Check DDL via SPI, 
immediately followed by SPI plan re-use -- (bug in original coding) create function cachebug() returns void as $$ diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql index cb2a551487..41be0d6bf4 100644 --- a/src/test/regress/sql/plancache.sql +++ b/src/test/regress/sql/plancache.sql @@ -140,6 +140,21 @@ create temp sequence seq; execute p2; +-- Check that invalidation deals with casting const value to temporary +-- composite type reinitialized on each new transaction + +create function cache_query_with_composite_const() returns void as $$ +begin +create temp table tbl(id int) on commit drop; + +-- Plan of the next query has to be rebuilt on each new call of function +-- due to casting first argument 'null' to recreated temprary tab
Re: [HACKERS] Repetitive code in RI triggers
On 19.09.2017 11:09, Daniel Gustafsson wrote:

Removing reviewer Maksim Milyutin from patch entry due to inactivity and community account email bouncing. Maksim: if you are indeed reviewing this patch, then please update the community account and re-add to the patch entry. cheers ./daniel

Daniel, thanks for noticing. I have updated my account and re-added myself to the patch entry.

Ildar, your patch conflicts with the current HEAD of the master branch; please update it.

--
Regards, Maksim Milyutin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type
Hello everyone! I have found out the problem when try to sequentially call the function that casts constant to composite type of temporary table that is deleted ateach transaction termination (i.e. at each function call completion). For example, we have the following function 'test': CREATE OR REPLACE FUNCTION test() RETURNS void LANGUAGE plpgsql AS $function$ begin create temp table tbl (id int) on commit drop; perform json_populate_record(null::tbl, '{ "id": 11 }'::json) as tt; end; $function$ Оn the second and subsequent calls we'll observe the following error: ERROR: cache lookup failed for type 16392 I investigated the problem and realized that result type of function *json_populate_record* (/funcresulttype/ field of /FuncExpr/ struct) as well as type of the first null argument (/consttype/ field of /Const/ struct) refer to the invalid composite type related with temporary table'tbl'. Namely they take a value of oid gotten from the first 'tbl' initialization. The plan of query *'perform json_populate_record(null::tbl, '{ "id": 11 }'::json) as tt'*is cached and is not invalidated at each function call. This is because** the statement of this query doesn't have any dependency from the 'tbl' relation (/relationOids/ list of /CachedPlanSource/ struct). Attached patch resolves this problem by adding dependency from relation upon detection Const expression of composite type of that relation. -- Regards, Maksim Milyutin diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index b0c9e94..bf5e4f4 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -78,6 +78,12 @@ typedef struct (((con)->consttype == REGCLASSOID || (con)->consttype == OIDOID) && \ !(con)->constisnull) +/* + * Check if a Const node has composite type + */ +#define ISTABLETYPECONST(con) \ + (OidIsValid(get_typ_typrelid((con)->consttype))) + #define fix_scan_list(root, lst, rtoffset) \ ((List *) fix_scan_expr(root, (Node *) (lst), rtoffset)) @@ -1410,6 +1416,12 @@ fix_expr_common(PlannerInfo *root, Node *node) root->glob->relationOids = lappend_oid(root->glob->relationOids, DatumGetObjectId(con->constvalue)); + + /* Check whether const has composite type */ + if (ISTABLETYPECONST(con)) + root->glob->relationOids = +lappend_oid(root->glob->relationOids, + get_typ_typrelid(con->consttype)); } else if (IsA(node, GroupingFunc)) { diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql index cb2a551..41be0d6 100644 --- a/src/test/regress/sql/plancache.sql +++ b/src/test/regress/sql/plancache.sql @@ -140,6 +140,21 @@ create temp sequence seq; execute p2; +-- Check that invalidation deals with casting const value to temporary +-- composite type reinitialized on each new transaction + +create function cache_query_with_composite_const() returns void as $$ +begin +create temp table tbl(id int) on commit drop; + +-- Plan of the next query has to be rebuilt on each new call of function +-- due to casting first argument 'null' to recreated temprary table 'tbl' +perform json_populate_record(null::tbl, '{"id": 0}'::json); +end$$ language plpgsql; + +select cache_query_with_composite_const(); +select cache_query_with_composite_const(); + -- Check DDL via SPI, immediately followed by SPI plan re-use -- (bug in original coding) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] auto_explain : log queries with wrong estimation
On 24.08.2017 14:56, Adrien Nayrat wrote:
Hi hackers,

Hi,

I am trying to make a patch to auto_explain in order to log queries with wrong estimation. I compare planned rows: queryDesc->planstate->plan->plan_rows vs ntuples: queryDesc->planstate->instrument->ntuples;

AFAICS you want to introduce two additional per-node variables:
- auto_explain_log_estimate_ratio, which denotes the minimum ratio (>= 1) between the real value and the planned one. I would add a 'min' prefix before 'ratio'.
- auto_explain_log_estimate_min_rows - the minimum absolute difference between those two values. IMHO this name is somewhat poor; the suffix 'min_diff_rows' looks better.
If the real values (ratio and difference) both exceed these thresholds, you log the query. Am I right?

If I understand, instrumentation is used only with explain. So my patch works only with explain (and segfaults without). Instrumentation is initialized only with analyze (log_analyze is true) [1]. Is there a simple way to get ntuples?

That's an interesting question. At one point I also could not find any way to get the number of tuples emitted from a node.

1. contrib/auto_explain/auto_explain.c:221

--
Regards, Maksim Milyutin
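To make the proposal concrete, a hedged sketch of how such settings might be used; these GUCs do not exist in auto_explain today, and the names simply follow the naming discussed above (with the suggested renaming), so treat them as assumptions:

LOAD 'auto_explain';
SET auto_explain.log_analyze = on;                   -- instrumentation is only collected with analyze
SET auto_explain.log_estimate_min_ratio = 10;        -- hypothetical: actual/planned rows must be >= 10
SET auto_explain.log_estimate_min_diff_rows = 1000;  -- hypothetical: |actual - planned| must be >= 1000
-- With both thresholds exceeded for some plan node, the plan would be logged.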
Re: [HACKERS] Proposal: Local indexes for partitioned table
10.08.17 23:01, Robert Haas wrote: On Wed, Apr 19, 2017 at 5:25 AM, Maksim Milyutin wrote: Ok, thanks for the feedback. Then I'll use a new relkind for local partitioned index in further development. Any update on this? I'll continue to work soon. Sorry for so long delay. -- Regards, Maksim Milyutin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] New command to monitor progression of long running queries
On 19.04.2017 17:13, Remi Colinet wrote: Maksim, 2017-04-18 20:31 GMT+02:00 Maksim Milyutin mailto:m.milyu...@postgrespro.ru>>: On 18.04.2017 17:39, Remi Colinet wrote: Regarding the queryDesc state of SQL query upon receiving a request to report its execution progress, it does not bring any issue. The request is noted when the signal is received by the monitored backend. Then, the backend continues its execution code path. When interrupts are checked in the executor code, the request will be dealt. Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries. When the request is being dealt, the monitored backend will stop its execution and report the progress of the SQL query. Whatever is the status of the SQL query, progress.c code checks the status and report either that the SQL query does not have a valid status, or otherwise the current execution state of the SQL query. SQL query status checking is about: - idle transaction - out of transaction status - null planned statement - utility command - self monitoring Other tests can be added if needed to exclude some SQL query state. Such checking is done in void HandleProgressRequest(void). I do not see why a SQL query progression would not be possible in this context. Even when the queryDescc is NULL, we can just report a output. This is currently the case with the patch suggested. It's interesting question - how much the active query is in a usable state on the stage of execution. Tom Lane noticed that CHECK_FOR_INTERRUPTS doesn't give us 100% guarantee about full consistency [1]. I wonder what you mean about usable state. A usable query state is suitable for analysis, IOW we have consistent QueryDesc object. This term was introduced by Tom Lane in [1]. I suppose he meant the case when a query fails with error and before transaction aborts we bump into *CHECK_FOR_INTERRUPTS* in the place where QueryDesc may be inconsistent and further reading from it will give us invalid result. Currently, the code suggested tests the queryDesc pointer and all the sub nodes pointers in order to detect NULL pointers. When the progress report is collected by the backend, this backend does the collect and consequently does not run the query. So the query tree is not being modified. At this moment, whatever is the query state, we can manage to deal with its static state. It is only a tree which could also be just a NULL pointer in the most extreme case. Such case is dealt in the current code. Perhaps the deep checking of QueryDesc would allow us to consider it as consistent. 1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 19.04.2017 11:42, Ashutosh Bapat wrote:

On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin wrote: Local partitioned indexes can be recognized through a check on the relkind of the table to which the index refers. Something like this:

heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
    /* indexid is a local index on a partitioned table */

An index on a partitioned table can be a global index (yet to be implemented) or a local index. We cannot differentiate between those just by looking at the relation on which they are built.

We could refine the criteria for a local partitioned index later, encapsulating it in a macro, e.g., adding a new flag to pg_index that differentiates the type of index on a partitioned table.

These cases must be caught. But since partitioned tables don't participate in query plans, their indexes are inaccessible to the executor. The REINDEX operation is extended in my patch.

A global index would have storage for a partitioned table whereas a local index wouldn't have any storage for a partitioned table. I agree with Amit that we need new relkinds for local as well as global indexes.

Ok, thanks for the feedback. Then I'll use a new relkind for the local partitioned index in further development.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] New command to monitor progression of long running queries
On 18.04.2017 17:39, Remi Colinet wrote:

Hello Maksim, The core implementation I suggested for the new PROGRESS command uses different functions from the ones used by EXPLAIN for its core implementation. Some source code is shared with the EXPLAIN command. But this shared code is only related to quals, properties, children, subPlans and a few other nodes. All other code for PROGRESS is new code. I don't believe explain.c code can be fully shared with that of the new PROGRESS command. These 2 commands have different purposes. The core code used for the new PROGRESS command is very different from the core code used for EXPLAIN.

Perhaps you will be forced to duplicate significant snippets of functionality from explain.c into your progress.c.

Regarding the queryDesc state of the SQL query upon receiving a request to report its execution progress, it does not bring any issue. The request is noted when the signal is received by the monitored backend. Then, the backend continues its execution code path. When interrupts are checked in the executor code, the request will be dealt with.

Yes, interrupts are checked at the CHECK_FOR_INTERRUPTS entries.

When the request is being dealt with, the monitored backend will stop its execution and report the progress of the SQL query. Whatever the status of the SQL query, the progress.c code checks the status and reports either that the SQL query does not have a valid status, or otherwise the current execution state of the SQL query. SQL query status checking is about: - idle transaction - out of transaction status - null planned statement - utility command - self monitoring Other tests can be added if needed to exclude some SQL query state. Such checking is done in void HandleProgressRequest(void). I do not see why a SQL query progression would not be possible in this context. Even when the queryDesc is NULL, we can just report an output. This is currently the case with the patch suggested.

It's an interesting question how usable the state of an active query is at this stage of execution. Tom Lane noticed that CHECK_FOR_INTERRUPTS doesn't give us a 100% guarantee of full consistency [1].

So far, I've found this new command very handy. It allows one to evaluate the time needed to complete a SQL query.

Could you explain how you get the percentage of execution for the nodes of the plan tree and for the query overall?

A further improvement would be to report the memory, disk and time resources used by the monitored backend. An overuse of memory, disk and time resources can prevent the SQL query from completing.

This functionality is to some extent implemented in explain.c; you can see my patch to this file [2]. I only manipulate the runtime statistics (the data in the 'Instrumentation' structure) to print the state of a running query.

1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us
2. https://github.com/postgrespro/pg_query_state/blob/master/runtime_explain.patch

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] New command to monitor progression of long running queries
The patch is still a work in progress. It is meanwhile stable. It can be used with regular queries. Utility commands are not supported for the moment. Documentation is not yet written. Regards Remi

I had implemented an analogous feature as the extension pg_query_state [1], the idea of which I proposed in the thread [2]. Together with this extension I provided some patches to the postgres core to enable sending custom signals to a working backend (similar to your PROCSIG_PROGRESS) and to print the current query state through patches to the 'ExplainNode' function. I had implemented the same mechanics as you: 1) interrupt the working backend through ProcSignal; 2) handle the progress request at a CHECK_FOR_INTERRUPTS entry; 3) transfer the query state through shared memory to the caller. But the criticism of my approach was that the 'QueryDesc' structure, on the basis of which the query state is formed, can be inconsistent at the places where CHECK_FOR_INTERRUPTS appears [3].

I plan to propose the custom_signal patch to the community as soon as possible and, as a consequence, release pg_query_state from its dependency on patches to the postgres core. Longer term, I want to resolve the problem of being bound to the CHECK_FOR_INTERRUPTS entries, perhaps by patching the executor, and implement a robust PROGRESS command.

1. https://github.com/postgrespro/pg_query_state
2. https://www.postgresql.org/message-id/dbfb1a42-ee58-88fd-8d77-550498f52bc5%40postgrespro.ru
3. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
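For context, invoking the extension looks roughly like the following; this is a hedged sketch from memory of the extension's documentation rather than a verified call signature, so check the repository linked above for the exact interface:

-- in session A: run a long query and note the backend pid (e.g. from pg_backend_pid())
-- in session B, assuming the extension is installed:
SELECT * FROM pg_query_state(12345);  -- 12345 = pid of the monitored backend
-- returns the plan of the currently running query with per-node runtime statistics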
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 18.04.2017 13:08, Amit Langote wrote:

Hi,

Hi, Amit!

On 2017/04/17 23:00, Maksim Milyutin wrote: Ok, thanks for the note. But I want to discuss the relevance of introducing a new relkind for a partitioned index. I could change the control flow in partitioned index creation (add a conditional statement in the 'index_create' routine in the attached patch) and not enter the 'heap_create' routine. That would release us from integrating a new relkind into different places of the Postgres code. But we would have to copy-paste some specific code from the 'heap_create' function, e.g., the definition of the relfilenode and tablespaceid for the new index, and perhaps more when the 'heap_create' routine is extended.

I may be missing something, but isn't it that a new relkind will be needed anyway? How does the rest of the code distinguish such index objects once they are created?

Local partitioned indexes can be recognized through a check on the relkind of the table to which the index refers. Something like this:

heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
    /* indexid is a local index on a partitioned table */

Is it possible that some other code may try to access the storage for an index whose indrelid is a partitioned table?

These cases must be caught. But since partitioned tables don't participate in query plans, their indexes are inaccessible to the executor. The REINDEX operation is extended in my patch.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 10.04.2017 14:20, Robert Haas wrote: On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin wrote: 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX (literal 'l'). Seems like it should maybe be RELKIND_PARTITIONED_INDEX. There's nothing particularly "local" about it. I suppose what you're going for is that it's not global, but in a way it *is* global to the partitioning hierarchy. That's the point. It's just that it's partitioned. Ok, thanks for the note. But I want to discuss the relevancy of introduction of a new relkind for partitioned index. I could to change the control flow in partitioned index creation (specify conditional statement in the 'index_create' routine in attached patch) and not enter to the 'heap_create' routine. This case releases us from integrating new relkind into different places of Postgres code. But we have to copy-paste some specific code from 'heap_create' function, e.g., definition of relfilenode and tablespaceid for the new index and perhaps something more when 'heap_create' routine will be extended. What do you think about this way? -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 2328b92..9c15bc9 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -41,6 +41,7 @@ #include "catalog/pg_collation.h" #include "catalog/pg_constraint.h" #include "catalog/pg_constraint_fn.h" +#include "catalog/pg_depend.h" #include "catalog/pg_operator.h" #include "catalog/pg_opclass.h" #include "catalog/pg_tablespace.h" @@ -849,17 +850,33 @@ index_create(Relation heapRelation, * we fail further down, it's the smgr's responsibility to remove the disk * file again.) */ - indexRelation = heap_create(indexRelationName, -namespaceId, -tableSpaceId, -indexRelationId, -relFileNode, -indexTupDesc, -RELKIND_INDEX, -relpersistence, -shared_relation, -mapped_relation, -allow_system_table_mods); + if (heapRelation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + indexRelation = heap_create(indexRelationName, + namespaceId, + tableSpaceId, + indexRelationId, + relFileNode, + indexTupDesc, + RELKIND_INDEX, + relpersistence, + shared_relation, + mapped_relation, + allow_system_table_mods); + else + indexRelation = + RelationBuildLocalRelation(indexRelationName, + namespaceId, + indexTupDesc, + indexRelationId, + (OidIsValid(relFileNode)) ? + relFileNode : indexRelationId, + (tableSpaceId == MyDatabaseTableSpace) ? + InvalidOid : tableSpaceId, + shared_relation, + mapped_relation, + relpersistence, + RELKIND_INDEX); + Assert(indexRelationId == RelationGetRelid(indexRelation)); @@ -1549,10 +1566,14 @@ index_drop(Oid indexId, bool concurrent) TransferPredicateLocksToHeapRelation(userIndexRelation); } - /* - * Schedule physical removal of the files - */ - RelationDropStorage(userIndexRelation); + if (userHeapRelation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + { + + /* + * Schedule physical removal of the files + */ + RelationDropStorage(userIndexRelation); + } /* * Close and flush the index's relcache entry, to ensure relcache doesn't @@ -3294,6 +3315,111 @@ IndexGetRelation(Oid indexId, bool missing_ok) } /* + * Find all leaf indexes included into local index with 'indexId' oid and lock + * all dependent indexes and respective relations. + * + * Search is performed in pg_depend table since all indexes belonging to child + * tables depends on index from parent table. 
+ * + * indexId: the oid of local index whose leaf indexes need to find + * result: list of result leaf indexes + * depRel: already opened pg_depend relation + * indexLockmode: lockmode for indexes' locks + * heapLockmode: lockmode for relations' locks + */ +static void +findDepedentLeafIndexes(Oid indexId, List **result, Relation depRel, + LOCKMODE indexLockmode, LOCKMODE heapLockmode) +{ + ScanKeyData key[3]; + int nkeys; + SysScanDesc scan; + HeapTuple tup; + List *localSubIndexIds = NIL; + ListCell *lc; + + ScanKeyInit(&key[0], +Anum_pg_depend_refclassid, +BTEqualStrategyNumber, F_OIDEQ, +ObjectIdGetDatum(RelationRelationId)); + ScanKeyInit(&key[1], +Anum_pg_depend_refobjid, +BTEqualStrategyNumber, F_OIDEQ, +ObjectIdGetDatum(indexId)); + nkeys = 2; + + scan = systable_beginscan(depRel, DependReferenceIndexId, true, + NULL, nkeys, key); + + while (HeapTupleIsVa
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 10.04.2017 13:46, Greg Stark wrote: On 4 April 2017 at 17:10, Maksim Milyutin wrote: 3. As I noticed early pg_depend table is used for cascade deleting indexes on partitioned table and its children. I also use pg_depend to determine relationship between parent and child indexes when reindex executes recursively on child indexes. Perhaps, it's not good way to use pg_depend to determine the relationship between parent and child indexes because the kind of this relationship is not defined. I could propose to add into pg_index table specific field of 'oidvector' type that specify oids of dependent indexes for the current local index. Alternately you could have an single oid in pg_index on each of the children that specifies which local index is its parent. That would probably require a new index on that column so you could look up all the children efficiently. I think it would behave more sensibly when you're adding or removing a partition, especially if you want to add many partitions in parallel using multiple transactions. An oidvector of children would effectively mean you could only be doing one partition creation or deletion at a time. Thanks for your comment. Your approach sounds better than mine. I'll try it. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
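A hedged sketch of the catalog shape Greg describes; the column name indparentidx is purely hypothetical (no such column exists in pg_index) and is used only to illustrate the lookup:

-- hypothetical: each child index records the oid of its parent index in a
-- new pg_index column, say indparentidx, backed by an index on that column
SELECT indexrelid::regclass
FROM pg_index
WHERE indparentidx = 'parent_partitioned_idx'::regclass;
-- lists all child indexes attached to the parent index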
Re: [HACKERS] Declarative partitioning - another take
On 07.04.2017 13:05, Etsuro Fujita wrote: On 2016/12/14 16:20, Etsuro Fujita wrote: On 2016/12/09 19:46, Maksim Milyutin wrote: I would like to work on two tasks: - insert (and eventually update) tuple routing for foreign partition. - the ability to create an index on the parent and have all of the children inherit it; The first one has been implemented in pg_pathman somehow, but the code relies on dirty hacks, so the FDW API has to be improved. As for the extended index support, it doesn't look like a super-hard task. That would be great! I'd like to help review the first one. There seems to be no work on the first one, so I'd like to work on that. Yes, you can start to work on this, I'll join later as a reviewer. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 01.03.2017 13:53, Maksim Milyutin wrote: Hi hackers! As I've understood from thread [1] the main issue of creating local indexes for partitions is supporting REINDEX and DROP INDEX operations on parent partitioned tables. Furthermore Robert Haas mentioned the problem of creating index on key that is represented in partitions with single value (or primitive interval) [1] i.e. under the list-partitioning or range-partitioning with unit interval. I would like to propose the following solution: 1. Create index for hierarchy of partitioned tables and partitions recursively. Don't create relfilenode for indexes on parents, only entries in catalog (much like the partitioned table's storage elimination in [2]). Abstract index for partitioned tables is only for the reference on indexes of child tables to perform REINDEX and DROP INDEX operations. 2. Specify created indexes in pg_depend table so that indexes of child tables depend on corresponding indexes of parent tables with type of dependency DEPENDENCY_NORMAL so that index could be removed separately for partitions and recursively/separately for partitioned tables. 3. REINDEX on index of partitioned table would perform this operation on existing indexes of corresponding partitions. In this case it is necessary to consider such operations as REINDEX SCHEMA | DATABASE | SYSTEM so that partitions' indexes wouldn't be re-indexed multiple times in a row. Any thoughts? 1. https://www.postgresql.org/message-id/CA+TgmoZUwj=qynak+f7xef4w_e2g3xxdmnsnzmzjuinhrco...@mail.gmail.com 2. https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp I want to present the first version of patches that implement local indexes for partitioned tables and discuss some technical details of that implementation. 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX (literal 'l'). This was done because physical storage is created in the 'heap_create' function and we need to revoke the creating storage as with partitioned tables. Since information that this index belongs to partitioned tables is not available in 'heap_create' function (pg_index entry on the index is not created yet) I chose the least painful way - added a specific relkind for index on partitioned table. I suppose that this act will require the integrating new relkind to different places of source code so I'm ready to consider another proposals on this point. 2. My implementation doesn't support the concurrent creating of local index (CREATE INDEX CONCURRENTLY). As I understand, this operation involves nontrivial manipulation with snapshots and I don't know how to implement concurrent creating of multiple indexes. In this point I ask help from community. 3. As I noticed early pg_depend table is used for cascade deleting indexes on partitioned table and its children. I also use pg_depend to determine relationship between parent and child indexes when reindex executes recursively on child indexes. Perhaps, it's not good way to use pg_depend to determine the relationship between parent and child indexes because the kind of this relationship is not defined. I could propose to add into pg_index table specific field of 'oidvector' type that specify oids of dependent indexes for the current local index. On this stage I want to discuss only technical details of local indexes' implementation. 
The problems related to merging existing indexes of partitions within local index tree, determination uniqueness of field in global sense through local index and syntax notes I want to arise later. CC welcome! -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c index cc5ac8b..bec3983 100644 --- a/src/backend/access/index/indexam.c +++ b/src/backend/access/index/indexam.c @@ -154,7 +154,8 @@ index_open(Oid relationId, LOCKMODE lockmode) r = relation_open(relationId, lockmode); - if (r->rd_rel->relkind != RELKIND_INDEX) + if (r->rd_rel->relkind != RELKIND_INDEX && + r->rd_rel->relkind != RELKIND_LOCAL_INDEX) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("\"%s\" is not an index", diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index fc088b2..26a10e9 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -1107,7 +1107,8 @@ doDeletion(const ObjectAddress *object, int flags) { char relKind = get_rel_relkind(object->objectId); -if (relKind == RELKIND_INDEX) +if (relKind == RELKIND_INDEX || + relKind == RELKIND_LOCAL_INDEX) { bool concurrent = ((flags & PERFORM_DELETION_CONCURRENTLY) != 0); diff --git a/src/backend/catalog/heap.c b/src/backend
Re: [HACKERS] Partitioned tables and relfilenode
On 24.03.2017 03:54, Amit Langote wrote: And here is the updated patch. Perhaps you forgot my fix in the updated patch diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 3999e6e..36917c8 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -1823,7 +1823,8 @@ heap_drop_with_catalog(Oid relid) */ if (rel->rd_rel->relkind != RELKIND_VIEW && rel->rd_rel->relkind != RELKIND_COMPOSITE_TYPE && - rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE) + rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE && + rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) { RelationDropStorage(rel); } -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioned tables and relfilenode
Hi! I have noticed that there is scheduled unlinking of nonexistent physical storage under partitioned table when we execute DROP TABLE statement on this partitioned table. Though this action doesn't generate any error under typical behavior of postgres because the error of storage's lack is caught through if-statement [1] I think it is not safe. My patch fixes this issue. 1. src/backend/storage/smgr/md.c:1385 -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 3999e6e..36917c8 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -1823,7 +1823,8 @@ heap_drop_with_catalog(Oid relid) */ if (rel->rd_rel->relkind != RELKIND_VIEW && rel->rd_rel->relkind != RELKIND_COMPOSITE_TYPE && - rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE) + rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE && + rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) { RelationDropStorage(rel); } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 02.03.2017 11:41, Robert Haas wrote:

Sounds generally good. One thing to keep in mind is that - in this system - a UNIQUE index on the parent doesn't actually guarantee uniqueness across the whole partitioning hierarchy unless it so happens that the index columns or expressions are the same as the partitioning columns or expressions. That's a little counterintuitive, and people have already been complaining that a partitioned table + partitions doesn't look enough like a plain table. However, I'm not sure there's a better alternative, because somebody might want partition-wise unique indexes even though that doesn't guarantee global uniqueness. So I think if someday we have global indexes, then we can plan to use some other syntax for that, like CREATE GLOBAL [ UNIQUE ] INDEX.

Yes, I absolutely agree: cross-partition uniqueness can only be guaranteed through a global index on a partitioned table, apart from the case when the index keys are the same as the partitioning key (or the index includes the partitioning key in general).

Thanks for your comment. I'll try to propose the first patches as soon as possible.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
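To make the limitation concrete, a hedged illustration under the proposed local-index scheme; the table and column names are invented:

create table t (id int, v text) partition by range (id);
create table t1 partition of t for values from (0) to (100);
create table t2 partition of t for values from (100) to (200);
-- UNIQUE (id): the index columns match the partitioning key, so per-partition
-- uniqueness also implies uniqueness across the whole hierarchy.
-- UNIQUE (v): enforced only within each partition; t1 and t2 could both
-- contain v = 'x', so there is no hierarchy-wide guarantee.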
Re: [HACKERS] [POC] hash partitioning
On 01.03.2017 05:14, Amit Langote wrote:
> Nagata-san,
>
>> A partition table can be created as below:
>>
>> CREATE TABLE h1 PARTITION OF h;
>> CREATE TABLE h2 PARTITION OF h;
>> CREATE TABLE h3 PARTITION OF h;
>>
>> The FOR VALUES clause cannot be used, and the partition bound is calculated automatically as the partition index of a single integer value. When trying to create more partitions than the number specified by PARTITIONS, an error is raised:
>>
>> postgres=# create table h4 partition of h;
>> ERROR: cannot create hash partition more than 3 for h
>
> Instead of having to create each partition individually, wouldn't it be better if the following command
>
> CREATE TABLE h (i int) PARTITION BY HASH (i) PARTITIONS 3;
>
> created the partitions *automatically*?

It's a good idea, but then we couldn't create a hash partition that is itself a partitioned table, and as a consequence we would be unable to create subpartitions. My understanding is that a table can be declared partitioned only in the CREATE TABLE statement, not via ALTER TABLE; for this reason the automatically created partitions would only be regular tables. We could achieve the desired result by creating a separate partitioned table and doing the DETACH/ATTACH manipulation, but IMO that is not flexible. It would be good if a regular table could be turned into a partitioned one through a separate command; then your idea would not be restrictive.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
[HACKERS] Proposal: Local indexes for partitioned table
Hi hackers!

As I've understood from thread [1], the main issue with creating local indexes for partitions is supporting the REINDEX and DROP INDEX operations on the parent partitioned tables. Furthermore, Robert Haas mentioned the problem of creating an index on a key that is represented in each partition by a single value (or a primitive interval) [1], i.e. under list partitioning, or range partitioning with a unit interval.

I would like to propose the following solution:

1. Create indexes for a hierarchy of partitioned tables and partitions recursively. Don't create relfilenodes for the indexes on parents, only catalog entries (much like the elimination of the partitioned table's own storage in [2]). The abstract index on a partitioned table exists only to refer to the indexes of the child tables, so that REINDEX and DROP INDEX can be performed on them.

2. Record the created indexes in the pg_depend table, so that each child table's index depends on the corresponding parent table's index with dependency type DEPENDENCY_NORMAL. An index can then be removed separately for a partition, and recursively/separately for a partitioned table (see the sketch after this message).

3. REINDEX on the index of a partitioned table performs this operation on the existing indexes of the corresponding partitions. Here it is necessary to take care with operations such as REINDEX SCHEMA | DATABASE | SYSTEM, so that the partitions' indexes aren't re-indexed multiple times in a row.

Any thoughts?

1. https://www.postgresql.org/message-id/CA+TgmoZUwj=qynak+f7xef4w_e2g3xxdmnsnzmzjuinhrco...@mail.gmail.com
2. https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
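A minimal sketch of how point 2 above could be expressed with the existing dependency machinery, inside the index-creation code path (parentIndexId and childIndexId are hypothetical variables, not names from an actual patch):

/*
 * Make the partition's index depend on the parent's abstract index.
 * DEPENDENCY_NORMAL lets the child index be dropped on its own, while
 * DROP INDEX ... CASCADE on the parent index removes the children too.
 */
ObjectAddress	parent_idx;
ObjectAddress	child_idx;

ObjectAddressSet(parent_idx, RelationRelationId, parentIndexId);
ObjectAddressSet(child_idx, RelationRelationId, childIndexId);
recordDependencyOn(&child_idx, &parent_idx, DEPENDENCY_NORMAL);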
Re: [HACKERS] Declarative partitioning - another take
Hi, everyone!

08.12.16 18:25, Robert Haas wrote:
> Of course, this is the beginning, not the end. I've been thinking about next steps -- here's an expanded list:
> - more efficient plan-time partition pruning (constraint exclusion is too slow)
> - run-time partition pruning
> - partition-wise join (Ashutosh Bapat is already working on this)
> - try to reduce lock levels
> - hash partitioning
> - the ability to create an index on the parent and have all of the children inherit it; this should work something like constraint inheritance. you could argue that this doesn't add any real new capability but it's a huge usability feature.
> - teaching autovacuum enough about inheritance hierarchies for it to update the parent statistics when they get stale despite the lack of any actual inserts/updates/deletes to the parent. this has been pending for a long time, but it's only going to get more important
> - row movement (aka avoiding the need for an ON UPDATE trigger on each partition)
> - insert (and eventually update) tuple routing for foreign partitions
> - not scanning the parent
> - fixing the insert routing so that we can skip tuple conversion where possible
> - fleshing out the documentation

I would like to work on two tasks:
- insert (and eventually update) tuple routing for foreign partitions;
- the ability to create an index on the parent and have all of the children inherit it.

The first one has been implemented in pg_pathman somehow, but the code relies on dirty hacks, so the FDW API has to be improved. As for the extended index support, it doesn't look like a super-hard task.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [HACKERS] [WIP] Patches to enable extraction state of query execution from external session
01.09.2016 18:58, Tom Lane wrote:
> Maksim Milyutin writes:
>>> Yes, but the problem is that nothing gives you the guarantee that at the moment you decide to handle the interrupt, the QueryDesc structures you're looking at are in a good shape for Explain* functions to run on them. Even if that appears to be the case in your testing now, there's no way to tell if that will be the case at any future point in time.
>> CHECK_FOR_INTERRUPTS calls are located in places where the query state (the QueryDesc structure) is more or less consistent.
> Really? Even if that's 100% true today, which I wouldn't bet very much on, it seems like a really dangerous property to insist on system-wide. The only restriction we have ever placed on CHECK_FOR_INTERRUPTS is that it occur at places where it'd be safe to throw elog(ERROR), and in general we don't assume that the active query is still in a usable state after an error. What you propose would amount to a new restriction that nothing can ever call any nontrivial subroutine while the active query tree is less than fully valid (because the subroutine might contain a CHECK_FOR_INTERRUPTS somewhere). That sounds impractical and unenforceable.

Ok, thanks! I could propose a different approach: when CHECK_FOR_INTERRUPTS occurs, set a hook to be executed by the following ExecProcNode(). The hook is going to be provided by my extension. It is expected to send the current query state to some recipient backend (the one who asked for it). I think the active query is consistent after any node has worked off one or zero rows. After it has sent all the necessary data, the hook will disable itself.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
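A minimal sketch of that approach, assuming a hypothetical hook point at the top of ExecProcNode() and a hypothetical function that ships the state to the requesting backend (neither exists in core today):

/* Flag armed by the interrupt handling at CHECK_FOR_INTERRUPTS time. */
static bool query_state_requested = false;

/* Runs in the interrupt path: defer all real work, just arm the hook. */
static void
arm_query_state_hook(void)
{
	query_state_requested = true;
}

/*
 * Runs via the proposed hook at the next ExecProcNode() call, where the
 * plan state is consistent again; the hook disables itself once the
 * data has been sent.
 */
static void
query_state_hook(PlanState *node)
{
	if (query_state_requested)
	{
		query_state_requested = false;		/* one-shot */
		SendQueryStateToRecipient(node);	/* hypothetical sender */
	}
}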
Re: [HACKERS] [WIP] Patches to enable extraction state of query execution from external session
On Tue, Aug 30, 2016 at 9:34 AM, Maksim Milyutin wrote:

>>>> Hi, hackers! I have now completed an extension that provides a facility to see the current state of query execution in an external session, in the form of EXPLAIN ANALYZE output. The extension works on version 9.5; for 9.6 and later it doesn't support detailed statistics for parallel nodes yet. I want to present patches to the latest version of the PostgreSQL core to enable this extension.
>>> Hello, did you publish the extension itself yet?
>> Hello, the extension for version 9.5 is available in the repository: https://github.com/postgrespro/pg_query_state/tree/master
>>> Last year (actually, exactly one year ago) I was trying to do something very similar, and it quickly turned out that signals are not the best way to make this sort of inspection. You can find the discussion here: https://www.postgresql.org/message-id/CACACo5Sz7G0MFauC082iM=xx_hq7qq5ndr4jpo+h-o5vp6i...@mail.gmail.com
>> Thanks for the link! My patch *custom_signal.patch* resolves the problem of «heavy» signal handlers. In essence, I follow the course taken in the *procsignal.c* file: *ProcSignalReason* values are defined on which SIGUSR1 is multiplexed. Receipt of the signal merely sets flags for *ProcessInterrupts* to act on, i.e. procsignal_sigusr1_handler() only sets specific flags. When CHECK_FOR_INTERRUPTS is reached later during query execution, *ProcessInterrupts* is called and only then is the user-defined signal handler executed. As a result, we have deferred signal handling.
> Yes, but the problem is that nothing gives you the guarantee that at the moment you decide to handle the interrupt, the QueryDesc structures you're looking at are in a good shape for Explain* functions to run on them. Even if that appears to be the case in your testing now, there's no way to tell if that will be the case at any future point in time.

CHECK_FOR_INTERRUPTS calls are located in places where the query state (the QueryDesc structure) is more or less consistent, and it is in these macro calls that I pass the QueryDesc to the Explain* functions. I know for certain that the elementary statistics-updating functions (e.g. InstrStartNode, InstrStopNode, etc.) don't contain CHECK_FOR_INTERRUPTS within themselves, so the statistics are consistent at least at the node level by the time the backend is ready to transfer its state.

The real problem may be in the interpretation of the collected statistics by the Explain* functions. In my practice there was a case where the wrong number of inserted rows was shown for an INSERT ON CONFLICT query. That query consisted of two parts -- a SELECT from a table and an INSERT with a predicate check -- and the query was interrupted between those parts. The formula for inserted rows was the number of rows extracted by the SELECT minus the number of rows rejected by the INSERT, so I got an imaginary inserted row. I removed the printing of the number of inserted rows from the explain output of a running query, because I cannot know whether the INSERT node has processed that last row. The remaining number of rejected rows is deterministic, though, so I kept showing it.

> Another problem is the use of the shm_mq facility, because it manipulates the state of the process latch.
> This is not supposed to happen to a backend happily performing its tasks, at random due to external factors, and this is what the proposed approach introduces.

In the Postgres source code, most WaitLatch() calls on the process latch are surrounded by a loop and form a pattern like this:

for (;;)
{
	if (desired_state_has_occurred)
		break;
	WaitLatch(MyLatch);
	CHECK_FOR_INTERRUPTS();
	ResetLatch(MyLatch);
}

The motivation for this pattern is clearly illustrated by an extract from a comment in the Postgres core: the generic process latch "has to be robust against unrelated wakeups: Always check that the desired state has occurred, and wait again if not" [1]. My point is that a random setting of the process latch while a query is executing does not affect other uses of that latch later in the code.

1. src/backend/storage/lmgr/proc.c:1724, for the ProcWaitForSignal function

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [HACKERS] [WIP] Patches to enable extraction state of query execution from external session
> On 2016-08-30 11:22:43 +0300, Maksim Milyutin wrote:
>>> Hi,
>>> On 2016-08-29 18:22:56 +0300, maksim wrote:
>>>> Now I have completed an extension that provides a facility to see the current state of query execution in an external session, in the form of EXPLAIN ANALYZE output. The extension works on version 9.5; for 9.6 and later it doesn't support detailed statistics for parallel nodes yet.
>>> Could you expand a bit on what you want this for exactly?
>> The ultimate goal is to push my extension into the Postgres core, but for now it's ready only for 9.5. The patches presented here are the prerequisites of this extension.
> I'm asking what you want this for. "An extension" isn't a detailed description...

I want to provide a facility to fetch the query state of some other backend running on the same server. In essence, it's going to be a microlevel monitoring tool. A typical use case looks like this:

1) assume the 1st backend executes a simple query:

   select * from foo join bar on foo.c1 = bar.c1

2) somebody tries to fetch the state of that backend, addressing it by pid:

   select * from pg_query_state(pid := <1st_backend_pid>)

3) he'll get a detailed description of the state, something like this:

   Hash Join (Current loop: actual rows=0, loop number=1)
     Hash Cond: (foo.c1 = bar.c1)
     -> Seq Scan on foo (Current loop: actual rows=1, loop number=1)
     -> Hash (Current loop: actual rows=0, loop number=1)
          Buckets: 131072 Batches: 8 Memory Usage: 1kB
          -> Seq Scan on bar (Current loop: actual rows=49, loop number=1)

Note that I've added *Current loop* records with the number of emitted rows (*actual rows*) and the *loop number* attached to each node. We could also add timing info. For parallel nodes I want to print statistics for each worker separately (that is not finished yet). You can also watch my screencast (it's short enough) to get the idea: https://asciinema.org/a/981bed2lu7r8sx60u5lsjei30

>>>> 2. Patch that enables interrupting the query executor (executor_hooks.patch). This patch makes it possible to hang hooks on the executor function of each node (ExecProcNode). I define hooks before and after any node's execution. I use this patch to add the possibility of tracing a query by the rows emitted from any node: I interrupt the query executor after a node delivers one or zero rows to the upper node, and after a specific number of trace steps I can get the query state of the traced backend, which will be somewhat deterministic. I use this possibility for the regression tests of my extension.
>>> This will increase executor overhead.
>> In the simple case we have checks on the existence of hooks.
> That *is* noticeable.

Then I'll really consider hiding the hook checks inside the "if (instrument)" statement, thanks!

>>> I think we'll need to find a way to hide this behind the existing if (instrument) branches.
>> That could be done. It doesn't matter for trace mode, but I think the instrument branch is intended only for collecting per-node statistics.
> I can't follow here. That's all what analyze is about?

I meant that hiding the hooks is not a universal solution: if the 'instrument' variable is empty (e.g. for a query without ANALYZE), the hooks become disabled. But in my case 'instrument' is initialized anyway, so I don't care about that.

>>>> 3. Patch that enables outputting runtime explain statistics (runtime_explain.patch). This patch extends the regular explain functionality. The problem is that a regular explain call produces its output after query execution, performing InstrEndLoop on nodes where necessary. My patch introduces a specific flag, *runtime*, that indicates whether we are explaining a running query, and makes some insertions in the source code dedicated to outputting the statistics of a running query.
> Unless I'm missing something this doesn't really expose a user of this functionality?

I probably could keep *runtime_explain.patch* out of the Postgres core by copying *explain.c* into the module's directory and customizing it further for my purposes. But in that case I'd have to maintain a 'local explain', fixing bugs and coping with other issues from time to time (e.g. on every major upgrade).

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
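For illustration, hiding the hook test behind the existing instrumentation branch in ExecProcNode() might look roughly like this (a sketch only; exec_proc_node_hook is a hypothetical hook pointer, not an existing API). It also shows why the hooks would go dead for queries running without ANALYZE, which is exactly the concern above:

if (node->instrument)
{
	InstrStartNode(node->instrument);

	/*
	 * The extra test is reached only when instrumentation is active, so
	 * plain (non-ANALYZE) queries pay nothing -- but the hooks never
	 * fire for them either.
	 */
	if (exec_proc_node_hook)
		exec_proc_node_hook(node, true);	/* "before" call */
}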
Re: [HACKERS] [WIP] Patches to enable extraction state of query execution from external session
> Hi,
> On 2016-08-29 18:22:56 +0300, maksim wrote:
>> Now I have completed an extension that provides a facility to see the current state of query execution in an external session, in the form of EXPLAIN ANALYZE output. The extension works on version 9.5; for 9.6 and later it doesn't support detailed statistics for parallel nodes yet.
> Could you expand a bit on what you want this for exactly?

The ultimate goal is to push my extension into the Postgres core, but for now it's ready only for 9.5. The patches presented here are the prerequisites of this extension.

>> 2. Patch that enables interrupting the query executor (executor_hooks.patch). This patch makes it possible to hang hooks on the executor function of each node (ExecProcNode). I define hooks before and after any node's execution. I use this patch to add the possibility of tracing a query by the rows emitted from any node: I interrupt the query executor after a node delivers one or zero rows to the upper node, and after a specific number of trace steps I can get the query state of the traced backend, which will be somewhat deterministic. I use this possibility for the regression tests of my extension.
> This will increase executor overhead.

We can assume that only lightweight processing is done in the hooks during the regular execution of a query. In my case (query tracing), I install the hooks in trace mode and drop them otherwise.

> I think we'll need to find a way to hide this behind the existing if (instrument) branches.

That could be done; it doesn't matter for trace mode. But I think the instrument branch is intended only for collecting per-node statistics.

>> 3. Patch that enables outputting runtime explain statistics (runtime_explain.patch). This patch extends the regular explain functionality. The problem is that a regular explain call produces its output after query execution, performing InstrEndLoop on nodes where necessary. My patch introduces a specific flag, *runtime*, that indicates whether we are explaining a running query, and makes some insertions in the source code dedicated to outputting the statistics of a running query.
> Unless I'm missing something this doesn't really expose a user of this functionality?

This patch exists only for the sake of the extension. As described in https://www.postgresql.org/message-id/CACACo5Sz7G0MFauC082iM=xx_hq7qq5ndr4jpo+h-o5vp6i...@mail.gmail.com, the regular ExplainNode() prints statistics only after query execution and asserts inside InstrEndLoop(). My patch resolves this problem and rewrites the formulas for the statistics parameters appropriately for running queries, without affecting regular EXPLAIN output.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [HACKERS] [WIP] Patches to enable extraction state of query execution from external session
On Mon, Aug 29, 2016 at 5:22 PM, maksim wrote:
>> Hi, hackers! Now I have completed an extension that provides a facility to see the current state of query execution in an external session, in the form of EXPLAIN ANALYZE output. The extension works on version 9.5; for 9.6 and later it doesn't support detailed statistics for parallel nodes yet. I want to present patches to the latest version of the PostgreSQL core to enable this extension.
> Hello, did you publish the extension itself yet?

Hello, the extension for version 9.5 is available in the repository: https://github.com/postgrespro/pg_query_state/tree/master

> Last year (actually, exactly one year ago) I was trying to do something very similar, and it quickly turned out that signals are not the best way to make this sort of inspection. You can find the discussion here: https://www.postgresql.org/message-id/CACACo5Sz7G0MFauC082iM=xx_hq7qq5ndr4jpo+h-o5vp6i...@mail.gmail.com

Thanks for the link! My patch *custom_signal.patch* resolves the problem of «heavy» signal handlers. In essence, I follow the course taken in the *procsignal.c* file: *ProcSignalReason* values are defined on which SIGUSR1 is multiplexed. Receipt of the signal merely sets flags for *ProcessInterrupts* to act on, i.e. procsignal_sigusr1_handler() only sets specific flags. When CHECK_FOR_INTERRUPTS is reached later during query execution, *ProcessInterrupts* is called and the user-defined signal handler is then executed. As a result, we have deferred signal handling.

The patch *runtime_explain.patch* resolves the problem with the error from InstrEndLoop(): I catch all the places where this unlucky function is called and wrap them in checks on the *runtime* flag, which indicates whether *ExplainQuery* is called for a running query. I have also extended the explain output; you can see the details in README.md in the repository.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
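To make the deferred handling in custom_signal.patch concrete, here is a condensed sketch following the procsignal.c pattern (the flag and function names are illustrative, not the actual patch symbols):

/* Set from the SIGUSR1 handler; examined at CHECK_FOR_INTERRUPTS(). */
static volatile sig_atomic_t CustomSignalPending = false;

/* Runs in signal context: only set flags and poke the latch. */
static void
handle_custom_signal(void)
{
	CustomSignalPending = true;
	InterruptPending = true;
	SetLatch(MyLatch);
}

/* Runs later, from ProcessInterrupts() at a safe CHECK_FOR_INTERRUPTS(). */
static void
process_custom_signal(void)
{
	if (CustomSignalPending)
	{
		CustomSignalPending = false;
		RunUserDefinedHandler();	/* hypothetical deferred handler */
	}
}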