Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Thu, Dec 24, 2020 at 01:23:40PM +0900, Michael Paquier wrote: > Please note that I have added an entry in the CF app for the moment so > as we don't lose track of it: > https://commitfest.postgresql.org/31/2892/ I have been able to look at that again today, and applied it. I have tweaked a bit the comments, and added an elog(ERROR) as a safety net for explain.c if the IFNE code path is taken for an object type that is not expected with CreateTableAsStmt. -- Michael signature.asc Description: PGP signature
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Thu, Dec 24, 2020 at 09:10:22AM +0530, Bharath Rupireddy wrote: > Since I tested that with all the formats manually here and it works, > so I don't want to make the test cases complicated with adding > explain_filter() function into matview.sql and select_into.sql and all > that. I'm okay without those test cases. Please note that I have added an entry in the CF app for the moment so as we don't lose track of it: https://commitfest.postgresql.org/31/2892/ >> What we have here looks rather committable. Let's wait until the >> period of vacations is over before wrapping this up to give others the >> occasion to comment. > > Thanks! Happy Vacations! You too! -- Michael signature.asc Description: PGP signature
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Thu, Dec 24, 2020 at 7:40 AM Michael Paquier wrote: > > On Wed, Dec 23, 2020 at 07:13:33PM +0530, Bharath Rupireddy wrote: > > +1. Shall we add some test cases(with xml, yaml, json formats as is > > currently being done in explain.sql) to cover that? We can have the > > explain_filter() function to remove the unstable parts in the output, > > it looks something like below. If yes, please let me know I can add > > them to matview and select_into. > > I am not sure that we need tests for all the formats, but having at > least one of them sounds good to me. I leave the choice up to you. Since I tested that with all the formats manually here and it works, so I don't want to make the test cases complicated with adding explain_filter() function into matview.sql and select_into.sql and all that. I'm okay without those test cases. > What we have here looks rather committable. Let's wait until the > period of vacations is over before wrapping this up to give others the > occasion to comment. Thanks! Happy Vacations! With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Wed, Dec 23, 2020 at 07:13:33PM +0530, Bharath Rupireddy wrote: > +1. Shall we add some test cases(with xml, yaml, json formats as is > currently being done in explain.sql) to cover that? We can have the > explain_filter() function to remove the unstable parts in the output, > it looks something like below. If yes, please let me know I can add > them to matview and select_into. I am not sure that we need tests for all the formats, but having at least one of them sounds good to me. I leave the choice up to you. What we have here looks rather committable. Let's wait until the period of vacations is over before wrapping this up to give others the occasion to comment. -- Michael signature.asc Description: PGP signature
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Wed, Dec 23, 2020 at 6:01 PM Michael Paquier wrote: > On Tue, Dec 22, 2020 at 03:12:15PM +0530, Bharath Rupireddy wrote: > > On Tue, Dec 22, 2020 at 2:07 PM Michael Paquier wrote: > >> Note: I'd like to think that we could choose a better name for > >> CheckRelExistenceInCTAS(). > > > > I changed it to IsCTASRelCreationAllowed() and attached a v5 patch. > > Please let me know if this is okay. > > After thinking about that, using "CTAS" while other routines in the > same area use "CreateTableAs" looks inconsistent to me. So I have > come up with CreateTableAsRelExists() as name. I think CreateTableAsRelExists() can return true if the relation already exists and false otherwise, to keep in sync with the function name. I updated this and attached v7 patch. > As the same time, I have looked at the git history to note 9bd27b7 > where we had better not give an empty output for non-text formats. So > I'd like to think that it makes sense to use ExplainDummyGroup() if > the relation exists with IF NOT EXISTS, keeping some consistency. > > What do you think? +1. Shall we add some test cases(with xml, yaml, json formats as is currently being done in explain.sql) to cover that? We can have the explain_filter() function to remove the unstable parts in the output, it looks something like below. If yes, please let me know I can add them to matview and select_into. postgres=# select explain_filter('explain(analyze, format xml) create table if not exists t1 as select 1;'); NOTICE: relation "t1" already exists, skipping explain_filter --- http://www.postgresql.org/N/explain";>+ + (1 row) postgres=# select explain_filter('explain(analyze, format yaml) create table if not exists t1 as select 1;'); NOTICE: relation "t1" already exists, skipping explain_filter - - "CREATE TABLE AS" (1 row) postgres=# select explain_filter('explain(analyze, format json) create table if not exists t1 as select 1;'); NOTICE: relation "t1" already exists, skipping explain_filter - [ + "CREATE TABLE AS"+ ] (1 row) With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v7-0001-Fail-Fast-In-CTAS-CMV-If-Relation-Already-Exists.patch Description: Binary data
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Tue, Dec 22, 2020 at 03:12:15PM +0530, Bharath Rupireddy wrote: > On Tue, Dec 22, 2020 at 2:07 PM Michael Paquier wrote: >> Note: I'd like to think that we could choose a better name for >> CheckRelExistenceInCTAS(). > > I changed it to IsCTASRelCreationAllowed() and attached a v5 patch. > Please let me know if this is okay. After thinking about that, using "CTAS" while other routines in the same area use "CreateTableAs" looks inconsistent to me. So I have come up with CreateTableAsRelExists() as name. As the same time, I have looked at the git history to note 9bd27b7 where we had better not give an empty output for non-text formats. So I'd like to think that it makes sense to use ExplainDummyGroup() if the relation exists with IF NOT EXISTS, keeping some consistency. What do you think? -- Michael From ed6fb44e5433b86fbed454f1b52aaa8538a377ae Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 23 Dec 2020 21:28:07 +0900 Subject: [PATCH v6] Fail Fast In CTAS/CMV If Relation Already Exists Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without IF-NOT-EXISTS clause, the existence of the relation (either table or materialized view) gets checked during execution and an error is thrown there. All the unnecessary rewrite and planning for the SELECT part of the query have happened just to fail later. However, if IF-NOT-EXISTS clause is present, then a notice is issued and returned immediately without rewrite and planning further. This seems somewhat inconsistent. This patch propose to check the relation existence early in ExecCreateTableAs() as well as in ExplainOneUtility() and throw an error in case it exists already to avoid unnecessary rewrite, planning and execution of the SELECT part. --- src/include/commands/createas.h | 2 + src/backend/commands/createas.c | 53 --- src/backend/commands/explain.c| 13 ++ src/test/regress/expected/matview.out | 38 src/test/regress/expected/select_into.out | 42 ++ src/test/regress/sql/matview.sql | 23 ++ src/test/regress/sql/select_into.sql | 21 + 7 files changed, 177 insertions(+), 15 deletions(-) diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h index 7629230254..d8cfb62522 100644 --- a/src/include/commands/createas.h +++ b/src/include/commands/createas.h @@ -29,4 +29,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause); extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause); +extern bool CreateTableAsRelExists(CreateTableAsStmt *ctas); + #endif /* CREATEAS_H */ diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c index 6bf6c5a310..01c94f1bce 100644 --- a/src/backend/commands/createas.c +++ b/src/backend/commands/createas.c @@ -239,21 +239,9 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt, PlannedStmt *plan; QueryDesc *queryDesc; - if (stmt->if_not_exists) - { - Oid nspid; - - nspid = RangeVarGetCreationNamespace(stmt->into->rel); - - if (get_relname_relid(stmt->into->rel->relname, nspid)) - { - ereport(NOTICE, - (errcode(ERRCODE_DUPLICATE_TABLE), - errmsg("relation \"%s\" already exists, skipping", - stmt->into->rel->relname))); - return InvalidObjectAddress; - } - } + /* Check if the relation exists or not */ + if (!CreateTableAsRelExists(stmt)) + return InvalidObjectAddress; /* * Create the tuple receiver object and insert info it will need @@ -400,6 +388,41 @@ GetIntoRelEFlags(IntoClause *intoClause) return flags; } +/* + * CreateTableAsRelExists --- check existence of relation for CreateTableAsStmt + * + * Utility wrapper checking if the relation pending for creation in the + * CreateTableAsStmt query already exists or not. Returns true if the + * relation can be created. + */ +bool +CreateTableAsRelExists(CreateTableAsStmt *ctas) +{ + Oid nspid; + IntoClause *into = ctas->into; + + nspid = RangeVarGetCreationNamespace(into->rel); + + if (get_relname_relid(into->rel->relname, nspid)) + { + if (!ctas->if_not_exists) + ereport(ERROR, + (errcode(ERRCODE_DUPLICATE_TABLE), + errmsg("relation \"%s\" already exists", + into->rel->relname))); + + /* The relation exists and IF NOT EXISTS has been specified */ + ereport(NOTICE, +(errcode(ERRCODE_DUPLICATE_TABLE), + errmsg("relation \"%s\" already exists, skipping", + into->rel->relname))); + return false; + } + + /* Relation does not exist, it can be created */ + return true; +} + /* * CreateIntoRelDestReceiver -- create a suitable DestReceiver object * diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 43f9b01e83..c2dd38f72b 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -435,6 +435,19 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es, CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStm
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Tue, Dec 22, 2020 at 2:07 PM Michael Paquier wrote: > I was looking at your patch today, and I actually found the conclusion > to output an empty plan while issuing a NOTICE to be quite intuitive > if the caller uses IF NOT EXISTS with EXPLAIN. Thanks! > Thanks for adding some test cases! Some of them were exact > duplicates, so it is possible to reduce the number of queries without > impacting the coverage. I have also chosen a query that forces an > error within the planner. > Please see the attached. IF NOT EXISTS implies that CTAS or CREATE > MATVIEW will never ERROR if the relation already exists, with or > without EXPLAIN, EXECUTE or WITH NO DATA, so that gets us a consistent > behavior across all the patterns. LGTM. > Note: I'd like to think that we could choose a better name for > CheckRelExistenceInCTAS(). I changed it to IsCTASRelCreationAllowed() and attached a v5 patch. Please let me know if this is okay. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From fb96bbaa5fb5797517902e367dc134733da6d76c Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 22 Dec 2020 14:51:25 +0530 Subject: [PATCH v5] Fail Fast In CTAS/CMV If Relation Already Exists Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without IF-NOT-EXISTS clause, the existence of the relation (either table or materialized view) gets checked during execution and an error is thrown there. All the unnecessary rewrite and planning for the SELECT part of the query have happened just to fail later. However, if IF-NOT-EXISTS clause is present, then a notice is issued and returned immediately without rewrite and planning further. This seems somewhat inconsistent. This patch propose to check the relation existence early in ExecCreateTableAs() as well as in ExplainOneUtility() and throw an error in case it exists already to avoid unnecessary rewrite, planning and execution of the SELECT part. --- src/backend/commands/createas.c | 59 +-- src/backend/commands/explain.c| 8 +++ src/include/commands/createas.h | 2 + src/test/regress/expected/matview.out | 38 +++ src/test/regress/expected/select_into.out | 31 src/test/regress/sql/matview.sql | 23 + src/test/regress/sql/select_into.sql | 16 ++ 7 files changed, 162 insertions(+), 15 deletions(-) diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c index 6bf6c5a310..49feadfb9a 100644 --- a/src/backend/commands/createas.c +++ b/src/backend/commands/createas.c @@ -239,21 +239,9 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt, PlannedStmt *plan; QueryDesc *queryDesc; - if (stmt->if_not_exists) - { - Oid nspid; - - nspid = RangeVarGetCreationNamespace(stmt->into->rel); - - if (get_relname_relid(stmt->into->rel->relname, nspid)) - { - ereport(NOTICE, - (errcode(ERRCODE_DUPLICATE_TABLE), - errmsg("relation \"%s\" already exists, skipping", - stmt->into->rel->relname))); - return InvalidObjectAddress; - } - } + /* Check if the to-be-created relation exists or not */ + if (!IsCTASRelCreationAllowed(stmt)) + return InvalidObjectAddress; /* * Create the tuple receiver object and insert info it will need @@ -400,6 +388,47 @@ GetIntoRelEFlags(IntoClause *intoClause) return flags; } +/* + * IsCTASRelCreationAllowed --- check existence of relation in CREATE TABLE AS + * + * This function checks if the relation CTAS is trying to create already + * exists. If so, with if-not-exists clause it issues a notice and returns + * false, without if-not-exists clause it throws an error. Returns true if the + * relation can be created. This routine can be used as a fast-exit path when + * checking if a CTAS query needs to be executed or not. + */ +bool +IsCTASRelCreationAllowed(CreateTableAsStmt *ctas) +{ + Oid nspid; + IntoClause *into = ctas->into; + + nspid = RangeVarGetCreationNamespace(into->rel); + + if (get_relname_relid(into->rel->relname, nspid)) + { + if (!ctas->if_not_exists) + ereport(ERROR, + (errcode(ERRCODE_DUPLICATE_TABLE), + errmsg("relation \"%s\" already exists", + into->rel->relname))); + + /* + * The relation exists and IF NOT EXISTS has been specified, + * so let the caller know about that and let the processing + * continue. + */ + ereport(NOTICE, +(errcode(ERRCODE_DUPLICATE_TABLE), + errmsg("relation \"%s\" already exists, skipping", + into->rel->relname))); + return false; + } + + /* Relation does not exist, so it can be created */ + return true; +} + /* * CreateIntoRelDestReceiver -- create a suitable DestReceiver object * diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 43f9b01e83..372c6a196c 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -435,6 +435,14 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es, C
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Mon, Dec 21, 2020 at 12:01:38PM +0530, Bharath Rupireddy wrote: > On Fri, Dec 18, 2020 at 8:15 AM Bharath Rupireddy >> I tried to make it consistent by issuing NOTICE (not an error) even >> for EXPLAIN/EXPLAIN ANALYZE IF NOT EXISTS case. If issue notice and >> exit from the ExplainOneUtility, we could output an empty plan to the >> user because, by now ExplainResultDesc would have been called at the >> start of the explain via PortalStart(). I didn't find a clean way of >> coding if we are not okay to show notice and empty plan to the user. >> >> Any suggestions on achieving above? I was looking at your patch today, and I actually found the conclusion to output an empty plan while issuing a NOTICE to be quite intuitive if the caller uses IF NOT EXISTS with EXPLAIN. > Attaching v3 patch that also contains test cases. Please review it further. Thanks for adding some test cases! Some of them were exact duplicates, so it is possible to reduce the number of queries without impacting the coverage. I have also chosen a query that forces an error within the planner. Please see the attached. IF NOT EXISTS implies that CTAS or CREATE MATVIEW will never ERROR if the relation already exists, with or without EXPLAIN, EXECUTE or WITH NO DATA, so that gets us a consistent behavior across all the patterns. Note: I'd like to think that we could choose a better name for CheckRelExistenceInCTAS(). -- Michael From 606520d6f023163024ec5f215b45e9a89f093884 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 22 Dec 2020 17:35:22 +0900 Subject: [PATCH v4] Fail Fast In CTAS/CMV If Relation Already Exists Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without IF-NOT-EXISTS clause, the existence of the relation (either table or materialized view) gets checked during execution and an error is thrown there. All the unnecessary rewrite and planning for the SELECT part of the query have happened just to fail later. However, if IF-NOT-EXISTS clause is present, then a notice is issued and returned immediately without rewrite and planning further. This seems somewhat inconsistent. This patch propose to check the relation existence early in ExecCreateTableAs() as well as in ExplainOneUtility() and throw an error in case it exists already to avoid unnecessary rewrite, planning and execution of the SELECT part. --- src/include/commands/createas.h | 2 + src/backend/commands/createas.c | 56 +-- src/backend/commands/explain.c| 8 src/test/regress/expected/matview.out | 38 +++ src/test/regress/expected/select_into.out | 31 + src/test/regress/sql/matview.sql | 23 ++ src/test/regress/sql/select_into.sql | 16 +++ 7 files changed, 159 insertions(+), 15 deletions(-) diff --git a/src/include/commands/createas.h b/src/include/commands/createas.h index 7629230254..404adf814b 100644 --- a/src/include/commands/createas.h +++ b/src/include/commands/createas.h @@ -29,4 +29,6 @@ extern int GetIntoRelEFlags(IntoClause *intoClause); extern DestReceiver *CreateIntoRelDestReceiver(IntoClause *intoClause); +extern bool CheckRelExistenceInCTAS(CreateTableAsStmt *ctas); + #endif /* CREATEAS_H */ diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c index 6bf6c5a310..2c21b1b87f 100644 --- a/src/backend/commands/createas.c +++ b/src/backend/commands/createas.c @@ -239,21 +239,9 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt, PlannedStmt *plan; QueryDesc *queryDesc; - if (stmt->if_not_exists) - { - Oid nspid; - - nspid = RangeVarGetCreationNamespace(stmt->into->rel); - - if (get_relname_relid(stmt->into->rel->relname, nspid)) - { - ereport(NOTICE, - (errcode(ERRCODE_DUPLICATE_TABLE), - errmsg("relation \"%s\" already exists, skipping", - stmt->into->rel->relname))); - return InvalidObjectAddress; - } - } + /* Check if the to-be-created relation exists or not */ + if (!CheckRelExistenceInCTAS(stmt)) + return InvalidObjectAddress; /* * Create the tuple receiver object and insert info it will need @@ -400,6 +388,44 @@ GetIntoRelEFlags(IntoClause *intoClause) return flags; } +/* + * CheckRelExistenceInCTAS --- check existence of relation in CREATE TABLE AS + * + * This routine can be used as a fast-exit path when checking if a CTAS query + * needs to be executed or not. Returns true if the relation can be created. + */ +bool +CheckRelExistenceInCTAS(CreateTableAsStmt *ctas) +{ + Oid nspid; + IntoClause *into = ctas->into; + + nspid = RangeVarGetCreationNamespace(into->rel); + + if (get_relname_relid(into->rel->relname, nspid)) + { + if (!ctas->if_not_exists) + ereport(ERROR, + (errcode(ERRCODE_DUPLICATE_TABLE), + errmsg("relation \"%s\" already exists", + into->rel->relname))); + + /* + * The relation exists and IF NOT EXISTS has been specified, + * so let the caller know about that and let the processing
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Fri, Dec 18, 2020 at 8:15 AM Bharath Rupireddy wrote: > On Fri, Dec 18, 2020 at 7:18 AM Michael Paquier wrote: > > On Thu, Dec 17, 2020 at 03:06:59PM +0530, Bharath Rupireddy wrote: > > > The behavior of the ctas/cmv, in case the relation already exists is as > > > shown in [1]. The things that have been changed with the patch are: 1) In > > > any case we do not rewrite or plan the select part if the relation already > > > exists 2) For explain ctas/cmv (without analyze), now the relation > > > existence is checked early and the error is thrown as highlighted in [1]. > > > > > > With patch, there is no behavioral change(from that of master branch) in > > > explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the > > > notice. > > > > > > Thoughts? > > > > HEAD is already a mixed bad of behaviors, and the set of results you > > are presenting here is giving a similar impression. It brings in some > > sanity by just ignoring the effects of the IF NOT EXISTS clause all > > the time still that's not consistent with the queries not using > > EXPLAIN. > > I tried to make it consistent by issuing NOTICE (not an error) even > for EXPLAIN/EXPLAIN ANALYZE IF NOT EXISTS case. If issue notice and > exit from the ExplainOneUtility, we could output an empty plan to the > user because, by now ExplainResultDesc would have been called at the > start of the explain via PortalStart(). I didn't find a clean way of > coding if we are not okay to show notice and empty plan to the user. > > Any suggestions on achieving above? > > > Hmm. Looking for similar behaviors, I can see one case in > > select_into.sql where we just never execute the plan when using WITH > > NO DATA but still show the plan, meaning that the query gets planned > > but it just gets marked as "(never executed)" if attempting to use > > ANALYZE. > > Yes, with no data we would see "(never executed)" for explain analyze > if the relation does not already exist. If the relation does exist, > then the error/notice. > > >There may be use cases for that as the user directly asked directly for an > >EXPLAIN. > > IMHO, in any case checking for the existence of the relations > specified in a query is must before we output something to the user. > For instance, the query "explain select * from non_existent_tbl;" > where non_existent_tbl doesn't exist, throws an error. Similarly, > "explain create table already_existing_tbl as select * from > another_tbl;" where the table ctas/select into trying to create > already exists, should also throw error. But that's not happening > currently on master. Which seems to be a problem to me. So, with the > patch proposed here, we error out in this case. > > If the user really wants to see the explain plan, then he/she should > use the correct query. > > > Note: the patch needs tests for all the patterns you would like to > > stress. This way it is easier to follow the patterns that are > > changing with your patch and compare them with the HEAD behavior (like > > looking at the diffs with the tests of the patch, but without the > > diffs in src/backend/). > > Sure, I will add test cases and post v3 patch. Attaching v3 patch that also contains test cases. Please review it further. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v3-0001-Fail-Fast-In-CTAS-CMV-If-Relation-Already-Exists.patch Description: Binary data
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Fri, Dec 18, 2020 at 7:18 AM Michael Paquier wrote: > On Thu, Dec 17, 2020 at 03:06:59PM +0530, Bharath Rupireddy wrote: > > The behavior of the ctas/cmv, in case the relation already exists is as > > shown in [1]. The things that have been changed with the patch are: 1) In > > any case we do not rewrite or plan the select part if the relation already > > exists 2) For explain ctas/cmv (without analyze), now the relation > > existence is checked early and the error is thrown as highlighted in [1]. > > > > With patch, there is no behavioral change(from that of master branch) in > > explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the > > notice. > > > > Thoughts? > > HEAD is already a mixed bad of behaviors, and the set of results you > are presenting here is giving a similar impression. It brings in some > sanity by just ignoring the effects of the IF NOT EXISTS clause all > the time still that's not consistent with the queries not using > EXPLAIN. I tried to make it consistent by issuing NOTICE (not an error) even for EXPLAIN/EXPLAIN ANALYZE IF NOT EXISTS case. If issue notice and exit from the ExplainOneUtility, we could output an empty plan to the user because, by now ExplainResultDesc would have been called at the start of the explain via PortalStart(). I didn't find a clean way of coding if we are not okay to show notice and empty plan to the user. Any suggestions on achieving above? > Hmm. Looking for similar behaviors, I can see one case in > select_into.sql where we just never execute the plan when using WITH > NO DATA but still show the plan, meaning that the query gets planned > but it just gets marked as "(never executed)" if attempting to use > ANALYZE. Yes, with no data we would see "(never executed)" for explain analyze if the relation does not already exist. If the relation does exist, then the error/notice. >There may be use cases for that as the user directly asked directly for an >EXPLAIN. IMHO, in any case checking for the existence of the relations specified in a query is must before we output something to the user. For instance, the query "explain select * from non_existent_tbl;" where non_existent_tbl doesn't exist, throws an error. Similarly, "explain create table already_existing_tbl as select * from another_tbl;" where the table ctas/select into trying to create already exists, should also throw error. But that's not happening currently on master. Which seems to be a problem to me. So, with the patch proposed here, we error out in this case. If the user really wants to see the explain plan, then he/she should use the correct query. > Note: the patch needs tests for all the patterns you would like to > stress. This way it is easier to follow the patterns that are > changing with your patch and compare them with the HEAD behavior (like > looking at the diffs with the tests of the patch, but without the > diffs in src/backend/). Sure, I will add test cases and post v3 patch. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Thu, Dec 17, 2020 at 03:06:59PM +0530, Bharath Rupireddy wrote: > The behavior of the ctas/cmv, in case the relation already exists is as > shown in [1]. The things that have been changed with the patch are: 1) In > any case we do not rewrite or plan the select part if the relation already > exists 2) For explain ctas/cmv (without analyze), now the relation > existence is checked early and the error is thrown as highlighted in [1]. > > With patch, there is no behavioral change(from that of master branch) in > explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the > notice. > > Thoughts? HEAD is already a mixed bad of behaviors, and the set of results you are presenting here is giving a similar impression. It brings in some sanity by just ignoring the effects of the IF NOT EXISTS clause all the time still that's not consistent with the queries not using EXPLAIN. Hmm. Looking for similar behaviors, I can see one case in select_into.sql where we just never execute the plan when using WITH NO DATA but still show the plan, meaning that the query gets planned but it just gets marked as "(never executed)" if attempting to use ANALYZE. There may be use cases for that as the user directly asked directly for an EXPLAIN. Note: the patch needs tests for all the patterns you would like to stress. This way it is easier to follow the patterns that are changing with your patch and compare them with the HEAD behavior (like looking at the diffs with the tests of the patch, but without the diffs in src/backend/). -- Michael signature.asc Description: PGP signature
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Mon, Dec 14, 2020 at 1:54 PM Bharath Rupireddy wrote: > On Mon, Dec 14, 2020 at 11:52 AM Michael Paquier wrote: > > On Mon, Dec 14, 2020 at 03:15:12PM +0900, Michael Paquier wrote: > > > Please note that this case fails with your patch, but the presence of > > > IF NOT EXISTS should ensure that we don't fail and issue a NOTICE > > > instead, no? > > Thanks for the use case. The provided use case (or for that matter any > use case with explain analyze ctas if-not-exists) fails if the > relation already exists. It happens on the master branch, please have > a look at tests [1]. You are right in saying that whether it is > explain/explain analyze ctas if there is if-not-exists we should issue > notice instead of error as with plain ctas. > > Do we want to fix this behaviour for explain/explain analyze ctat with > if-not-exists cases? Thoughts? > > If yes, we could change the code in ExplainOneUtility() such that we > check relation existence before rewrite/planning, issue notice and > return. Then. the user sees a notice and an empty plan as we are > returning from ExplainOneUtility(). Is it okay to show a warning and > an empty plan to the user? Thoughts? > > >> Taking this case specifically (OK, I am playing with > > > the rules a bit to insert data into the relation itself, still), this > > > query may finish by adding tuples to the table whose creation should > > > have been bypassed but the query got executed and inserted tuples. > > IIUC, with the use case provided, the tuples will not be inserted as > the query later fails (and the txn gets aborted) if the relation > exists. > > > > That's one example of behavior that may be confusing. There may be > > > others, but it seems to me that it may be simpler to execute or even > > > plan the query at all if the relation already exists. > > > > Er.. Sorry. I meant here to *not* execute or even *not* plan the > > query at all if the relation already exists. > > +1 to not plan and execute the query at all if the relation which > ctas/cmv is trying to create already exists. Posting a v2 patch after modifying the new function CheckRelExistenceInCTAS() a bit as suggested earlier. The behavior of the ctas/cmv, in case the relation already exists is as shown in [1]. The things that have been changed with the patch are: 1) In any case we do not rewrite or plan the select part if the relation already exists 2) For explain ctas/cmv (without analyze), now the relation existence is checked early and the error is thrown as highlighted in [1]. With patch, there is no behavioral change(from that of master branch) in explain analyze ctas/cmv with if-not-exists i.e. error is thrown not the notice. Thoughts? [1] With patch: postgres=# create table foo as select 1; ERROR: relation "foo" already exists postgres=# create table if not exists foo as select 1; NOTICE: relation "foo" already exists, skipping CREATE TABLE AS postgres=# explain analyze create table foo as select 1; ERROR: relation "foo" already exists postgres=# explain analyze create table if not exists foo as select 1; ERROR: relation "foo" already exists postgres=# explain create table foo as select 1; ERROR: relation "foo" already exists postgres=# explain create table if not exists foo as select 1; ERROR: relation "foo" already exists On master/without patch: postgres=# create table foo as select 1; ERROR: relation "foo" already exists postgres=# create table if not exists foo as select 1; NOTICE: relation "foo" already exists, skipping CREATE TABLE AS postgres=# explain analyze create table foo as select 1; ERROR: relation "foo" already exists postgres=# explain analyze create table if not exists foo as select 1; ERROR: relation "foo" already exists postgres=# explain create table foo as select 1; QUERY PLAN -- Result (cost=0.00..0.01 rows=1 width=4) (1 row) postgres=# explain create table if not exists foo as select 1; QUERY PLAN -- Result (cost=0.00..0.01 rows=1 width=4) (1 row) With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From ae963a1e945bad1932221990d72b09deac42dc20 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Thu, 17 Dec 2020 14:13:27 +0530 Subject: [PATCH v2] Fail Fast In CTAS/CMV If Relation Already Exists Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without IF-NOT-EXISTS clause, the existence of the relation (either table or materialized view) gets checked during execution and an error is thrown there. All the unnecessary rewrite and planning for the SELECT part of the query have happened just to fail later. However, if IF-NOT-EXISTS clause is present, then a notice is issued and returned immediately without rewrite and planning further. This seems somewhat inconsistent. This patch propose to check the relation existence early in ExecCreateTableAs() as well as in ExplainOneUtility() and throw an error in case it exis
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Mon, Dec 14, 2020 at 11:52 AM Michael Paquier wrote: > On Mon, Dec 14, 2020 at 03:15:12PM +0900, Michael Paquier wrote: > > Please note that this case fails with your patch, but the presence of > > IF NOT EXISTS should ensure that we don't fail and issue a NOTICE > > instead, no? Thanks for the use case. The provided use case (or for that matter any use case with explain analyze ctas if-not-exists) fails if the relation already exists. It happens on the master branch, please have a look at tests [1]. You are right in saying that whether it is explain/explain analyze ctas if there is if-not-exists we should issue notice instead of error as with plain ctas. Do we want to fix this behaviour for explain/explain analyze ctat with if-not-exists cases? Thoughts? If yes, we could change the code in ExplainOneUtility() such that we check relation existence before rewrite/planning, issue notice and return. Then. the user sees a notice and an empty plan as we are returning from ExplainOneUtility(). Is it okay to show a warning and an empty plan to the user? Thoughts? >> Taking this case specifically (OK, I am playing with > > the rules a bit to insert data into the relation itself, still), this > > query may finish by adding tuples to the table whose creation should > > have been bypassed but the query got executed and inserted tuples. IIUC, with the use case provided, the tuples will not be inserted as the query later fails (and the txn gets aborted) if the relation exists. > > That's one example of behavior that may be confusing. There may be > > others, but it seems to me that it may be simpler to execute or even > > plan the query at all if the relation already exists. > > Er.. Sorry. I meant here to *not* execute or even *not* plan the > query at all if the relation already exists. +1 to not plan and execute the query at all if the relation which ctas/cmv is trying to create already exists. [1] - postgres=# explain analyze postgres-# create table if not exists aa as postgres-#with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-#select * from insert_query; ERROR: relation "aa" already exists postgres=# explain analyze postgres-# create table aa as postgres-#with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-#select * from insert_query; ERROR: relation "aa" already exists postgres=# explain postgres-# create table aa as postgres-#with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-#select * from insert_query; QUERY PLAN CTE Scan on insert_query (cost=0.01..0.03 rows=1 width=4) CTE insert_query -> Insert on aa (cost=0.00..0.01 rows=1 width=4) -> Result (cost=0.00..0.01 rows=1 width=4) postgres=# explain postgres-# create table if not exists aa as postgres-#with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-#select * from insert_query; QUERY PLAN CTE Scan on insert_query (cost=0.01..0.03 rows=1 width=4) CTE insert_query -> Insert on aa (cost=0.00..0.01 rows=1 width=4) -> Result (cost=0.00..0.01 rows=1 width=4) postgres=# create table aa as postgres-#with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-#select * from insert_query; ERROR: relation "aa" already exists postgres=# create table if not exists aa as postgres-#with insert_query as postgres-# (insert into aa values (1) returning a1) postgres-#select * from insert_query; NOTICE: relation "aa" already exists, skipping CREATE TABLE AS With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Mon, Dec 14, 2020 at 03:15:12PM +0900, Michael Paquier wrote: > Please note that this case fails with your patch, but the presence of > IF NOT EXISTS should ensure that we don't fail and issue a NOTICE > instead, no? Taking this case specifically (OK, I am playing with > the rules a bit to insert data into the relation itself, still), this > query may finish by adding tuples to the table whose creation should > have been bypassed but the query got executed and inserted tuples. > That's one example of behavior that may be confusing. There may be > others, but it seems to me that it may be simpler to execute or even > plan the query at all if the relation already exists. Er.. Sorry. I meant here to *not* execute or even *not* plan the query at all if the relation already exists. -- Michael signature.asc Description: PGP signature
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Fri, Dec 11, 2020 at 03:03:46PM +0530, Bharath Rupireddy wrote: > I may not have got your above scenario correctly(it will be good if > you can provide the use case in case I want to check something there). It is possible to have DML queries in WITH clauses, as long as they use RETURNING to feed tuples to the outer query. Just imagine something like that: =# explain analyze create table if not exists aa as with insert_query as (insert into aa values (1) returning a) select * from insert_query; Please note that this case fails with your patch, but the presence of IF NOT EXISTS should ensure that we don't fail and issue a NOTICE instead, no? Taking this case specifically (OK, I am playing with the rules a bit to insert data into the relation itself, still), this query may finish by adding tuples to the table whose creation should have been bypassed but the query got executed and inserted tuples. That's one example of behavior that may be confusing. There may be others, but it seems to me that it may be simpler to execute or even plan the query at all if the relation already exists. -- Michael signature.asc Description: PGP signature
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Fri, Dec 11, 2020 at 1:40 PM Michael Paquier wrote: > On Fri, Dec 11, 2020 at 12:48:49PM +0530, Bharath Rupireddy wrote: > > I'm not quite sure how other databases behave. If I go by the main > > intention of EXPLAIN without ANALYZE, that should do the planning, > > show it in the output and no execution of the query should happen. For > > EXPLAIN CTAS/CMV, only thing that gets planned is the SELECT part and > > no execution happens so no existence check for the CTAS/CMV relation > > that will get created if the CTAS/CMV is executed. Having said that, > > the existence of the relations that are in the SELECT part are anyways > > checked during planning for EXPLAIN without ANALYZE. > > I think that it is tricky to define IF NOT EXISTS for a CTAS with > EXPLAIN. How would you for example treat an EXPLAIN ANALYZE with a > query that includes an INSERT RETURNING in a WITH clause. Would you > say that we do nothing if the relation exists? Or would you execute > it, still insert nothing on the result relation because it already > exists, even if the inner query may have inserted something as part of > its execution on a different relation? I may not have got your above scenario correctly(it will be good if you can provide the use case in case I want to check something there). I tried the following way, all the involved relations are being checked for existence even though for EXPLAIN: postgres=# EXPLAIN WITH temp1 AS (SELECT * FROM t1) INSERT INTO t1_does_not_exit VALUES (1); ERROR: relation "t1_does_not_exit" does not exist LINE 1: ...LAIN WITH temp1 AS (SELECT * FROM t1) INSERT INTO t1_does_no... ^ IIUC, is it that we want the following behaviour in case the relation CTAS/CMV is trying to create does not exist? Note that the sample queries are run on latest master branch: EXPLAIN: throw an error, instead of the query showing select plan on master branch currently? postgres=# explain create table t2 as select * from t1; QUERY PLAN Seq Scan on t1 (cost=0.00..2.00 rows=100 width=8) EXPLAIN ANALYZE: throw an error as it does on master branch? postgres=# explain analyze create table t2 as select * from t1; ERROR: relation "t2" already exists EXPLAIN with if-not-exists clause: throw a warning and an empty plan from ExplainOneUtility? If not an empty plan, we should be doing the relation existence check before we come to explain routines, maybe in gram.c? On the master branch it doesn't happen, the query shows the plan for select part as shown below. postgres=# explain create table if not exists t2 as select * from t1; QUERY PLAN Seq Scan on t1 (cost=0.00..2.00 rows=100 width=8) EXPLAIN ANALYZE with if-not-exists clause: (ideally, for if-not-exists clause we expect a warning to be issued, but currently relation existence error is thrown) a warning and an empty plan from ExplainOneUtility? If not an empty plan, we should be doing the relation existence check before we come to explain routines, maybe in gram.c? On the master branch an ERROR is thrown. postgres=# explain analyze create table if not exists t2 as select * from t1; ERROR: relation "t2" already exists For plain CTAS -> throw an error as it happens on master branch. postgres=# create table t2 as select * from t1; ERROR: relation "t2" already exists For plain CTAS with if-not-exists clause -> a warning is issued as it happens on master branch. postgres=# create table if not exists t2 as select * from t1; NOTICE: relation "t2" already exists, skipping CREATE TABLE AS With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Fri, Dec 11, 2020 at 12:48:49PM +0530, Bharath Rupireddy wrote: > I'm not quite sure how other databases behave. If I go by the main > intention of EXPLAIN without ANALYZE, that should do the planning, > show it in the output and no execution of the query should happen. For > EXPLAIN CTAS/CMV, only thing that gets planned is the SELECT part and > no execution happens so no existence check for the CTAS/CMV relation > that will get created if the CTAS/CMV is executed. Having said that, > the existence of the relations that are in the SELECT part are anyways > checked during planning for EXPLAIN without ANALYZE. I think that it is tricky to define IF NOT EXISTS for a CTAS with EXPLAIN. How would you for example treat an EXPLAIN ANALYZE with a query that includes an INSERT RETURNING in a WITH clause. Would you say that we do nothing if the relation exists? Or would you execute it, still insert nothing on the result relation because it already exists, even if the inner query may have inserted something as part of its execution on a different relation? -- Michael signature.asc Description: PGP signature
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
On Fri, Dec 11, 2020 at 12:13 PM Hou, Zhijie wrote: > > IMO, let's not change the 1) behaviour to 3) with the patch. If agreed, > > > I can do the following way in ExplainOneUtility and will add a comment on > > > why we are doing this. > > > if (es->analyze) > > > (void) CheckRelExistenceInCTAS(ctas, true); > > > Thoughts? > > Agreed. Thanks! So, I will post an updated patch soon. > Just in case, I took a look at Oracle 12’s behavior about [explain CTAS]. > > Oracle 12 will output the plan without throwing any msg in this case. I'm not quite sure how other databases behave. If I go by the main intention of EXPLAIN without ANALYZE, that should do the planning, show it in the output and no execution of the query should happen. For EXPLAIN CTAS/CMV, only thing that gets planned is the SELECT part and no execution happens so no existence check for the CTAS/CMV relation that will get created if the CTAS/CMV is executed. Having said that, the existence of the relations that are in the SELECT part are anyways checked during planning for EXPLAIN without ANALYZE. IMHO, let's not alter the existing behaviour, if needed, that can be discussed separately. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
RE: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
> IMO, let's not change the 1) behaviour to 3) with the patch. If agreed, > I can do the following way in ExplainOneUtility and will add a comment on > why we are doing this. > > if (es->analyze) > (void) CheckRelExistenceInCTAS(ctas, true); > > Thoughts? Agreed. Just in case, I took a look at Oracle 12’s behavior about [explain CTAS]. Oracle 12 will output the plan without throwing any msg in this case. Best regards, houzj
Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
Thanks for taking a look at this. On Fri, Dec 11, 2020 at 6:33 AM Hou, Zhijie wrote: > > > Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without if-not-exists > > clause, the existence of the relation gets checked during the execution > > of the select part and an error is thrown there. > > All the unnecessary rewrite and planning for the select part would have > > happened just to fail later. However, if if-not-exists clause is present, > > then a notice is issued and returned immediately without any further rewrite > > or planning for . This seems somewhat inconsistent to me. > > > > I propose to check the relation existence early in ExecCreateTableAs() as > > well as in ExplainOneUtility() and throw an error in case it exists already > > to avoid unnecessary rewrite, planning and execution of the select part. > > > > Attaching a patch. Note that I have not added any test cases as the existing > > test cases in create_table.sql and matview.sql would cover the code. > > > > Thoughts? > > Personally, I think it make sense, as other CMD(such as create > extension/index ...) throw that error > before any further operation too. > > I am just a little worried about the behavior change of [explain CTAS]. > May be someone will complain the change from normal explaininfo to error > output. I think we are clear with the patch for plain i.e. non-EXPLAIN and EXPLAIN ANALYZE CTAS/CMV cases. The behaviour for EXPLAIN is as follows: 1)EXPLAIN without ANALYZE, without patch: select part is planned(note that the relations in the select part are checked for their existence while planning, fails any of them don't exist) , relation(CTAS/CMV being created) existence is not checked as we will not create the relation and execute the plan. 2)EXPLAIN with ANALYZE, without patch: select part is planned, as we execute the plan, relation(CTAS/CMV) existence is checked during the execution and fails there if it exists. 3) EXPLAIN without ANALYZE, with patch: relation(CTAS/CMV) existence is checked before the planning and fails if it exists, without going further to the planning for select part. 4)EXPLAIN with ANALYZE, with patch: relation(CTAS/CMV) existence is checked before the rewrite, planning and fails if it exists, without going further. IMO, let's not change the 1) behaviour to 3) with the patch. If agreed, I can do the following way in ExplainOneUtility and will add a comment on why we are doing this. if (es->analyze) (void) CheckRelExistenceInCTAS(ctas, true); Thoughts? > And I took a look into the patch. > > + StringInfoData emsg; > + > + initStringInfo(&emsg); > + > + if (level == NOTICE) > + appendStringInfo(&emsg, > > Using variable emsg and level seems a little complicated to me. > How about just: > > if (!is_explain && ctas->if_not_exists) > ereport(NOTICE,xxx > else > ereport(ERROR,xxx I will modify it in the next version of the patch which I plan to send once agreed on the above point. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
RE: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
> Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without if-not-exists > clause, the existence of the relation gets checked during the execution > of the select part and an error is thrown there. > All the unnecessary rewrite and planning for the select part would have > happened just to fail later. However, if if-not-exists clause is present, > then a notice is issued and returned immediately without any further rewrite > or planning for . This seems somewhat inconsistent to me. > > I propose to check the relation existence early in ExecCreateTableAs() as > well as in ExplainOneUtility() and throw an error in case it exists already > to avoid unnecessary rewrite, planning and execution of the select part. > > Attaching a patch. Note that I have not added any test cases as the existing > test cases in create_table.sql and matview.sql would cover the code. > > Thoughts? Personally, I think it make sense, as other CMD(such as create extension/index ...) throw that error before any further operation too. I am just a little worried about the behavior change of [explain CTAS]. May be someone will complain the change from normal explaininfo to error output. And I took a look into the patch. + StringInfoData emsg; + + initStringInfo(&emsg); + + if (level == NOTICE) + appendStringInfo(&emsg, Using variable emsg and level seems a little complicated to me. How about just: if (!is_explain && ctas->if_not_exists) ereport(NOTICE,xxx else ereport(ERROR,xxx Best regards, houzj