Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block
On 2024/5/16 00:58, Robert Haas wrote:

On Tue, Apr 16, 2024 at 3:16 AM Quan Zongliang wrote:

According to the discussion with Jian He: use the GUC hook to check whether the xid needs to be output. If needed, the statement log can be delayed.

I appreciate the work that both of you have put into this, but I think we should reject this patch and remove the TODO item. We currently have some facilities (like log_statement) that log the statement before parsing it or executing it, and others (like log_min_duration_statement) that log it afterward. That is probably not documented as clearly as it should be, but it's long-established behavior.

What this patch does is change the behavior of log_statement so that log_statement sometimes logs the statement before it's executed, and sometimes afterward. I think that's going to be confusing and unhelpful. In particular, right now you can assume that if you set log_statement=all and there's a statement running, it's already been logged. With this change, that would sometimes be true and sometimes false.

For example, suppose that at 9am sharp, I run an UPDATE command that takes ten seconds to complete. Right now, the log_statement message will appear at 9am. With this change, it will appear at 9am if I do it inside a transaction block that has an XID already, and at 9:00:10am if I do it in a transaction block that does not yet have an XID, or if I do it outside of a transaction. I don't think the benefit of getting the XID in the log message is nearly enough to justify such a strange behavior.

I thought about writing the statement log when the xid is assigned, but it seemed too complicated. I'm inclined to keep this open for a while, until we find a good way or give up. It's a reasonable request, after all.
Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block
On 2024/3/11 09:25, Quan Zongliang wrote:

On 2024/3/4 15:48, jian he wrote:

Maybe we can tolerate LOG, first output the query plan then statement.

It is more appropriate to let the extension solve its own problems. Of course, this change is not easy to implement. Due to the way XID is assigned, there seems to be no good solution at the moment.

According to the discussion with Jian He: use the GUC hook to check whether the xid needs to be output. If needed, the statement log is delayed until after execution.

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 76f48b13d2..217910c9de 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1068,11 +1068,14 @@ exec_simple_query(const char *query_string)
 	/* Log immediately if dictated by log_statement */
 	if (check_log_statement(parsetree_list))
 	{
-		ereport(LOG,
-				(errmsg("statement: %s", query_string),
-				 errhidestmt(true),
-				 errdetail_execute(parsetree_list)));
-		was_logged = true;
+		if (!Log_has_xid || TransactionIdIsValid(GetTopTransactionIdIfAny()))
+		{
+			ereport(LOG,
+					(errmsg("statement: %s", query_string),
+					 errhidestmt(true),
+					 errdetail_execute(parsetree_list)));
+			was_logged = true;
+		}
 	}

 	/*
@@ -1283,6 +1286,16 @@ exec_simple_query(const char *query_string)

 		PortalDrop(portal, false);

+		/* Log if dictated by log_statement and has not been logged. */
+		if (!was_logged && check_log_statement(parsetree_list))
+		{
+			ereport(LOG,
+					(errmsg("statement: %s", query_string),
+					 errhidestmt(true),
+					 errdetail_execute(parsetree_list)));
+			was_logged = true;
+		}
+
 		if (lnext(parsetree_list, parsetree_item) == NULL)
 		{
 			/*
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 605ff3b045..e7dd9f5027 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -79,6 +79,7 @@
 #include "storage/ipc.h"
 #include "storage/proc.h"
 #include "tcop/tcopprot.h"
+#include "utils/guc.h"
 #include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -114,6 +115,9 @@
 char	   *Log_destination_string = NULL;
 bool		syslog_sequence_numbers = true;
 bool		syslog_split_messages = true;

+/* Whether the transaction id will appear in the log messages. */
+bool		Log_has_xid = false;
+
 /* Processed form of backtrace_functions GUC */
 static char *backtrace_function_list;
@@ -286,6 +290,50 @@ message_level_is_interesting(int elevel)
 }

+/*
+ * check_log_line_prefix: GUC check_hook for Log_line_prefix
+ */
+bool
+check_log_line_prefix(char **newval, void **extra, GucSource source)
+{
+	const char *p;
+
+	if ((*newval) == NULL)
+	{
+		Log_has_xid = false;
+		return true;
+	}
+
+	Log_has_xid = false;
+
+	for (p = (*newval); *p != '\0'; p++)
+	{
+		if (*p != '%')
+		{
+			/* literal char, just skip */
+			continue;
+		}
+
+		/* must be a '%', so skip to the next char */
+		p++;
+		if (*p == '\0')
+			break;				/* format error - ignore it */
+		else if (*p == '%')
+		{
+			/* string contains %% */
+			continue;
+		}
+
+		/* process the option */
+		if (*p == 'x')
+		{
+			Log_has_xid = true;
+			break;
+		}
+	}
+
+	return true;
+}
+
 /*
  * in_error_recursion_trouble --- are we at risk of infinite error recursion?
  *
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c68fdc008b..96948e5005 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -4103,7 +4103,7 @@ struct config_string ConfigureNamesString[] =
 		},
 		&Log_line_prefix,
 		"%m [%p] ",
-		NULL, NULL, NULL
+		check_log_line_prefix, NULL, NULL
 	},
 	{
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62..95361
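The scan performed by the proposed check_log_line_prefix() hook can be sketched in Python: walk the format string, skip literal characters and "%%" escapes, and report whether an unescaped "%x" (transaction id) escape is present. This is an illustrative transcription of the patch's loop, not the C code itself.

```python
def prefix_has_xid(prefix):
    """Return True if the log_line_prefix string contains an
    unescaped %x escape (mirrors the check hook's scan)."""
    if prefix is None:
        return False
    i = 0
    while i < len(prefix):
        if prefix[i] != '%':
            i += 1                 # literal char, just skip
            continue
        i += 1                     # look at the char after '%'
        if i >= len(prefix):
            break                  # trailing '%': format error, ignore
        if prefix[i] == '%':
            i += 1                 # "%%" is a literal percent sign
            continue
        if prefix[i] == 'x':
            return True            # found the xid escape
        i += 1
    return False

print(prefix_has_xid("%m [%p] "))         # → False (default prefix)
print(prefix_has_xid("%m [%p] XID:%x "))  # → True
print(prefix_has_xid("100%% %x"))         # → True ("%%" skipped correctly)
```

The "%%" case matters: a prefix like "100%%x" contains a literal percent followed by a literal x, not the xid escape, and the hook must not be fooled by it.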
Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block
On 2024/3/4 15:48, jian he wrote:

Maybe we can tolerate LOG, first output the query plan then statement.

It is more appropriate to let the extension solve its own problems. Of course, this change is not easy to implement. Due to the way XID is assigned, there seems to be no good solution at the moment.
Re: Change the bool member of the Query structure to bits
On 2024/2/20 19:18, Tomas Vondra wrote:

On 2/20/24 11:11, Quan Zongliang wrote:

Sorry. I forgot to save a file. This is the latest.

On 2024/2/20 18:07, Quan Zongliang wrote:

The Query structure has an increasing number of bool attributes. This is likely to increase in the future. And they have the same properties. Wouldn't it be better to store them in bits? Common statements don't use them, so they have little impact. This also saves memory space.

Hi,

Are we really adding bools to Query that often? A bit of git-blame says it's usually multiple years to add a single new flag, which is what I'd expect. I doubt that'll change.

As for the memory savings, can you quantify how much memory this would save? I highly doubt that's actually true (or at least measurable). The Query struct has ~256B, the patch cuts that to ~232B. But we allocate stuff in power-of-2 chunks, so we'll allocate a 256B chunk anyway. And we allocate very few of those objects anyway ...

regards

That makes sense. Withdraw.
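Tomas's allocation argument can be checked with a few lines of arithmetic. This sketch assumes a simplified allocator that rounds every small request up to the next power of two, which is how PostgreSQL's aset.c behaves for small chunks; the exact minimum chunk size here is an assumption.

```python
def chunk_size(requested):
    """Round a small allocation request up to the next power of two,
    as a power-of-2 chunking allocator would."""
    size = 8                      # assumed minimum chunk size
    while size < requested:
        size *= 2
    return size

print(chunk_size(256))   # → 256 (Query with bool members)
print(chunk_size(232))   # → 256 (bitfield version: same chunk)
```

Both the ~256-byte and the ~232-byte struct land in a 256-byte chunk, so the patch saves no allocated memory at all.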
Re: Change the bool member of the Query structure to bits
On 2024/2/20 23:45, Tom Lane wrote:

Quan Zongliang writes:

The Query structure has an increasing number of bool attributes. This is likely to increase in the future. And they have the same properties. Wouldn't it be better to store them in bits? Common statements don't use them, so they have little impact. This also saves memory space.

I'm -1 on that, for three reasons:

* The amount of space saved is quite negligible. If queries had many Query structs then it could matter, but they don't.
* This causes enough code churn to create a headache for back-patching.
* This'll completely destroy the readability of these flags in pprint output.

I'm not greatly in love with the macro layer you propose, either, but those details don't matter because I think we should just leave well enough alone.

regards, tom lane

I get it. Withdraw.
Re: Change the bool member of the Query structure to bits
Sorry. I forgot to save a file. This is the latest.

On 2024/2/20 18:07, Quan Zongliang wrote:

The Query structure has an increasing number of bool attributes. This is likely to increase in the future. And they have the same properties. Wouldn't it be better to store them in bits? Common statements don't use them, so they have little impact. This also saves memory space.

--
Quan Zongliang

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 142dcfc995..a28d625147 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3341,7 +3341,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			/* Collect statistics about aggregates for estimating costs. */
 			MemSet(&aggcosts, 0, sizeof(AggClauseCosts));
-			if (root->parse->hasAggs)
+			if (QueryHasAggs(root->parse))
 			{
 				get_agg_clause_costs(root, AGGSPLIT_SIMPLE, &aggcosts);
 			}
@@ -6762,7 +6762,7 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	Cost		total_cost;

 	/* Nothing to be done, if there is no grouping or aggregation required. */
-	if (!parse->groupClause && !parse->groupingSets && !parse->hasAggs &&
+	if (!parse->groupClause && !parse->groupingSets && !QueryHasAggs(parse) &&
 		!root->hasHavingQual)
 		return;
@@ -6859,7 +6859,7 @@ add_foreign_ordered_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	Assert(parse->sortClause);

 	/* We don't support cases where there are any SRFs in the targetlist */
-	if (parse->hasTargetSRFs)
+	if (QueryHasTargetSRFs(parse))
 		return;

 	/* Save the input_rel as outerrel in fpinfo */
@@ -7007,7 +7007,7 @@ add_foreign_final_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		return;

 	/* We don't support cases where there are any SRFs in the targetlist */
-	if (parse->hasTargetSRFs)
+	if (QueryHasTargetSRFs(parse))
 		return;

 	/* Save the input_rel as outerrel in fpinfo */
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index dce898c751..e328bd9242 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -398,7 +398,7 @@ DefineView(ViewStmt *stmt, const char *queryString,
 	 * DefineQueryRewrite(), but that function will complain about a bogus ON
 	 * SELECT rule, and we'd rather the message complain about a view.
 	 */
-	if (viewParse->hasModifyingCTE)
+	if (QueryHasModifyingCTE(viewParse))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("views must not contain data-modifying statements in WITH")));
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 0f811fd2fc..cdc94f082d 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -488,7 +488,7 @@ init_execution_state(List *queryTree_list,
 			/* Utility commands require no planning. */
 			stmt = makeNode(PlannedStmt);
 			stmt->commandType = CMD_UTILITY;
-			stmt->canSetTag = queryTree->canSetTag;
+			stmt->canSetTag = QueryCanSetTag(queryTree);
 			stmt->utilityStmt = queryTree->utilityStmt;
 			stmt->stmt_location = queryTree->stmt_location;
 			stmt->stmt_len = queryTree->stmt_len;
@@ -540,7 +540,7 @@ init_execution_state(List *queryTree_list,
 		newes->stmt = stmt;
 		newes->qd = NULL;

-		if (queryTree->canSetTag)
+		if (QueryCanSetTag(queryTree))
 			lasttages = newes;

 		preves = newes;
@@ -1650,7 +1650,7 @@ check_sql_fn_retval(List *queryTreeLists,
 		{
 			Query	   *q = lfirst_node(Query, lc2);

-			if (q->canSetTag)
+			if (QueryCanSetTag(q))
 			{
 				parse = q;
 				parse_cell = lc2;
@@ -1934,7 +1934,7 @@ tlist_coercion_finished:
 	newquery = makeNode(Query);
 	newquery->commandType = CMD_SELECT;
 	newquery->querySource = parse->querySource;
-	newquery->canSetTag = true;
+	QuerySetFlag(newquery, CAN_SET_TAG);
 	newquery->targetList = upper_tlist;
Change the bool member of the Query structure to bits
The Query structure has an increasing number of bool attributes. This is likely to increase in the future. And they have the same properties. Wouldn't it be better to store them in bits? Common statements don't use them, so they have little impact. This also saves memory space.

--
Quan Zongliang

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 142dcfc995..a28d625147 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3341,7 +3341,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			/* Collect statistics about aggregates for estimating costs. */
 			MemSet(&aggcosts, 0, sizeof(AggClauseCosts));
-			if (root->parse->hasAggs)
+			if (QueryHasAggs(root->parse))
 			{
 				get_agg_clause_costs(root, AGGSPLIT_SIMPLE, &aggcosts);
 			}
@@ -6762,7 +6762,7 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	Cost		total_cost;

 	/* Nothing to be done, if there is no grouping or aggregation required. */
-	if (!parse->groupClause && !parse->groupingSets && !parse->hasAggs &&
+	if (!parse->groupClause && !parse->groupingSets && !QueryHasAggs(parse) &&
 		!root->hasHavingQual)
 		return;
@@ -6859,7 +6859,7 @@ add_foreign_ordered_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	Assert(parse->sortClause);

 	/* We don't support cases where there are any SRFs in the targetlist */
-	if (parse->hasTargetSRFs)
+	if (QueryHasTargetSRFs(parse))
 		return;

 	/* Save the input_rel as outerrel in fpinfo */
@@ -7007,7 +7007,7 @@ add_foreign_final_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		return;

 	/* We don't support cases where there are any SRFs in the targetlist */
-	if (parse->hasTargetSRFs)
+	if (QueryHasTargetSRFs(parse))
 		return;

 	/* Save the input_rel as outerrel in fpinfo */
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index dce898c751..e328bd9242 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -398,7 +398,7 @@ DefineView(ViewStmt *stmt, const char *queryString,
 	 * DefineQueryRewrite(), but that function will complain about a bogus ON
 	 * SELECT rule, and we'd rather the message complain about a view.
 	 */
-	if (viewParse->hasModifyingCTE)
+	if (QueryHasModifyingCTE(viewParse))
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("views must not contain data-modifying statements in WITH")));
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 0f811fd2fc..cdc94f082d 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -488,7 +488,7 @@ init_execution_state(List *queryTree_list,
 			/* Utility commands require no planning. */
 			stmt = makeNode(PlannedStmt);
 			stmt->commandType = CMD_UTILITY;
-			stmt->canSetTag = queryTree->canSetTag;
+			stmt->canSetTag = QueryCanSetTag(queryTree);
 			stmt->utilityStmt = queryTree->utilityStmt;
 			stmt->stmt_location = queryTree->stmt_location;
 			stmt->stmt_len = queryTree->stmt_len;
@@ -540,7 +540,7 @@ init_execution_state(List *queryTree_list,
 		newes->stmt = stmt;
 		newes->qd = NULL;

-		if (queryTree->canSetTag)
+		if (QueryCanSetTag(queryTree))
 			lasttages = newes;

 		preves = newes;
@@ -1650,7 +1650,7 @@ check_sql_fn_retval(List *queryTreeLists,
 		{
 			Query	   *q = lfirst_node(Query, lc2);

-			if (q->canSetTag)
+			if (QueryCanSetTag(q))
 			{
 				parse = q;
 				parse_cell = lc2;
@@ -1934,7 +1934,7 @@ tlist_coercion_finished:
 	newquery = makeNode(Query);
 	newquery->commandType = CMD_SELECT;
 	newquery->querySource = parse->querySource;
-	newquery->canSetTag = true;
+	QuerySetFlag(newquery, CAN_SET_TAG);
 	newquery->targetList = upper_tlist;

 	/* We need a moderately realistic colnames list for the subquery RTE */
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d404fbf262..bd359fd636 100644
--- a/src/backend/optimizer/path/allpaths.c
+++
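The flag-macro pattern the patch introduces (QueryHasAggs, QuerySetFlag, CAN_SET_TAG, and so on) is ordinary bit arithmetic. A minimal Python transcription of the idea, with the macro names taken from the patch but the bit assignments chosen here for illustration:

```python
# Assumed bit positions; the patch defines its own constants.
HAS_AGGS        = 1 << 0   # was Query.hasAggs
HAS_TARGET_SRFS = 1 << 1   # was Query.hasTargetSRFs
CAN_SET_TAG     = 1 << 2   # was Query.canSetTag

def set_flag(flags, bit):
    """QuerySetFlag: turn a bit on in the packed flags word."""
    return flags | bit

def has_flag(flags, bit):
    """QueryHasAggs / QueryCanSetTag / ...: test one bit."""
    return (flags & bit) != 0

flags = 0
flags = set_flag(flags, CAN_SET_TAG)   # QuerySetFlag(newquery, CAN_SET_TAG)
print(has_flag(flags, CAN_SET_TAG))    # → True  (QueryCanSetTag)
print(has_flag(flags, HAS_AGGS))       # → False (QueryHasAggs)
```

This is also what makes Tom's pprint objection concrete: a debugger or pprint dump shows one opaque integer instead of a set of named booleans.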
Re: Improvement discussion of custom and generic plans
Add the GUC parameter.

On 2024/1/30 21:25, Quan Zongliang wrote:

On 2023/11/3 15:27, Quan Zongliang wrote:

Hi,

We have one such problem. A table column has skewed data. Statistics:

n_distinct        | -0.4481973
most_common_vals  | {5f006ca25b52ed78e457b150ee95a30c}
most_common_freqs | {0.5518474}

Data generation:

CREATE TABLE s_user (
    user_id varchar(32) NOT NULL,
    corp_id varchar(32),
    status int NOT NULL
);

insert into s_user
select md5('user_id ' || a), md5('corp_id ' || a),
       case random()<0.877675 when true then 1 else -1 end
FROM generate_series(1,10031) a;

insert into s_user
select md5('user_id ' || a), md5('corp_id 10032'),
       case random()<0.877675 when true then 1 else -1 end
FROM generate_series(10031,22383) a;

CREATE INDEX s_user_corp_id_idx ON s_user USING btree (corp_id);
analyze s_user;

1. First, define a PREPARE statement:

prepare stmt as select count(*) from s_user where status=1 and corp_id = $1;

2. Run it five times; the custom plan is chosen:

explain (analyze,buffers) execute stmt('5f006ca25b52ed78e457b150ee95a30c');

Here's the plan:

Aggregate  (cost=639.84..639.85 rows=1 width=8) (actual time=4.653..4.654 rows=1 loops=1)
  Buffers: shared hit=277
  ->  Seq Scan on s_user  (cost=0.00..612.76 rows=10830 width=0) (actual time=1.402..3.747 rows=10836 loops=1)
        Filter: ((status = 1) AND ((corp_id)::text = '5f006ca25b52ed78e457b150ee95a30c'::text))
        Rows Removed by Filter: 11548
        Buffers: shared hit=277
Planning Time: 0.100 ms
Execution Time: 4.674 ms
(8 rows)

3. From the sixth execution on, the generic plan is chosen. We can see that there is a huge deviation between the estimate and the actual value:

Aggregate  (cost=11.83..11.84 rows=1 width=8) (actual time=4.424..4.425 rows=1 loops=1)
  Buffers: shared hit=154 read=13
  ->  Bitmap Heap Scan on s_user  (cost=4.30..11.82 rows=2 width=0) (actual time=0.664..3.371 rows=10836 loops=1)
        Recheck Cond: ((corp_id)::text = $1)
        Filter: (status = 1)
        Rows Removed by Filter: 1517
        Heap Blocks: exact=154
        Buffers: shared hit=154 read=13
        ->  Bitmap Index Scan on s_user_corp_id_idx  (cost=0.00..4.30 rows=2 width=0) (actual time=0.635..0.635 rows=12353 loops=1)
              Index Cond: ((corp_id)::text = $1)
              Buffers: shared read=13
Planning Time: 0.246 ms
Execution Time: 4.490 ms
(13 rows)

This is because in the choose_custom_plan function, the generic plan is attempted after executing the custom plan five times:

    if (plansource->num_custom_plans < 5)
        return true;

The generic plan uses var_eq_non_const to estimate the average selectivity. These are facts that many people already know, so only a brief introduction.

Our users actually use such parameter conditions in very complex PREPARE statements. Once the generic plan is used for the sixth execution, the execution time changes from 5 milliseconds to 5 minutes.

To improve this, the following approaches can be considered:

1. Determine whether data skew exists in the PREPARE statement's parameter conditions based on the statistics. However, there is no way to know if the user will use the skewed parameter value.

2. When comparing the cost of the generic plan with the average cost of the custom plans (function choose_custom_plan), consider whether the maximum cost of an executed custom plan differs from the cost of the generic plan by an order of magnitude. If the first five executions use a low-selectivity condition and the sixth uses a high-selectivity condition, problems will still arise.

3. Trace the execution time of the PREPARE statement. When an execution time is found to be much longer than the average execution time, the custom plan is forced to run.

Is there any better idea?

I tried to do a demo. Add a member paramid to Const. When a Const is generated from a Param, the Const is identified as coming from that Param. Then check in var_eq_const whether the column in the condition using this parameter is skewed. If so, choose_custom_plan returns true every time, forcing the custom plan to be used. Only conditional expressions such as "var eq param" or "param eq var" are supported for now.

If it makes sense, I will continue to improve this patch.

--
Quan Zongliang

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8b76e98529..14b8bec6ff 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -154,6 +154,8 @@
 bool		enable_partition_pruning = true;
 bool		enable_presorted_aggregate = true;
 bool		enable_async_append = true;

+double		skewed_param_factor = DEFAULT_SKEWED_PARAM_FACTOR;
+
 typedef struct
 {
 	PlannerInfo *root;
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index edc25d712e..8b922c0c95 100644
--- a/
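Approach 1 above (detecting skew from the statistics) can be sketched with pg_stats-style numbers. This is only an illustration of the idea, not the patch's code: the name skewed_param_factor mirrors the GUC the patch adds, but the default of 10.0 and the exact comparison rule are assumptions made here.

```python
def is_skewed(most_common_freqs, n_distinct, ntuples,
              skewed_param_factor=10.0):
    """Report whether a column looks skewed: its most common value is
    far more frequent than the average value, so a generic plan's
    average-selectivity estimate (var_eq_non_const) can be badly off."""
    if not most_common_freqs:
        return False
    if n_distinct < 0:
        # pg_stats convention: negative n_distinct is a fraction of rows
        n_distinct = -n_distinct * ntuples
    avg_freq = 1.0 / n_distinct            # average-selectivity estimate
    return max(most_common_freqs) > skewed_param_factor * avg_freq

# corp_id stats from the mail: one MCV covers ~55% of 22383 rows
print(is_skewed([0.5518474], -0.4481973, 22383))   # → True
```

With these numbers the average selectivity is roughly 1/10032, while the top value's frequency is 0.55, so any reasonable factor flags the column as skewed.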
Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block
A little tweak to the code:

    GetTopTransactionIdIfAny() != InvalidTransactionId

changed to

    TransactionIdIsValid(GetTopTransactionIdIfAny())

On 2024/1/12 08:51, jian he wrote:

Hi ... with patch:

src3=# explain(analyze, costs off) select 1 from pg_sleep(10);
2024-01-12 08:43:14.750 CST [5739] jian@src3/psql XID:0 LOG:  duration: 10010.167 ms  plan:
	Query Text: explain(analyze, costs off) select 1 from pg_sleep(10);
	Function Scan on pg_sleep  (cost=0.00..0.01 rows=1 width=4) (actual time=10010.155..10010.159 rows=1 loops=1)
2024-01-12 08:43:14.750 CST [5739] jian@src3/psql XID:0 LOG:  statement: explain(analyze, costs off) select 1 from pg_sleep(10);

 Function Scan on pg_sleep (actual time=10010.155..10010.159 rows=1 loops=1)
 Planning Time: 0.115 ms
 Execution Time: 10010.227 ms
(3 rows)

This problem does exist for a statement that takes a long time to run. The XID is assigned only when the first tuple is changed. If the user wants to see it in a single statement log entry, they have to wait until the statement has finished executing, and we don't know how long that will take. It is not appropriate to output the log twice just because of the xid. Besides, without parsing log_line_prefix we don't know whether the user wants to see the xid.

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1a34bd3715..bd08b64450 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1064,8 +1064,9 @@ exec_simple_query(const char *query_string)
 	 */
 	parsetree_list = pg_parse_query(query_string);

-	/* Log immediately if dictated by log_statement */
-	if (check_log_statement(parsetree_list))
+	/* Log immediately if dictated by log_statement and XID assigned. */
+	if (TransactionIdIsValid(GetTopTransactionIdIfAny()) &&
+		check_log_statement(parsetree_list))
 	{
 		ereport(LOG,
 				(errmsg("statement: %s", query_string),
@@ -1282,6 +1283,16 @@ exec_simple_query(const char *query_string)

 		PortalDrop(portal, false);

+		/* Log if dictated by log_statement and has not been logged. */
+		if (!was_logged && check_log_statement(parsetree_list))
+		{
+			ereport(LOG,
+					(errmsg("statement: %s", query_string),
+					 errhidestmt(true),
+					 errdetail_execute(parsetree_list)));
+			was_logged = true;
+		}
+
 		if (lnext(parsetree_list, parsetree_item) == NULL)
 		{
 			/*
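The control flow this version of the patch gives exec_simple_query() can be sketched as a simple event sequence: log the statement up front only when a top-level XID has already been assigned, otherwise log it once after execution. A Python sketch of that decision (names chosen here for illustration):

```python
def statement_log_events(xid_assigned_at_start, log_statement=True):
    """Return the order of events for one simple-query cycle under
    the patch: log-before only if an XID already exists, otherwise
    a single log entry after execution."""
    events = []
    was_logged = False
    if log_statement and xid_assigned_at_start:
        events.append("log-before-execution")
        was_logged = True
    events.append("execute")
    if log_statement and not was_logged:
        events.append("log-after-execution")
    return events

print(statement_log_events(True))    # → ['log-before-execution', 'execute']
print(statement_log_events(False))   # → ['execute', 'log-after-execution']
```

This makes Robert's later objection easy to see: for the no-XID case the log entry's timestamp moves from the start of the statement to its end.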
Re: Improvement discussion of custom and generic plans
On 2023/11/3 15:27, Quan Zongliang wrote:

Hi,

We have one such problem. A table column has skewed data. Statistics:

n_distinct        | -0.4481973
most_common_vals  | {5f006ca25b52ed78e457b150ee95a30c}
most_common_freqs | {0.5518474}

Data generation:

CREATE TABLE s_user (
    user_id varchar(32) NOT NULL,
    corp_id varchar(32),
    status int NOT NULL
);

insert into s_user
select md5('user_id ' || a), md5('corp_id ' || a),
       case random()<0.877675 when true then 1 else -1 end
FROM generate_series(1,10031) a;

insert into s_user
select md5('user_id ' || a), md5('corp_id 10032'),
       case random()<0.877675 when true then 1 else -1 end
FROM generate_series(10031,22383) a;

CREATE INDEX s_user_corp_id_idx ON s_user USING btree (corp_id);
analyze s_user;

1. First, define a PREPARE statement:

prepare stmt as select count(*) from s_user where status=1 and corp_id = $1;

2. Run it five times; the custom plan is chosen:

explain (analyze,buffers) execute stmt('5f006ca25b52ed78e457b150ee95a30c');

Here's the plan:

Aggregate  (cost=639.84..639.85 rows=1 width=8) (actual time=4.653..4.654 rows=1 loops=1)
  Buffers: shared hit=277
  ->  Seq Scan on s_user  (cost=0.00..612.76 rows=10830 width=0) (actual time=1.402..3.747 rows=10836 loops=1)
        Filter: ((status = 1) AND ((corp_id)::text = '5f006ca25b52ed78e457b150ee95a30c'::text))
        Rows Removed by Filter: 11548
        Buffers: shared hit=277
Planning Time: 0.100 ms
Execution Time: 4.674 ms
(8 rows)

3. From the sixth execution on, the generic plan is chosen. We can see that there is a huge deviation between the estimate and the actual value:

Aggregate  (cost=11.83..11.84 rows=1 width=8) (actual time=4.424..4.425 rows=1 loops=1)
  Buffers: shared hit=154 read=13
  ->  Bitmap Heap Scan on s_user  (cost=4.30..11.82 rows=2 width=0) (actual time=0.664..3.371 rows=10836 loops=1)
        Recheck Cond: ((corp_id)::text = $1)
        Filter: (status = 1)
        Rows Removed by Filter: 1517
        Heap Blocks: exact=154
        Buffers: shared hit=154 read=13
        ->  Bitmap Index Scan on s_user_corp_id_idx  (cost=0.00..4.30 rows=2 width=0) (actual time=0.635..0.635 rows=12353 loops=1)
              Index Cond: ((corp_id)::text = $1)
              Buffers: shared read=13
Planning Time: 0.246 ms
Execution Time: 4.490 ms
(13 rows)

This is because in the choose_custom_plan function, the generic plan is attempted after executing the custom plan five times:

    if (plansource->num_custom_plans < 5)
        return true;

The generic plan uses var_eq_non_const to estimate the average selectivity. These are facts that many people already know, so only a brief introduction.

Our users actually use such parameter conditions in very complex PREPARE statements. Once the generic plan is used for the sixth execution, the execution time changes from 5 milliseconds to 5 minutes.

To improve this, the following approaches can be considered:

1. Determine whether data skew exists in the PREPARE statement's parameter conditions based on the statistics. However, there is no way to know if the user will use the skewed parameter value.

2. When comparing the cost of the generic plan with the average cost of the custom plans (function choose_custom_plan), consider whether the maximum cost of an executed custom plan differs from the cost of the generic plan by an order of magnitude. If the first five executions use a low-selectivity condition and the sixth uses a high-selectivity condition, problems will still arise.

3. Trace the execution time of the PREPARE statement. When an execution time is found to be much longer than the average execution time, the custom plan is forced to run.

Is there any better idea?

I tried to do a demo. Add a member paramid to Const. When a Const is generated from a Param, the Const is identified as coming from that Param. Then check in var_eq_const whether the column in the condition using this parameter is skewed. If so, choose_custom_plan returns true every time, forcing the custom plan to be used. Only conditional expressions such as "var eq param" or "param eq var" are supported for now.

If it makes sense, I will continue to improve this patch.

--
Quan Zongliang

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 94eb56a1e7..3384520dc1 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2489,6 +2489,8 @@ eval_const_expressions_mutator(Node *node,
 											  pval,
 											  prm->isnull,
 											  typByVal);
+				if (paramLI->paramFetch == NULL)
+
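The plan-choice change the demo makes can be sketched as a simplified choose_custom_plan(). This is an illustration, not the real function: the actual code also charges a planning-cost fudge factor when comparing costs, which is omitted here, and the skew flag stands in for the paramid/var_eq_const machinery the demo patch wires up.

```python
def choose_custom_plan(num_custom_plans, generic_cost, avg_custom_cost,
                       param_column_is_skewed=False):
    """Return True to replan with a custom plan, False to use the
    cached generic plan (simplified sketch of the heuristic)."""
    if param_column_is_skewed:
        # demo patch: never trust the generic average-selectivity
        # estimate when the parameter's column is skewed
        return True
    if num_custom_plans < 5:
        return True              # stock behavior: five custom trials
    # afterwards, prefer the generic plan unless it looks costlier
    return generic_cost > avg_custom_cost

# costs from the plans in the mail
print(choose_custom_plan(6, generic_cost=11.84, avg_custom_cost=639.85))
print(choose_custom_plan(6, 11.84, 639.85, param_column_is_skewed=True))
```

The first call returns False: the generic plan's estimated cost (11.84) looks far cheaper than the custom average (639.85), which is exactly the mis-estimate that turns a 5 ms query into a 5 minute one. The skew override in the second call forces the custom plan instead.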
Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]
On 2023/11/24 03:39, Pavel Stehule wrote:

I modified the documentation a little bit - we don't need to extra propose SQL array syntax, I think.

I rewrote the regress tests - we don't need to test unsupported functionality (related to RECORD).

- all tests passed

I wrote two examples of errors:

user_id users.user_id%ROWTYPE[];
user_id users.user_id%ROWTYPE ARRAY[4][3];

Fixed.

Regards
Pavel

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 5977534a62..0eedf4a55d 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -766,6 +766,40 @@ SELECT merge_fields(t.*) FROM table1 t WHERE ... ;

+   Arrays of Copying Types and Row Types
+
+name variable%TYPE[];
+name table_name%ROWTYPE[];
+
+    Arrays of copying types and row types are defined by appending square brackets
+    ([]) to the %TYPE or %ROWTYPE.
+    Their definition is similar to PostgreSQL's arrays described in .
+    For example:
+
+user_id users.user_id%TYPE[];
+user_id users%ROWTYPE[];
+
+    The syntax allows the exact size of arrays to be specified. However, the current
+    implementation ignores any supplied array size limits, i.e., the behavior is the
+    same as for arrays of unspecified length.
+
+    An alternative syntax, which conforms to the SQL standard by using the keyword
+    ARRAY, can be used for one-dimensional or multi-dimensional
+    arrays too:
+
+user_id users.user_id%TYPE ARRAY;
+user_id users%ROWTYPE ARRAY[4][3];
+
+    As before, however, PostgreSQL does not enforce the size restriction in any case.
+
    Record Types
@@ -794,6 +828,12 @@ SELECT merge_fields(t.*) FROM table1 t WHERE ... ;
     calling query is parsed, whereas a record variable can change its row
     structure on-the-fly.

+    RECORD cannot be used to declare a variable
+    of an array type. "Copying Types" as shown in
+    are not supported either.
+
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index a341cde2c1..a9cb15df6d 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -2095,6 +2095,29 @@ plpgsql_build_datatype(Oid typeOid, int32 typmod,
 	return typ;
 }

+/*
+ * Returns an array type for the type specified as argument.
+ */
+PLpgSQL_type *
+plpgsql_datatype_arrayof(PLpgSQL_type *dtype)
+{
+	Oid			array_typeid;
+
+	if (dtype->typisarray)
+		return dtype;
+
+	array_typeid = get_array_type(dtype->typoid);
+
+	if (!OidIsValid(array_typeid))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("could not find array type for data type \"%s\"",
+						format_type_be(dtype->typoid))));
+
+	return plpgsql_build_datatype(array_typeid, dtype->atttypmod,
+								  dtype->collation, NULL);
+}
+
 /*
  * Utility subroutine to make a PLpgSQL_type struct given a pg_type entry
  * and additional details (see comments for plpgsql_build_datatype).
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 6a09bfdd67..aa9103cf0e 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -2789,7 +2789,7 @@ read_datatype(int tok)
 	StringInfoData ds;
 	char	   *type_name;
 	int			startlocation;
-	PLpgSQL_type *result;
+	PLpgSQL_type *result = NULL;
 	int			parenlevel = 0;

 	/* Should only be called while parsing DECLARE sections */
@@ -2817,15 +2817,11 @@ read_datatype(int tok)
 							  K_TYPE, "type"))
 			{
 				result = plpgsql_parse_wordtype(dtname);
-				if (result)
-					return result;
 			}
 			else if (tok_is_keyword(tok, &yylval,
 									K_ROWTYPE, "rowtype"))
 			{
 				result = plpgsql_parse_wordrowtype(dtname);
-				if (result)
-					return result;
 			}
 		}
 	}
@@ -2841,15 +2837,11 @@ read_datatype(int tok)
 							  K_TYPE, "type"))
 			{
 				result = plpgsql_parse_wordtype(dtname);
-				if (result)
-					return result;
 			}
 			else if (tok_is_keyword(tok, &yylval,
 									K_ROWTYPE, "rowtype"))
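The logic of plpgsql_datatype_arrayof() from the patch is small enough to sketch: if the declared type is already an array, return it unchanged; otherwise look up its array type, erroring out when none exists (get_array_type() returning InvalidOid). A Python illustration with a toy stand-in for the pg_type lookup; the type names here are assumptions:

```python
# Toy stand-in for the pg_type array-type lookup (get_array_type);
# PostgreSQL names an array type by prefixing "_" to the element type.
ARRAY_TYPE_OF = {
    "int4": "_int4",
    "varchar": "_varchar",
}

def datatype_arrayof(type_name, is_array):
    """Mirror plpgsql_datatype_arrayof(): arrays pass through,
    scalars are mapped to their array type or raise an error."""
    if is_array:
        return type_name                   # already an array: no-op
    array_type = ARRAY_TYPE_OF.get(type_name)
    if array_type is None:                 # get_array_type -> InvalidOid
        raise LookupError(
            f'could not find array type for data type "{type_name}"')
    return array_type

print(datatype_arrayof("int4", is_array=False))   # → _int4
print(datatype_arrayof("_int4", is_array=True))   # → _int4
```

The pass-through branch is why declaring `v int4[]%TYPE[]`-style doubly-arrayed variables stays harmless: an array type is simply returned as-is.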
Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]
On 2023/11/20 17:33, Pavel Stehule wrote:

I did some deeper check:

- I don't like too much the parser's modification (I am sending an alternative implementation of my own) - the SQL parser allows richer syntax, and full functionality is only a few lines more.

Agree.

- the original patch doesn't solve %ROWTYPE:

(2023-11-20 10:04:36) postgres=# select * from foo;
┌────┬────┐
│ a  │ b  │
╞════╪════╡
│ 10 │ 20 │
│ 30 │ 40 │
└────┴────┘
(2 rows)

(2023-11-20 10:08:29) postgres=# do $$
declare v foo%rowtype[];
begin
  v := array(select row(a,b) from foo);
  raise notice '%', v;
end;
$$;
NOTICE:  {"(10,20)","(30,40)"}
DO

two little fixes:
1. spelling mistake: ARRAY [ icons ] --> ARRAY [ iconst ]
2. code bug: if (!OidIsValid(dtype->typoid)) --> if (!OidIsValid(array_typeid))

- the original patch doesn't solve type RECORD; the error message should be more intuitive. Arrays of record type could probably be supported, but that needs bigger research:

(2023-11-20 10:10:34) postgres=# do $$
declare r record; v r%type[];
begin
  v := array(select row(a,b) from foo);
  raise notice '%', v;
end;
$$;
ERROR:  syntax error at or near "%"
LINE 2: declare r record; v r%type[];
                             ^
CONTEXT:  invalid type name "r%type[]"

Currently only scalar variables are supported. This error is consistent with the r%type error, and record arrays are not currently supported. Support for r%type should be considered first. For now, let r%type[] report the same error as record[]. I prefer to implement it in a new patch.

- missing documentation

My English is not good. I wrote it down; please correct it. Added a note in the "Record Types" documentation that arrays and "Copying Types" are not supported yet.

- I don't like using the word "partitioned" in the regress test name "partitioned_table". It is confusing.

fixed

Regards
Pavel

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 5977534a62..ad02c9f561 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -766,6 +766,30 @@ SELECT merge_fields(t.*) FROM table1 t WHERE ... ;

+   Arrays of Copying Types and Row Types
+
+name variable%TYPE[];
+name table_name%ROWTYPE[];
+
+    Arrays of copying types and row types are defined by appending square brackets
+    ([]) to the %TYPE or %ROWTYPE.
+    Their definition is similar to PostgreSQL's array types. It is possible to
+    specify the exact size of the array. The keyword ARRAY can also be used.
+    For example:
+
+user_id users.user_id%TYPE[4][2];
+user_id users.user_id%ROWTYPE ARRAY[4][];
+
+    However, the current implementation ignores any supplied array size limits, i.e.,
+    the behavior is the same as for arrays of unspecified length.
+    The current implementation does not enforce the declared number of dimensions either.
+
    Record Types
@@ -794,6 +818,11 @@ SELECT merge_fields(t.*) FROM table1 t WHERE ... ;
     calling query is parsed, whereas a record variable can change its row
     structure on-the-fly.

+    RECORD does not support being defined as an array.
+    "Copying Types" as shown in is also not supported.
+
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index a341cde2c1..a9cb15df6d 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -2095,6 +2095,29 @@ plpgsql_build_datatype(Oid typeOid, int32 typmod,
 	return typ;
 }

+/*
+ * Returns an array type for the type specified as argument.
+ */
+PLpgSQL_type *
+plpgsql_datatype_arrayof(PLpgSQL_type *dtype)
+{
+	Oid			array_typeid;
+
+	if (dtype->typisarray)
+		return dtype;
+
+	array_typeid = get_array_type(dtype->typoid);
+
+	if (!OidIsValid(array_typeid))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("could not find array type for data type \"%s\"",
+						format_type_be(dtype->typoid))));
+
+	return plpgsql_build_datatype(array_typeid, dtype->atttypmod,
+								  dtype->collation, NULL);
+}
+
 /*
  * Utility subroutine to make a PLpgSQL_type struct given a pg_type entry
  * and additional details (see comments for plpgsql_build_datatype).
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 6a09bfdd67..7778bffefc 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -2789,7 +2789,7 @@ read_datatype(int tok)
 	StringInfoData ds;
 	char	   *type_name;
 	int			startlocation;
-	PLpgSQL_type *result;
+	PLpgSQL_type *result = NULL;
 	int			parenlevel = 0;

 	/* Should only be called while parsing DECLARE sections */
@@ -2817,15 +2817,11 @@ read_datatype(int tok)
Improvement discussion of custom and generic plans
Hi, We have run into the following problem: a table column has skewed data. Statistics: n_distinct | -0.4481973 most_common_vals | {5f006ca25b52ed78e457b150ee95a30c} most_common_freqs | {0.5518474} Data generation: CREATE TABLE s_user ( user_id varchar(32) NOT NULL, corp_id varchar(32), status int NOT NULL ); insert into s_user select md5('user_id ' || a), md5('corp_id ' || a), case random()<0.877675 when true then 1 else -1 end FROM generate_series(1,10031) a; insert into s_user select md5('user_id ' || a), md5('corp_id 10032'), case random()<0.877675 when true then 1 else -1 end FROM generate_series(10031,22383) a; CREATE INDEX s_user_corp_id_idx ON s_user USING btree (corp_id); analyze s_user; 1. First, define a PREPARE statement prepare stmt as select count(*) from s_user where status=1 and corp_id = $1; 2. Run it five times; a custom plan is chosen each time. explain (analyze,buffers) execute stmt('5f006ca25b52ed78e457b150ee95a30c'); Here's the plan: Aggregate (cost=639.84..639.85 rows=1 width=8) (actual time=4.653..4.654 rows=1 loops=1) Buffers: shared hit=277 -> Seq Scan on s_user (cost=0.00..612.76 rows=10830 width=0) (actual time=1.402..3.747 rows=10836 loops=1) Filter: ((status = 1) AND ((corp_id)::text = '5f006ca25b52ed78e457b150ee95a30c'::text)) Rows Removed by Filter: 11548 Buffers: shared hit=277 Planning Time: 0.100 ms Execution Time: 4.674 ms (8 rows) 3. From the sixth execution on, the generic plan is chosen.
We can see that there is a huge deviation between the estimate and the actual value: Aggregate (cost=11.83..11.84 rows=1 width=8) (actual time=4.424..4.425 rows=1 loops=1) Buffers: shared hit=154 read=13 -> Bitmap Heap Scan on s_user (cost=4.30..11.82 rows=2 width=0) (actual time=0.664..3.371 rows=10836 loops=1) Recheck Cond: ((corp_id)::text = $1) Filter: (status = 1) Rows Removed by Filter: 1517 Heap Blocks: exact=154 Buffers: shared hit=154 read=13 -> Bitmap Index Scan on s_user_corp_id_idx (cost=0.00..4.30 rows=2 width=0) (actual time=0.635..0.635 rows=12353 loops=1) Index Cond: ((corp_id)::text = $1) Buffers: shared read=13 Planning Time: 0.246 ms Execution Time: 4.490 ms (13 rows) This is because in the choose_custom_plan function, the generic plan is attempted after the custom plan has been executed five times. if (plansource->num_custom_plans < 5) return true; The generic plan uses var_eq_non_const to estimate the average selectivity. These are facts that many people already know, so this is just a brief introduction. Our users actually use such parameter conditions in very complex PREPARE statements; once the generic plan is chosen for the sixth execution, the execution time jumps from 5 milliseconds to 5 minutes. To improve this, the following approaches can be considered: 1. Use the statistics to determine whether the PREPARE statement's parameter conditions are subject to data skew. However, there is no way to know whether the user will actually supply the skewed parameter. 2. When comparing the cost of the generic plan with the average cost of the custom plans (in choose_custom_plan), also consider whether the maximum cost of any executed custom plan differs from the generic plan's cost by an order of magnitude. But if the first five executions use a low-selectivity condition and later ones use a high-selectivity condition, the problem still arises. 3. Trace the execution time of the PREPARE statement.
When an execution time is found to be much longer than the average execution time, the custom plan is forced to run. Is there any better idea? -- Quan Zongliang
Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]
On 2023/10/17 12:15, Pavel Stehule wrote: On Tue, Oct 17, 2023 at 3:30, Quan Zongliang wrote: On 2023/10/16 20:05, Pavel Stehule wrote: > > > On Mon, Oct 16, 2023 at 13:56, Daniel Gustafsson wrote: > > > On 16 Oct 2023, at 12:15, Quan Zongliang wrote: > > > Implement TODO item: > > PL/pgSQL > > Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[] > > Cool! While I haven't looked at the patch yet, I've wanted this > myself many > times in the past when working with large plpgsql codebases. > > > As a first step, deal only with [], such as > > xxx.yyy%TYPE[] > > xxx%TYPE[] > > > > It can be extended to support multi-dimensional and complex > syntax in the future. > > Was this omitted due to complexity of implementation or for some > other reason? > Because of complexity. > > There is no reason for describing enhancement. The size and dimensions > of postgresql arrays are dynamic, depend on the value, not on > declaration. Now, this information is ignored, and it can be a compatibility > break to check and enforce this info. > Yes. I don't think it's necessary. If anyone needs it, we can continue to enhance it in the future. I don't think it is possible to do it. But there is another missing functionality, if I remember well. There is no possibility to declare variables for elements of array. The way it's done now is more like laziness. Is it possible to do that? If the parser encounters %TYPE[][], it can be parsed; then let parse_datatype do the rest. For example, partitioned_table.a%TYPE[][100][]. Parse the type name (int4) of partitioned_table.a%TYPE and append the following [][100][]. Passing "int4[][100][]" to parse_datatype will give us the array definition we want. Isn't this code a little ugly?
I propose syntax xxx.yyy%ELEMENTTYPE and xxx%ELEMENTTYPE What do you think about it? No other relational database seems to have such an implementation, but it looks like a good idea: it would make stored procedures more convenient to write. Regards Pavel
Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]
On 2023/10/16 20:05, Pavel Stehule wrote: On Mon, Oct 16, 2023 at 13:56, Daniel Gustafsson wrote: > On 16 Oct 2023, at 12:15, Quan Zongliang wrote: > Implement TODO item: > PL/pgSQL > Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[] Cool! While I haven't looked at the patch yet, I've wanted this myself many times in the past when working with large plpgsql codebases. > As a first step, deal only with [], such as > xxx.yyy%TYPE[] > xxx%TYPE[] > > It can be extended to support multi-dimensional and complex syntax in the future. Was this omitted due to complexity of implementation or for some other reason? Because of complexity. There is no reason for describing the enhancement. The size and dimensions of postgresql arrays are dynamic, depend on the value, not on declaration. Now, this information is ignored, and it can be a compatibility break to check and enforce this info. Yes. I don't think it's necessary. If anyone needs it, we can continue to enhance it in the future. -- Daniel Gustafsson
Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]
The error message still seems ambiguous: "do not support multi-dimensional arrays in PL/pgSQL" Wouldn't this be better? "do not support multi-dimensional %TYPE arrays in PL/pgSQL" On 2023/10/17 09:19, Quan Zongliang wrote: Attached new patch with more explicit error messages based on the type. On 2023/10/16 18:15, Quan Zongliang wrote: Implement TODO item: PL/pgSQL Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[] As a first step, deal only with [], such as xxx.yyy%TYPE[] xxx%TYPE[] It can be extended to support multi-dimensional and complex syntax in the future. -- Quan Zongliang
Re: PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]
Attached new patch More explicit error messages based on type. On 2023/10/16 18:15, Quan Zongliang wrote: Implement TODO item: PL/pgSQL Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[] As a first step, deal only with [], such as xxx.yyy%TYPE[] xxx%TYPE[] It can be extended to support multi-dimensional and complex syntax in the future. -- Quan Zongliangdiff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y index 6a09bfdd67..c6377dcf2c 100644 --- a/src/pl/plpgsql/src/pl_gram.y +++ b/src/pl/plpgsql/src/pl_gram.y @@ -23,6 +23,7 @@ #include "parser/scanner.h" #include "parser/scansup.h" #include "utils/builtins.h" +#include "utils/syscache.h" #include "plpgsql.h" @@ -76,6 +77,7 @@ staticPLpgSQL_expr*read_sql_expression2(int until, int until2, int *endtoken); static PLpgSQL_expr*read_sql_stmt(void); static PLpgSQL_type*read_datatype(int tok); +static PLpgSQL_type*read_datatype_array(PLpgSQL_type *elem_typ); static PLpgSQL_stmt*make_execsql_stmt(int firsttoken, int location); static PLpgSQL_stmt_fetch *read_fetch_direction(void); static void complete_direction(PLpgSQL_stmt_fetch *fetch, @@ -2783,6 +2785,74 @@ read_sql_construct(int until, return expr; } +static PLpgSQL_type * +read_datatype_array(PLpgSQL_type *elem_typ) +{ + int tok; + HeapTuple type_tup = NULL; + Form_pg_typetype_frm; + + Assert(elem_typ); + if (!OidIsValid(elem_typ->typoid)) + return elem_typ; + + tok = yylex(); + /* Next token is not square bracket. */ + if (tok != '[') + { + plpgsql_push_back_token(tok); + + return elem_typ; + } + + tok = yylex(); + /* For now, deal only with []. */ + if (tok != ']') + { + plpgsql_push_back_token(tok); + plpgsql_push_back_token('['); + + return elem_typ; + } + + type_tup = SearchSysCache1(TYPEOID, + ObjectIdGetDatum(elem_typ->typoid)); + + /* should not happen. 
*/ + if (!HeapTupleIsValid(type_tup)) + elog(ERROR, "cache lookup failed for type %u", elem_typ->typoid); + + type_frm = (Form_pg_type) GETSTRUCT(type_tup); + + if (OidIsValid(type_frm->typarray)) + { + Oid arrtyp_oid = type_frm->typarray; + + ReleaseSysCache(type_tup); + + return plpgsql_build_datatype(arrtyp_oid, + elem_typ->atttypmod, + elem_typ->collation, + NULL); + } + + if (type_frm->typcategory != TYPCATEGORY_ARRAY) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("type \"%s[]\" does not exist", + NameStr(type_frm->typname)), + parser_errposition(yylloc))); + + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("do not support multi-dimensional arrays in PL/pgSQL"), + parser_errposition(yylloc))); + + ReleaseSysCache(type_tup); + + return elem_typ; +} + static PLpgSQL_type * read_datatype(int tok) { @@ -2818,7 +2888,9 @@ read_datatype(int tok) { result = plpgsql_parse_wordtype(dtname); if (result) - return result; + { + return read_datatype_array(result); + } } else if (tok_is_keyword(tok, , K_ROWTYPE, "rowtype")) @@ -2842,7 +2914,9 @@ read_datatype(int tok) { result = plpgsql_parse_wordtype(dtname); if (result) - return result; + { + return read_datatype_array(result); + } } else if (tok_is_keyword(tok, , K_ROWTYPE, "rowtype")) @@ -2866,7 +2940,9 @@ read_datatype(int tok)
PL/pgSQL: Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[]
Implement TODO item: PL/pgSQL Incomplete item Allow handling of %TYPE arrays, e.g. tab.col%TYPE[] As a first step, deal only with [], such as xxx.yyy%TYPE[] xxx%TYPE[] It can be extended to support multi-dimensional and complex syntax in the future. -- Quan Zongliangdiff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y index 6a09bfdd67..3ff12b7af9 100644 --- a/src/pl/plpgsql/src/pl_gram.y +++ b/src/pl/plpgsql/src/pl_gram.y @@ -23,6 +23,7 @@ #include "parser/scanner.h" #include "parser/scansup.h" #include "utils/builtins.h" +#include "utils/syscache.h" #include "plpgsql.h" @@ -76,6 +77,7 @@ staticPLpgSQL_expr*read_sql_expression2(int until, int until2, int *endtoken); static PLpgSQL_expr*read_sql_stmt(void); static PLpgSQL_type*read_datatype(int tok); +static PLpgSQL_type*read_datatype_array(PLpgSQL_type *elem_typ); static PLpgSQL_stmt*make_execsql_stmt(int firsttoken, int location); static PLpgSQL_stmt_fetch *read_fetch_direction(void); static void complete_direction(PLpgSQL_stmt_fetch *fetch, @@ -2783,6 +2785,55 @@ read_sql_construct(int until, return expr; } +static PLpgSQL_type * +read_datatype_array(PLpgSQL_type *elem_typ) +{ + int tok; + HeapTuple type_tup = NULL; + Form_pg_typetype_frm; + Oid arrtyp_oid; + + Assert(elem_typ); + if (!OidIsValid(elem_typ->typoid)) + return elem_typ; + + tok = yylex(); + /* Next token is not square bracket. */ + if (tok != '[') + { + plpgsql_push_back_token(tok); + + return elem_typ; + } + + tok = yylex(); + /* For now, deal only with []. 
*/ + if (tok != ']') + { + plpgsql_push_back_token('['); + plpgsql_push_back_token(tok); + + return elem_typ; + } + + type_tup = SearchSysCache1(TYPEOID, + ObjectIdGetDatum(elem_typ->typoid)); + if (!HeapTupleIsValid(type_tup)) + return elem_typ; + + type_frm = (Form_pg_type) GETSTRUCT(type_tup); + arrtyp_oid = type_frm->typarray; + ReleaseSysCache(type_tup); + + if (OidIsValid(arrtyp_oid)) + return plpgsql_build_datatype(arrtyp_oid, + elem_typ->atttypmod, + elem_typ->collation, + NULL); + else + return elem_typ; +} + static PLpgSQL_type * read_datatype(int tok) { @@ -2818,7 +2869,9 @@ read_datatype(int tok) { result = plpgsql_parse_wordtype(dtname); if (result) - return result; + { + return read_datatype_array(result); + } } else if (tok_is_keyword(tok, , K_ROWTYPE, "rowtype")) @@ -2842,7 +2895,9 @@ read_datatype(int tok) { result = plpgsql_parse_wordtype(dtname); if (result) - return result; + { + return read_datatype_array(result); + } } else if (tok_is_keyword(tok, , K_ROWTYPE, "rowtype")) @@ -2866,7 +2921,9 @@ read_datatype(int tok) { result = plpgsql_parse_cwordtype(dtnames); if (result) - return result; + { + return read_datatype_array(result); + } } else if (tok_is_keyword(tok, , K_ROWTYPE, "rowtype")) diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out index 272f5d2111..8db28c1122 100644 --- a/src/test/regress/expected/plpgsql.out +++ b/src/test/regress/expected/plpgsql.out @@ -5814,6 +5814,31 @@ SELECT * FROM list_partitioned_table() AS t; 2 (2 rows) +CREATE OR REPLACE FUNCTION array_partitioned_table() +RETURNS SETOF partitioned_table.a%TYPE AS $$ +DECLARE +i int; +row partitioned_table%ROWTYPE; +a_val
Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block
Implement TODO item: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block Currently it displays zero. At the point where the statement log is currently printed, check whether an XID has been assigned; if not, skip the statement log there. Then, just before finish_xact_command, emit the statement log if it has not been output yet; by that time the XID is available. DML that does not actually manipulate any data still gets no XID. [32718][788] LOG: statement: insert into t1 values(1,0,''); [32718][789] LOG: statement: delete from t1; [32718][0] LOG: statement: delete from t1; -- Quan Zongliangdiff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 21b9763183..b73676ab9d 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -1063,8 +1063,9 @@ exec_simple_query(const char *query_string) */ parsetree_list = pg_parse_query(query_string); - /* Log immediately if dictated by log_statement */ - if (check_log_statement(parsetree_list)) + /* Log immediately if dictated by log_statement and XID assigned. */ + if (GetTopTransactionIdIfAny() != InvalidTransactionId && + check_log_statement(parsetree_list)) { ereport(LOG, (errmsg("statement: %s", query_string), @@ -1281,6 +1282,16 @@ exec_simple_query(const char *query_string) PortalDrop(portal, false); + /* Log if dictated by log_statement and has not been logged. */ + if (!was_logged && check_log_statement(parsetree_list)) + { + ereport(LOG, + (errmsg("statement: %s", query_string), + errhidestmt(true), + errdetail_execute(parsetree_list))); + was_logged = true; + } + if (lnext(parsetree_list, parsetree_item) == NULL) { /*
Re: Improving the heapgetpage function improves performance in common scenarios
On 2023/9/6 17:07, John Naylor wrote: On Wed, Sep 6, 2023 at 2:50 PM Quan Zongliang wrote: > If not optimized(--enable-debug CFLAGS='-O0'), there is a clear > difference. When the compiler does the optimization, the performance is > similar. I think the compiler does a good enough optimization with > "pg_attribute_always_inline" and the last two constant parameters when > calling heapgetpage_collect. So as we might expect, more specialization (Andres' patch) has no apparent downsides in this workload. (While I'm not sure of the point of testing at -O0, I think we can conclude that less-bright compilers will show some improvement with either patch.) If you agree, do you want to withdraw your patch from the commit fest? Ok. -- John Naylor EDB: http://www.enterprisedb.com
Re: Improving the heapgetpage function improves performance in common scenarios
On 2023/9/6 15:50, Quan Zongliang wrote: On 2023/9/5 18:46, John Naylor wrote: On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang wrote: > Here's how I test it > EXPLAIN ANALYZE SELECT * FROM orders; Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not good for these kinds of tests. > I'll also try Andres Freund's test method next. Commit f691f5b80a85 from today removes another source of overhead in this function, so I suggest testing against that, if you wish to test again. Tested with the latest code of the master branch; see the attached results. If not optimized (--enable-debug CFLAGS='-O0'), there is a clear difference. When the compiler does the optimization, the performance is similar. I think the compiler does a good enough optimization with "pg_attribute_always_inline" and the last two constant parameters when calling heapgetpage_collect. One note: the first execution time in each attachment is not included in the average. -- John Naylor EDB: http://www.enterprisedb.com
Re: Improving the heapgetpage function improves performance in common scenarios
On 2023/9/5 18:46, John Naylor wrote: On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang wrote: > Here's how I test it > EXPLAIN ANALYZE SELECT * FROM orders; Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not good for these kinds of tests. > I'll also try Andres Freund's test method next. Commit f691f5b80a85 from today removes another source of overhead in this function, so I suggest testing against that, if you wish to test again. Test with the latest code of the master branch, see the attached results. If not optimized(--enable-debug CFLAGS='-O0'), there is a clear difference. When the compiler does the optimization, the performance is similar. I think the compiler does a good enough optimization with "pg_attribute_always_inline" and the last two constant parameters when calling heapgetpage_collect. -- John Naylor EDB: http://www.enterprisedb.com
Re: Improving the heapgetpage function improves performance in common scenarios
On 2023/9/5 16:15, John Naylor wrote: On Thu, Aug 24, 2023 at 5:55 PM Quan Zongliang wrote: > In the function heapgetpage. If a table is not updated very frequently. > Many actions in tuple loops are superfluous. For all_visible pages, > loctup does not need to be assigned, nor does the "valid" variable. > CheckForSerializableConflictOutNeeded from > HeapCheckForSerializableConflictOut function, it only need to inspect at Thanks for submitting! A few weeks before this, there was another proposal, which specializes code for all paths, not just one. That patch also does so without duplicating the loop: https://www.postgresql.org/message-id/20230716015656.xjvemfbp5fysj...@awork3.anarazel.de Nice patch. I'm sorry I didn't notice it before. > the beginning of the cycle only once. Using vtune you can clearly see > the result (attached heapgetpage.jpg). > > So by splitting the loop logic into two parts, the vtune results show > significant improvement (attached heapgetpage-allvis.jpg). For future reference, it's not clear at all from the screenshots what the improvement will be for the user. In the above thread, the author shares testing methodology as well as timing measurements. This is useful for reproducibility, as well as for convincing others that the change is important. Here's how I test it: EXPLAIN ANALYZE SELECT * FROM orders; Maybe the test wasn't good enough: although the best result with the patch looks good, the timings fluctuate a lot, which makes them hard to compare, so the vtune results are used instead. My patch mainly eliminates: 1. Assignment of the "loctup" struct variable (in vtune you can see that these 4 lines have a significant overhead: 0.4 1.0 0.2 0.4). 2. Assignment of the "valid" variable. (overhead 0.6) 3. The HeapCheckForSerializableConflictOut function call. (overhead 0.6) Although these overheads are not identical from test to test.
But all are too obvious to ignore. The screenshots are mainly to show the three improvements mentioned above. I'll also try Andres Freund's test method next. -- John Naylor EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics
On 2023/6/17 06:46, Tom Lane wrote: Quan Zongliang writes: Perhaps we should discard this (dups cnt > 1) restriction? That's not going to happen on the basis of one test case that you haven't even shown us. The implications of doing it are very unclear. In particular, I seem to recall that there are bits of logic that depend on the assumption that MCV entries always represent more than one row. The nmultiple calculation Tomas referred to may be failing because of that, but I'm worried about there being other places. The statistics for the other table look like this: stadistinct | 6 stanumbers1 | {0.50096667,0.49736667,0.0012} stavalues1 | {v22,v23,v5} The value that appears twice in the small table (v1 and v2) does not appear here. The stadistinct's true value is 18 instead of 6 (three values in the small table do not appear here). When calculating the selectivity: if (nd2 > sslot2->nvalues) totalsel1 += unmatchfreq1 * otherfreq2 / (nd2 - sslot2->nvalues); totalsel1 = 0 nd2 = 21 sslot2->nvalues = 2 unmatchfreq1 = 0.0002016420476 otherfreq2 = 0.82608695328235626 result: totalsel1 = 0.043473913749706022 rows = 0.043473913749706022 * 23 * 2,000,000 = 1999800 Basically, you're proposing a rather fundamental change in the rules by which Postgres has gathered statistics for decades. You need to bring some pretty substantial evidence to support that. The burden of proof is on you, not on the status quo. regards, tom lane
Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics
On 2023/6/16 23:39, Tomas Vondra wrote: On 6/16/23 11:25, Quan Zongliang wrote: We have a small table with only 23 rows and 21 values. The resulting MCV and histogram is as follows stanumbers1 | {0.08695652,0.08695652} stavalues1 | {v1,v2} stavalues2 | {v3,v4,v5,v6,v7,v8,v9,v10,v11,v12,v13,v14,v15,v16,v17,v18,v19,v20,v21} An incorrect number of rows was estimated when HashJoin was done with another large table (about 2 million rows). Hash Join (cost=1.52..92414.61 rows=2035023 width=0) (actual time=1.943..1528.983 rows=3902 loops=1) That's interesting. I wonder how come the estimate gets this bad simply by skipping values entries with a single row in the sample, which means we know the per-value selectivity pretty well. I guess the explanation has to be something strange happening when estimating the join condition selectivity, where we combine MCVs from both sides of the join (which has to be happening here, otherwise it would not matter what gets to the MCV). It'd be interesting to know what's in the other MCV, and what are the other statistics for the attributes (ndistinct etc.). Or even better, a reproducer SQL script that builds two tables and then joins them. The other table is severely skewed. Most rows cannot JOIN the small table. This special case causes the inaccuracy of cost calculation. The reason is that the MCV of the small table excludes values with rows of 1. Put them in the MCV in the statistics to get the correct result. Using the conservative samplerows <= attstattarget doesn't completely solve this problem. It can solve this case. After modification we get statistics without histogram: stanumbers1 | {0.08695652,0.08695652,0.04347826,0.04347826, ... } stavalues1 | {v,v2, ... } And we have the right estimates: Hash Join (cost=1.52..72100.69 rows=3631 width=0) (actual time=1.447..1268.385 rows=3902 loops=1) I'm not against building a "complete" MCV, but I guess the case where (samplerows <= num_mcv) is pretty rare. 
Why shouldn't we make the MCV complete whenever we decide (ndistinct <= num_mcv)? That would need to happen later, because we don't have the ndistinct estimate yet at this point - we'd have to do the loop a bit later (or likely twice). FWIW the patch breaks the calculation of nmultiple (and thus likely the ndistinct estimate). It's not just small tables: if a column's values are nearly unique, the same problem occurs, because we exclude values that appear only once. samplerows <= num_mcv only solves one scenario. Perhaps we should discard this (dups_cnt > 1) restriction? regards
Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics
We have a small table with only 23 rows and 21 values. The resulting MCV and histogram is as follows stanumbers1 | {0.08695652,0.08695652} stavalues1 | {v1,v2} stavalues2 | {v3,v4,v5,v6,v7,v8,v9,v10,v11,v12,v13,v14,v15,v16,v17,v18,v19,v20,v21} An incorrect number of rows was estimated when HashJoin was done with another large table (about 2 million rows). Hash Join (cost=1.52..92414.61 rows=2035023 width=0) (actual time=1.943..1528.983 rows=3902 loops=1) The reason is that the MCV of the small table excludes values with rows of 1. Put them in the MCV in the statistics to get the correct result. Using the conservative samplerows <= attstattarget doesn't completely solve this problem. It can solve this case. After modification we get statistics without histogram: stanumbers1 | {0.08695652,0.08695652,0.04347826,0.04347826, ... } stavalues1 | {v,v2, ... } And we have the right estimates: Hash Join (cost=1.52..72100.69 rows=3631 width=0) (actual time=1.447..1268.385 rows=3902 loops=1) Regards, -- Quan Zongliang Beijing Vastdata Co., LTDdiff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 52ef462dba..08ea4243f5 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -2518,7 +2518,7 @@ compute_scalar_stats(VacAttrStatsP stats, { /* Reached end of duplicates of this value */ ndistinct++; - if (dups_cnt > 1) + if (dups_cnt > 1 || samplerows <= num_mcv) { nmultiple++; if (track_cnt < num_mcv ||
Re: Why enable_hashjoin Completely disables HashJoin
On 2023/4/3 19:44, Tomas Vondra wrote: On 4/3/23 12:23, Quan Zongliang wrote: Hi, I found that enable_hashjoin disables HashJoin completely. It's in the function add_paths_to_joinrel: if (enable_hashjoin || jointype == JOIN_FULL) hash_inner_and_outer(root, joinrel, outerrel, innerrel, jointype, ); Instead, it should add a disable cost to the cost calculation of hashjoin. And now final_cost_hashjoin does the same thing: if (!enable_hashjoin) startup_cost += disable_cost; enable_mergejoin has the same problem. Test case: CREATE TABLE t_score_01( s_id int, s_score int, s_course char(8), c_id int); CREATE TABLE t_student_01( s_id int, s_name char(8)); insert into t_score_01 values( generate_series(1, 100), random()*100, 'course', generate_series(1, 100)); insert into t_student_01 values(generate_series(1, 100), 'name'); analyze t_score_01; analyze t_student_01; SET enable_hashjoin TO off; SET enable_nestloop TO off; SET enable_mergejoin TO off; explain select count(*) from t_student_01 a join t_score_01 b on a.s_id=b.s_id; After disabling all three, the HashJoin path should still be chosen. It's not clear to me why that behavior would be desirable? Why is this an issue you need to solve? Because someone noticed that after setting enable_hashjoin, enable_mergejoin and enable_nestloop to off, a statement seemed to get stuck (actually it had chosen the nested-loop path, which took a very long time to run). If enable_hashjoin and enable_mergejoin prevent those paths from being generated at all, then enable_nestloop should do the same thing, but it doesn't. AFAIK the reason why some paths are actually disabled (not built at all) while others are only penalized by adding disable_cost is that we need to end up with at least one way to execute the query. So we pick a path that we know is possible (e.g. seqscan) and hard-disable other paths. But the always-possible path is only soft-disabled by disable_cost. For joins, we do the same thing.
The hash/merge joins may not be possible, because the data types may not have hash/sort operators, etc. Nestloop is always possible. So we soft-disable nestloop but hard-disable hash/merge joins. I doubt we want to change this behavior, unless there's a good reason to do that ... It doesn't have to change, because selecting the nested loop doesn't really hang either; it just takes too long to run. I will change the patch status to Withdrawn. regards
Why enable_hashjoin Completely disables HashJoin
Hi, I found that the enable_hashjoin disables HashJoin completely. It's in the function add_paths_to_joinrel:

    if (enable_hashjoin || jointype == JOIN_FULL)
        hash_inner_and_outer(root, joinrel, outerrel, innerrel,
                             jointype, &extra);

Instead, it should add a disable cost to the cost calculation of hashjoin. And now final_cost_hashjoin does the same thing:

    if (!enable_hashjoin)
        startup_cost += disable_cost;

enable_mergejoin has the same problem. Test case:

CREATE TABLE t_score_01( s_id int, s_score int, s_course char(8), c_id int);
CREATE TABLE t_student_01( s_id int, s_name char(8));
insert into t_score_01 values( generate_series(1, 100), random()*100, 'course', generate_series(1, 100));
insert into t_student_01 values(generate_series(1, 100), 'name');
analyze t_score_01;
analyze t_student_01;
SET enable_hashjoin TO off;
SET enable_nestloop TO off;
SET enable_mergejoin TO off;
explain select count(*) from t_student_01 a join t_score_01 b on a.s_id=b.s_id;

After disabling all three, the HashJoin path should still be chosen. Attached is the patch file.

--
Quan Zongliang
Vastdata

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index bd51e4f972..4b5ab7f7ff 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -206,14 +206,13 @@ add_paths_to_joinrel(PlannerInfo *root,
  * way of implementing a full outer join, so override enable_mergejoin if
  * it's a full join.
  */
- if (enable_mergejoin || jointype == JOIN_FULL)
-  extra.mergeclause_list = select_mergejoin_clauses(root,
-        joinrel,
-        outerrel,
-        innerrel,
-        restrictlist,
-        jointype,
-        &mergejoin_allowed);
+ extra.mergeclause_list = select_mergejoin_clauses(root,
+       joinrel,
+       outerrel,
+       innerrel,
+       restrictlist,
+       jointype,
+       &mergejoin_allowed);

 /*
  * If it's SEMI, ANTI, or inner_unique join, compute correction factors
@@ -316,9 +315,8 @@ add_paths_to_joinrel(PlannerInfo *root,
  * before being joined. As above, disregard enable_hashjoin for full
  * joins, because there may be no other alternative.
  */
- if (enable_hashjoin || jointype == JOIN_FULL)
-  hash_inner_and_outer(root, joinrel, outerrel, innerrel,
-       jointype, &extra);
+ hash_inner_and_outer(root, joinrel, outerrel, innerrel,
+      jointype, &extra);

 /*
  * 5. If inner and outer relations are foreign tables (or joins) belonging
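The soft-disable behavior the patch argues for can be illustrated outside the planner. The following is a minimal stand-alone C sketch, not PostgreSQL code: the DemoPath struct and DISABLE_COST constant are simplified stand-ins for the planner's Path and costsize.c's disable_cost. Adding a large penalty (soft disable) keeps a disabled strategy in the running, so when every strategy is penalized the cheapest one still wins; a hard disable never generates the path at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the planner's disable_cost penalty. */
#define DISABLE_COST 1.0e10

typedef struct DemoPath
{
    const char *name;
    double      cost;       /* estimated cost of the join strategy */
    bool        enabled;    /* corresponds to enable_hashjoin etc. */
} DemoPath;

/* Soft disable: a disabled path is penalized, not removed outright. */
static double
effective_cost(const DemoPath *p)
{
    return p->cost + (p->enabled ? 0.0 : DISABLE_COST);
}

/* Pick the cheapest of n candidate paths by effective cost. */
static const DemoPath *
cheapest_path(const DemoPath *paths, size_t n)
{
    const DemoPath *best = &paths[0];

    for (size_t i = 1; i < n; i++)
        if (effective_cost(&paths[i]) < effective_cost(best))
            best = &paths[i];
    return best;
}
```

With all three strategies set to off, cheapest_path() still returns the hash join when its base cost is lowest, which is the outcome the test case above expects from the patch.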
Re: Bug: When user-defined AM is used, the index path cannot be selected correctly
On 2022/8/17 10:03, Tom Lane wrote:
Quan Zongliang writes:
1. When using extended PGroonga ... 3. Neither ID = 'f' nor id = 't' can use the index correctly. This works fine for btree indexes.

I think the actual problem is that IsBooleanOpfamily only accepts the btree and hash opclasses, and that's what needs to be improved. Your proposed patch fails to do that, which makes it just a crude hack that solves some aspects of the issue (and probably breaks other things). It might work to change IsBooleanOpfamily so that it checks to see whether BooleanEqualOperator is a member of the opclass. That's basically what we need to know before we dare generate substitute index clauses. It's kind of an expensive test though, and the existing coding assumes that IsBooleanOpfamily is cheap ...

regards, tom lane

New patch attached. It seems that partitions do not use AMs other than btree and hash. Rewrite only indxpath.c and check if it is a custom AM.

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 7d176e7b00..30df5cb2f6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -23,6 +23,7 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
@@ -2295,7 +2296,7 @@ match_clause_to_indexcol(PlannerInfo *root,
  /* First check for boolean-index cases. */
  opfamily = index->opfamily[indexcol];
- if (IsBooleanOpfamily(opfamily))
+ if (IsBooleanAmOpfamily(index->relam, opfamily))
  {
   iclause = match_boolean_index_clause(root, rinfo, indexcol, index);
   if (iclause)
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 8dc9ce01bb..5c4cc616d8 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -58,6 +58,12 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_opfamily_oid_index, 2755, OpfamilyOidIndexId, on pg
 #define IsBooleanOpfamily(opfamily) \
  ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)

+#define IsBooleanAmOpfamily(amid, opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID || \
+  ((amid) >= FirstNormalObjectId && \
+   OidIsValid(GetDefaultOpClass(BOOLOID, (amid)))))
+
 #endif       /* EXPOSE_TO_CLIENT_CODE */

 #endif       /* PG_OPFAMILY_H */
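The proposed macro's test can be read as a small predicate: accept the two built-in boolean opfamilies, and otherwise accept any user-defined AM (OID at or above FirstNormalObjectId) that has a default operator class for bool. Below is a stand-alone C sketch of that logic, not server code: the catalog lookup is stubbed out (get_default_bool_opclass is a hypothetical stand-in for GetDefaultOpClass(BOOLOID, amid), and the custom-AM OIDs 16500/16501 are invented), while the named constants mirror the values in the server headers.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

#define InvalidOid           ((Oid) 0)
#define OidIsValid(objectId) ((bool) ((objectId) != InvalidOid))

/* Mirrors of catalog constants (pg_opfamily.h, access/transam.h). */
#define BOOL_BTREE_FAM_OID   424
#define BOOL_HASH_FAM_OID    2222
#define FirstNormalObjectId  16384

/* Hypothetical stand-in for GetDefaultOpClass(BOOLOID, amid): pretend a
 * single user-defined AM (OID 16500) provides a default bool opclass. */
static Oid
get_default_bool_opclass(Oid amid)
{
    return (amid == 16500) ? (Oid) 16501 : InvalidOid;
}

/* Function form of the proposed IsBooleanAmOpfamily() macro. */
static bool
is_boolean_am_opfamily(Oid amid, Oid opfamily)
{
    return opfamily == BOOL_BTREE_FAM_OID ||
           opfamily == BOOL_HASH_FAM_OID ||
           (amid >= FirstNormalObjectId &&
            OidIsValid(get_default_bool_opclass(amid)));
}
```

As Tom notes, the real lookup is comparatively expensive, which is the main objection to putting it behind a macro that callers assume is cheap.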
Bug: When user-defined AM is used, the index path cannot be selected correctly
Hi,

1. When using extended PGroonga

CREATE EXTENSION pgroonga;
CREATE TABLE memos (
  id boolean,
  content varchar
);
CREATE INDEX idxA ON memos USING pgroonga (id);

2. Disable bitmapscan and seqscan:

SET enable_seqscan=off;
SET enable_indexscan=on;
SET enable_bitmapscan=off;

3. Neither id = 'f' nor id = 't' can use the index correctly.

postgres=# explain select * from memos where id='f';
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on memos  (cost=100.00..101.06 rows=3 width=33)
   Filter: (NOT id)
(2 rows)

postgres=# explain select * from memos where id='t';
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on memos  (cost=100.00..101.06 rows=3 width=33)
   Filter: id
(2 rows)

postgres=# explain select * from memos where id>='t';
                            QUERY PLAN
-------------------------------------------------------------------
 Index Scan using idxa on memos  (cost=0.00..4.01 rows=2 width=33)
   Index Cond: (id >= true)
(2 rows)

The reason is that these expressions are converted to BoolExpr and Var, and match_clause_to_indexcol does not use them to check for the boolean-index case. Patch attached.

--
Quan Zongliang
Beijing Vastdata

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 7d176e7b00..31229ce16c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2295,7 +2295,7 @@ match_clause_to_indexcol(PlannerInfo *root,
  /* First check for boolean-index cases. */
  opfamily = index->opfamily[indexcol];
- if (IsBooleanOpfamily(opfamily))
+ if (IsBooleanOpfamily(opfamily) || IsA(clause, BoolExpr) || IsA(clause, Var))
  {
   iclause = match_boolean_index_clause(root, rinfo, indexcol, index);
   if (iclause)
Re: bugfix: when the blocksize is 32k, the function page_header of pageinspect returns negative numbers.
On 2021/7/9 9:50 AM, Michael Paquier wrote:
On Fri, Jul 09, 2021 at 09:26:37AM +0800, Quan Zongliang wrote:
new patch attached

That's mostly fine at quick glance. Here are some comments.

Please add pageinspect--1.9--1.10.sql within the patch. Using git, you can do that with a simple "git add". With the current shape of the patch, one has to manually copy pageinspect--1.9--1.10.sql into contrib/pageinspect/ to be able to test it.

+SELECT lower, upper, special, pagesize, version FROM page_header(get_raw_page('test1', 0));
+ lower | upper | special | pagesize | version
+-------+-------+---------+----------+---------
+    28 |  8152 |    8192 |     8192 |       4
+(1 row)

I would not test all the fields, just pagesize and version perhaps?

+ if (TupleDescAttr(tupdesc, 3)->atttypid == INT2OID)
+  values[3] = UInt16GetDatum(page->pd_lower);
+ else
+  values[3] = Int32GetDatum(page->pd_lower);

Let's make the style more consistent with brinfuncs.c, by grouping all the fields together in a switch/case, like that:

switch (TupleDescAttr(tupdesc, 3)->atttypid)
{
    case INT2OID:
        /* fill in values with UInt16GetDatum() */
        break;
    case INT4OID:
        /* fill in values with Int32GetDatum() */
        break;
    default:
        elog(ERROR, "blah");
}

+ALTER EXTENSION pageinspect UPDATE TO '1.10';
+\df page_header
+SELECT lower, upper, special, pagesize, version FROM page_header(get_raw_page('test1', 0));

No need for this test as page.sql already stresses page_header() for the latest version. Perhaps we could just UPDATE TO '1.9' before running your query to show that it is the last version of pageinspect where the older page_header() appeared.

The documentation of pageinspect does not require an update, as far as I can see. So we are good there.
--
Michael

Thanks for the comments.
Done

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 2d330ddb28..068134 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -13,7 +13,8 @@ OBJS = \
  rawpage.o

 EXTENSION = pageinspect
-DATA = pageinspect--1.8--1.9.sql \
+DATA = pageinspect--1.9--1.10.sql \
+ pageinspect--1.8--1.9.sql \
  pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \
  pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
  pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
diff --git a/contrib/pageinspect/expected/oldextversions.out b/contrib/pageinspect/expected/oldextversions.out
index 04dc7f8640..1bb3919504 100644
--- a/contrib/pageinspect/expected/oldextversions.out
+++ b/contrib/pageinspect/expected/oldextversions.out
@@ -36,5 +36,19 @@ SELECT * FROM bt_page_items('test1_a_idx', 1);
  1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 01 | f | (0,1) |
 (1 row)

+ALTER EXTENSION pageinspect UPDATE TO '1.9';
+\df page_header
+ List of functions
+ Schema | Name | Result data type | Argument data types | Type
+--------+-------------+------------------+---------------------+------
+ public | page_header | record | page bytea, OUT lsn pg_lsn, OUT checksum smallint, OUT flags smallint, OUT lower smallint, OUT upper smallint, OUT special smallint, OUT pagesize smallint, OUT version smallint, OUT prune_xid xid | func
+(1 row)
+
+SELECT pagesize, version FROM page_header(get_raw_page('test1', 0));
+ pagesize | version
+----------+---------
+     8192 |       4
+(1 row)
+
 DROP TABLE test1;
 DROP EXTENSION pageinspect;
diff --git a/contrib/pageinspect/pageinspect--1.9--1.10.sql b/contrib/pageinspect/pageinspect--1.9--1.10.sql
new file mode 100644
index 00..8dd9a27505
--- /dev/null
+++ b/contrib/pageinspect/pageinspect--1.9--1.10.sql
@@ -0,0 +1,21 @@
+/* contrib/pageinspect/pageinspect--1.9--1.10.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pageinspect UPDATE TO '1.10'" to load this file. \quit
+
+--
+-- page_header()
+--
+DROP FUNCTION page_header(IN page bytea);
+CREATE FUNCTION page_header(IN page bytea,
+    OUT lsn pg_lsn,
+    OUT checksum smallint,
+    OUT flags smallint,
+    OUT lower int,
+    OUT upper int,
+    OUT special int,
+    OUT pagesize int,
+    OUT version smallint,
+    OUT prune_xid xid)
+AS 'MODULE_PATHNAME', 'page_header'
+LANGUAGE C STRICT PARALLEL SAFE;
diff --git a/contrib/pageinspect/pageinspect.control b/contrib/pageinspect/pageinspect.control
index bd716769a1..7
Re: bugfix: when the blocksize is 32k, the function page_header of pageinspect returns negative numbers.
On 2021/7/8 3:54 PM, Michael Paquier wrote:
On Wed, Jul 07, 2021 at 11:28:06PM -0500, Justin Pryzby wrote:
I think you can refer to this prior commit for guidance.

commit f18aa1b203930ed28cfe42e82d3418ae6277576d
Author: Peter Eisentraut
Date: Tue Jan 19 10:28:05 2021 +0100

    pageinspect: Change block number arguments to bigint

Yes, thanks. Peter's recent work is what I had in mind. I agree that we could improve the situation, and the change is not complicated once you know what needs to be done. It needs to be done as follows:
- Create a new pageinspect--1.9--1.10.sql.
- Provide a proper implementation for older extension version based on the output parameter types, with a lookup at the TupleDesc for the function to adapt.
- Add tests to sql/oldextversions.sql to stress the old function based on smallint results.
- Bump pageinspect.control.

Quan, would you like to produce a patch? That's a good exercise to understand how the maintenance of extensions is done.

new patch attached
--
Michael

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 2d330ddb28..068134 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -13,7 +13,8 @@ OBJS = \
  rawpage.o

 EXTENSION = pageinspect
-DATA = pageinspect--1.8--1.9.sql \
+DATA = pageinspect--1.9--1.10.sql \
+ pageinspect--1.8--1.9.sql \
  pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \
  pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
  pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
diff --git a/contrib/pageinspect/expected/oldextversions.out b/contrib/pageinspect/expected/oldextversions.out
index 04dc7f8640..9546791e23 100644
--- a/contrib/pageinspect/expected/oldextversions.out
+++ b/contrib/pageinspect/expected/oldextversions.out
@@ -36,5 +36,32 @@ SELECT * FROM bt_page_items('test1_a_idx', 1);
  1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 01 | f | (0,1) |
 (1 row)

+\df page_header
+ List of functions
+ Schema | Name | Result data type | Argument data types | Type
+--------+-------------+------------------+---------------------+------
+ public | page_header | record | page bytea, OUT lsn pg_lsn, OUT checksum smallint, OUT flags smallint, OUT lower smallint, OUT upper smallint, OUT special smallint, OUT pagesize smallint, OUT version smallint, OUT prune_xid xid | func
+(1 row)
+
+SELECT lower, upper, special, pagesize, version FROM page_header(get_raw_page('test1', 0));
+ lower | upper | special | pagesize | version
+-------+-------+---------+----------+---------
+    28 |  8152 |    8192 |     8192 |       4
+(1 row)
+
+ALTER EXTENSION pageinspect UPDATE TO '1.10';
+\df page_header
+ List of functions
+ Schema | Name | Result data type | Argument data types | Type
+--------+-------------+------------------+---------------------+------
+ public | page_header | record | page bytea, OUT lsn pg_lsn, OUT checksum smallint, OUT flags smallint, OUT lower integer, OUT upper integer, OUT special integer, OUT pagesize integer, OUT version smallint, OUT prune_xid xid | func
+(1 row)
+
+SELECT lower, upper, special, pagesize, version FROM page_header(get_raw_page('test1', 0));
+ lower | upper | special | pagesize | version
+-------+-------+---------+----------+---------
+    28 |  8152 |    8192 |     8192 |       4
+(1 row)
+
 DROP TABLE test1;
 DROP EXTENSION pageinspect;
diff --git a/contrib/pageinspect/pageinspect.control b/contrib/pageinspect/pageinspect.control
index bd716769a1..7cdf37913d 100644
--- a/contrib/pageinspect/pageinspect.control
+++ b/contrib/pageinspect/pageinspect.control
@@ -1,5 +1,5 @@
 # pageinspect extension
 comment = 'inspect the contents of database pages at a low level'
-default_version = '1.9'
+default_version = '1.10'
 module_pathname = '$libdir/pageinspect'
 relocatable = true
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index e9fee73bc4..2d1f90e3bc 100644
--- a/contrib/pageinspect/rawpage.c
+++
Re: bugfix: when the blocksize is 32k, the function page_header of pageinspect returns negative numbers.
On 2021/7/8 3:54 PM, Michael Paquier wrote:
On Wed, Jul 07, 2021 at 11:28:06PM -0500, Justin Pryzby wrote:
I think you can refer to this prior commit for guidance.

commit f18aa1b203930ed28cfe42e82d3418ae6277576d
Author: Peter Eisentraut
Date: Tue Jan 19 10:28:05 2021 +0100

    pageinspect: Change block number arguments to bigint

Yes, thanks. Peter's recent work is what I had in mind. I agree that we could improve the situation, and the change is not complicated once you know what needs to be done. It needs to be done as follows:
- Create a new pageinspect--1.9--1.10.sql.
- Provide a proper implementation for older extension version based on the output parameter types, with a lookup at the TupleDesc for the function to adapt.
- Add tests to sql/oldextversions.sql to stress the old function based on smallint results.
- Bump pageinspect.control.

Quan, would you like to produce a patch? That's a good exercise to understand how the maintenance of extensions is done.

Ok. I'll do it.
--
Michael
bugfix: when the blocksize is 32k, the function page_header of pageinspect returns negative numbers.
If the block size is 32k, the function page_header of the pageinspect module returns negative numbers:

postgres=# select * from page_header(get_raw_page('t1',0));
    lsn    | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
 0/174CF58 |        0 |     0 |    28 | 32736 |  -32768 |   -32768 |       4 |         0
(1 row)

This patch changes the output parameters lower, upper, special and pagesize to int32.

postgres=# select * from page_header(get_raw_page('t1',0));
    lsn    | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
 0/19EA640 |        0 |     0 |    28 | 32736 |   32768 |    32768 |       4 |         0
(1 row)

--
Quan Zongliang

diff --git a/contrib/pageinspect/pageinspect--1.5.sql b/contrib/pageinspect/pageinspect--1.5.sql
index 1e40c3c97e..f71eff19c5 100644
--- a/contrib/pageinspect/pageinspect--1.5.sql
+++ b/contrib/pageinspect/pageinspect--1.5.sql
@@ -23,10 +23,10 @@ CREATE FUNCTION page_header(IN page bytea,
  OUT lsn pg_lsn,
  OUT checksum smallint,
  OUT flags smallint,
- OUT lower smallint,
- OUT upper smallint,
- OUT special smallint,
- OUT pagesize smallint,
+ OUT lower int,
+ OUT upper int,
+ OUT special int,
+ OUT pagesize int,
  OUT version smallint,
  OUT prune_xid xid)
 AS 'MODULE_PATHNAME', 'page_header'
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 7272b21016..af04c1a688 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -316,10 +316,10 @@ page_header(PG_FUNCTION_ARGS)
  values[0] = LSNGetDatum(lsn);
  values[1] = UInt16GetDatum(page->pd_checksum);
  values[2] = UInt16GetDatum(page->pd_flags);
- values[3] = UInt16GetDatum(page->pd_lower);
- values[4] = UInt16GetDatum(page->pd_upper);
- values[5] = UInt16GetDatum(page->pd_special);
- values[6] = UInt16GetDatum(PageGetPageSize(page));
+ values[3] = Int32GetDatum(page->pd_lower);
+ values[4] = Int32GetDatum(page->pd_upper);
+ values[5] = Int32GetDatum(page->pd_special);
+ values[6] = Int32GetDatum(PageGetPageSize(page));
  values[7] = UInt16GetDatum(PageGetPageLayoutVersion(page));
  values[8] = TransactionIdGetDatum(page->pd_prune_xid);
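The root cause is purely a signed/unsigned 16-bit issue: pd_lower, pd_upper, pd_special and the page size are unsigned 16-bit quantities on disk, but the old SQL signature exposed them as smallint, i.e. signed 16-bit. A small stand-alone C sketch (not PostgreSQL code) of the wraparound:

```c
#include <stdint.h>

/* The old page_header() SQL signature declared these columns as smallint
 * (signed 16-bit), so any on-disk value of 32768 or above wraps to a
 * negative number on output. */
static int16_t
as_old_smallint(uint16_t v)
{
    /* conversion of out-of-range values is implementation-defined in C;
     * on the usual two's-complement platforms 32768 becomes -32768 */
    return (int16_t) v;
}

/* The fixed signature uses int (32-bit), which preserves the value. */
static int32_t
as_new_int4(uint16_t v)
{
    return (int32_t) v;
}
```

With an 8kB block size every field fits in 15 bits, so the bug stays hidden; only 32kB blocks push pd_special and the page size to 32768, the first value that wraps.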
Remove unused code from the KnownAssignedTransactionIds submodule
Hi,

In the KnownAssignedTransactionIds sub-module, two lines of unused code were left behind by a previous change.

--
Quan Zongliang
CPUG

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b448533564..99ce8c752a 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3363,8 +3363,6 @@ RecordKnownAssignedTransactionIds(TransactionId xid)

   /* ShmemVariableCache->nextFullXid must be beyond any observed xid */
   AdvanceNextFullTransactionIdPastXid(latestObservedXid);
-
-  next_expected_xid = latestObservedXid;
-  TransactionIdAdvance(next_expected_xid);
  }
 }
Re: bugfix: invalid bit/varbit input causes the log file to be unreadable
Good. I tested it, and it looks fine. Thank you. On 2020/6/29 1:10 上午, Tom Lane wrote: I wrote: Even granting the premise, the proposed patch seems like a significant decrease in user-friendliness for typical cases. I'd rather see us make an effort to print one valid-per-the-DB-encoding character. Now that we can rely on snprintf to count %s restrictions in bytes, I think something like this should work: errmsg("\"%.*s\" is not a valid binary digit", pg_mblen(sp), sp))); But the real problem is that this is only the tip of the iceberg. You didn't even hit all the %c usages in varbit.c. I went through all the %c format sequences in the backend to see which ones could use this type of fix. There were not as many as I'd expected, but still a fair number. (I skipped cases where the input was coming from the catalogs, as well as some non-user-facing debug printouts.) That leads to the attached patch, which seems to do the job without breaking anything that works today. regards, tom lane PS: I failed to resist the temptation to improve some shoddy error messages nearby in pageinspect/heapfuncs.c.
bugfix: invalid bit/varbit input causes the log file to be unreadable
The bit/varbit type input functions cause file_fdw to fail to read the logfile normally.

1. Server conf:

server_encoding = UTF8
locale = zh_CN.UTF-8

2. Create external tables using file_fdw

CREATE EXTENSION file_fdw;
CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE pglog (
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp with time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text
) SERVER pglog
OPTIONS ( filename 'log/postgresql-2020-06-16_213409.csv', format 'csv');

It's normal up to here.

3. bit/varbit input

select b'Ù';

The foreign table cannot be accessed. SELECT * FROM pglog will get:

invalid byte sequence for encoding "UTF8": 0xc3 0x22

The reason is that the error message in the bit_in / varbit_in functions is output directly using %c, which causes the log file to not be decoded correctly. The attachment is a patch.

diff --git a/src/backend/utils/adt/varbit.c b/src/backend/utils/adt/varbit.c
index f0c6a44b84..506cc35446 100644
--- a/src/backend/utils/adt/varbit.c
+++ b/src/backend/utils/adt/varbit.c
@@ -230,8 +230,8 @@ bit_in(PG_FUNCTION_ARGS)
   else if (*sp != '0')
    ereport(ERROR,
     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-     errmsg("\"%c\" is not a valid binary digit",
-       *sp)));
+     errmsg("\"0x%02X\" is not a valid binary digit",
+       (unsigned char) *sp)));

   x >>= 1;
   if (x == 0)
@@ -531,8 +531,8 @@ varbit_in(PG_FUNCTION_ARGS)
   else if (*sp != '0')
    ereport(ERROR,
     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-     errmsg("\"%c\" is not a valid binary digit",
-       *sp)));
+     errmsg("\"0x%02X\" is not a valid binary digit",
+       (unsigned char) *sp)));

   x >>= 1;
   if (x == 0)
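The failure mode is easy to reproduce outside the server. In UTF-8, 'Ù' encodes as the two bytes 0xC3 0x99; printing only the first byte with %c, followed by the closing quote 0x22, yields exactly the invalid sequence 0xc3 0x22 reported above. A stand-alone C sketch follows; the mblen parameter is a stand-in for the server's pg_mblen(), and the "%.*s" form is the approach from Tom's reply, which prints the whole character by counting the precision in bytes.

```c
#include <stdio.h>
#include <string.h>

/* Old behavior: emit only the first byte of the offending character.
 * For a multibyte character this leaves an invalid byte sequence in
 * the log, which then cannot be read back (e.g. via file_fdw). */
static int
old_message(char *buf, size_t buflen, const char *sp)
{
    return snprintf(buf, buflen, "\"%c\" is not a valid binary digit", *sp);
}

/* Fixed behavior: emit the whole character, with the precision counted
 * in bytes (mblen would come from pg_mblen(sp) in the server). */
static int
fixed_message(char *buf, size_t buflen, const char *sp, int mblen)
{
    return snprintf(buf, buflen, "\"%.*s\" is not a valid binary digit",
                    mblen, sp);
}
```

Running old_message() on the two-byte input produces the bytes 0xC3 0x22 right after the opening quote, while fixed_message() keeps the character intact and the log remains valid UTF-8.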
Re: Restore replication settings when modifying a field type
On 2020/1/15 08:30, Quan Zongliang wrote: On 2020/1/3 17:14, Peter Eisentraut wrote: On 2019-11-01 04:39, Euler Taveira wrote: ATExecAlterColumnType records everything that depends on the column and for indexes it saves the definition (via pg_get_indexdef_string). Definition is not sufficient for reconstructing the replica identity information because there is not such keyword for replica identity in CREATE INDEX. The new index should call relation_mark_replica_identity to fix pg_index.indisreplident. Yeah, I don't think we need to do the full dance of reverse compiling the SQL command and reexecuting it, as the patch currently does. That's only necessary for rebuilding the index itself. For re-setting the replica identity, we can just use the internal API as you say. Also, a few test cases would be nice for this patch. I'm a little busy. I'll write a new patch in a few days. new patch attached. test case 1: create table t1 (col1 int,col2 int not null, col3 int not null,unique(col2,col3)); alter table t1 replica identity using index t1_col2_col3_key; alter table t1 alter col3 type text; alter table t1 alter col3 type smallint using col3::int; test case 2: create table t2 (col1 varchar(10), col2 text not null, col3 timestamp not null,unique(col2,col3), col4 int not null unique); alter table t2 replica identity using index t2_col2_col3_key; alter table t2 alter col3 type text; alter table t2 replica identity using index t2_col4_key; alter table t2 alter col4 type timestamp using '2020-02-11'::timestamp; diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index b7c8d663fc..de4da64e06 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -176,6 +176,7 @@ typedef struct AlteredTableInfo List *changedConstraintDefs; /* string definitions of same */ List *changedIndexOids; /* OIDs of indexes to rebuild */ List *changedIndexDefs; /* string definitions of same */ + List *changedReplIdentName; /* fullname of 
index to reset REPLICA IDENTITY */
 } AlteredTableInfo;

 /* Struct describing one new constraint to check in Phase 3 scan */
@@ -482,6 +483,7 @@ static ObjectAddress ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
        AlterTableCmd *cmd, LOCKMODE lockmode);
 static void RememberConstraintForRebuilding(Oid conoid, AlteredTableInfo *tab);
 static void RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab);
+static void RememberReplicaIdentForRebuilding(Oid indoid, AlteredTableInfo *tab);
 static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
        LOCKMODE lockmode);
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
@@ -553,6 +555,7 @@ static void refuseDupeIndexAttach(Relation parentIdx, Relation partIdx,
        Relation partitionTbl);
 static List *GetParentedForeignKeyRefs(Relation partition);
 static void ATDetachCheckNoForeignKeyRefs(Relation partition);
+static void relation_mark_replica_identity(Relation rel, char ri_type, Oid indexOid, bool is_internal);

 /*
@@ -4274,7 +4277,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
   Relation rel;
   ListCell *lcmd;

-  if (subcmds == NIL)
+  if (subcmds == NIL && pass != AT_PASS_MISC)
+   continue;
+  if (subcmds == NIL && pass == AT_PASS_MISC &&
+   tab->changedReplIdentName == NIL)
    continue;

   /*
@@ -4295,6 +4301,18 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
   if (pass == AT_PASS_ALTER_TYPE)
    ATPostAlterTypeCleanup(wqueue, tab, lockmode);

+  if (pass == AT_PASS_MISC && tab->changedReplIdentName)
+  {
+   Relation idxrel;
+   ObjectAddress address = get_object_address(OBJECT_INDEX,
+     (Node *) tab->changedReplIdentName, &idxrel,
+     AccessExclusiveLock, false);
+   relation_close(idxrel, NoLock);
+
+   relation_mark_replica_identity(rel, REPLICA_IDENTITY_INDEX,
+     address.objectId, true);
+  }
+
   relation_close(rel, NoLock);
Re: Restore replication settings when modifying a field type
On 2020/1/3 17:14, Peter Eisentraut wrote: On 2019-11-01 04:39, Euler Taveira wrote: ATExecAlterColumnType records everything that depends on the column and for indexes it saves the definition (via pg_get_indexdef_string). Definition is not sufficient for reconstructing the replica identity information because there is not such keyword for replica identity in CREATE INDEX. The new index should call relation_mark_replica_identity to fix pg_index.indisreplident. Yeah, I don't think we need to do the full dance of reverse compiling the SQL command and reexecuting it, as the patch currently does. That's only necessary for rebuilding the index itself. For re-setting the replica identity, we can just use the internal API as you say. Also, a few test cases would be nice for this patch. I'm a little busy. I'll write a new patch in a few days.
Re: Add a GUC variable that control logical replication
On 2019/11/1 20:49, Peter Eisentraut wrote:
On 2019-10-20 00:23, Euler Taveira wrote:
You can probably achieve that using ALTER PUBLICATION to disable publication of deletes or truncates, as the case may be, either permanently or just for the duration of the operations you want to skip. ... then you are skipping all tables in the publication.

You can group tables into different publications and set the subscription to subscribe to multiple publications if you need this kind of granularity. In any case, this kind of thing needs to be handled by the decoding plugin based on its configuration policies and depending on its needs. For example, let's say you have two decoding plugins running: one for a replication system and one for writing an audit log. It would not be appropriate to disable logging for both of them because of some performance optimization for one of them. And it would also not be appropriate to do this with a USERSET setting. If we need different hooks or more DDL commands do this better, then that can be considered. But this seems to be the wrong way to do it.

What the user needs is for the same replication link to selectively skip some transactions, and this choice should affect only the sessions doing bulk deletes; the operations of other sessions are not affected and can continue to produce replication messages. For example, session 1 wants to bulk delete 1 million old rows from the T1 table, which can be done without replication. At the same time, session 2 deletes 10 records from T1, which is expected to be passed on through replication. Therefore, the two-decoder approach cannot meet this requirement. It is also inappropriate to temporarily disable subscriptions, because that skips all transactions for a certain period of time.

--
Quan Zongliang
神州飞象(北京)数据科技有限公司
Our strength comes from the most advanced open-source database: PostgreSQL
zongliang.q...@postgresdata.com
Re: Restore replication settings when modifying a field type
On 2019/10/28 12:39, Kyotaro Horiguchi wrote:
Hello.

# The patch no longer applies on the current master. Needs a rebasing.

New patch, rebased on master branch.

At Sat, 26 Oct 2019 16:50:48 +0800, Quan Zongliang wrote in
In fact, the replication property of the table has not been modified, and it is still 'i' (REPLICA_IDENTITY_INDEX). But the previously specified index property 'indisreplident' is set to false because of the rebuild.

I suppose that the behavior is intended. Change of column types on the publisher side can break the agreement on replica identity with subscribers. Thus replica identity setting cannot be restored unconditionally. For (somewhat artificial :p) example:

P=# create table t (c1 integer, c2 text unique not null);
P=# alter table t replica identity using index t_c2_key;
P=# create publication p1 for table t;
P=# insert into t values (0, '00'), (1, '01');

S=# create table t (c1 integer, c2 text unique not null);
S=# alter table t replica identity using index t_c2_key;
S=# create subscription s1 connection '...' publication p1;

Your patch allows change of the type of c2 into integer.

P=# alter table t alter column c2 type integer using c2::integer;
P=# update t set c1 = c1 + 1 where c2 = '01';

This change doesn't affect, perhaps as expected.

S=# select * from t;
 c1 | c2
----+----
  0 | 00
  1 | 01
(2 rows)

So I developed a patch. If the user modifies the field type and the associated index is the REPLICA IDENTITY, rebuild it and restore the replication settings.

Explicit setting of replica identity premises that they are sure that the setting works correctly. Implicit rebuilding after a type change can silently break it. At least we need to guarantee that the restored replica identity setting is truly compatible with all existing subscribers. I'm not sure about potential subscribers.. Anyway I think it is a problem that replica identity setting is dropped silently. Perhaps a message something like "REPLICA IDENTITY setting is lost, please redefine after confirmation of compatibility with subscribers." is needed.

regards.

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 05593f3316..02f8dbeabf 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -174,6 +174,8 @@ typedef struct AlteredTableInfo
  List *changedConstraintDefs; /* string definitions of same */
  List *changedIndexOids; /* OIDs of indexes to rebuild */
  List *changedIndexDefs; /* string definitions of same */
+ Oid changedReplIdentOid; /* OID of index to reset REPLICA IDENTITY */
+ char *changedReplIdentDef; /* string definitions of same */
 } AlteredTableInfo;

 /* Struct describing one new constraint to check in Phase 3 scan */
@@ -459,6 +461,7 @@ static ObjectAddress ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
        AlterTableCmd *cmd, LOCKMODE lockmode);
 static void RememberConstraintForRebuilding(Oid conoid, AlteredTableInfo *tab);
 static void RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab);
+static void RememberReplicaIdentForRebuilding(Oid indoid, AlteredTableInfo *tab);
 static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
        LOCKMODE lockmode);
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
@@ -10715,6 +10718,9 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
    {
     Assert(foundObject.objectSubId == 0);
     RememberIndexForRebuilding(foundObject.objectId, tab);
+
+    if (RelationGetForm(rel)->relreplident == REPLICA_IDENTITY_INDEX)
+     RememberReplicaIdentForRebuilding(foundObject.objectId, tab);
    }
    else if (relKind == RELKIND_SEQUENCE)
    {
@@ -10749,9 +10755,18 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
    }

   case OCLASS_CONSTRAINT:
-   Assert(foundObject.objectSubId == 0);
-   RememberConstraintForRebuilding(foundObject.objectId, tab);
-   break;
+   {
+    Oid indId;
+
+    Assert(foundObject.objectSubId == 0);
+    RememberConstraintForRebuilding(foundObject.objectId, tab);
+
+    indId = get_constraint_index(foundOb
Re: Restore replication settings when modifying a field type
On 2019/10/28 12:39, Kyotaro Horiguchi wrote:
> Hello.
>
> # The patch no longer applies on the current master. Needs a rebase.
>
> At Sat, 26 Oct 2019 16:50:48 +0800, Quan Zongliang wrote in
>> In fact, the replication property of the table has not been modified, and
>> it is still 'i' (REPLICA_IDENTITY_INDEX). But the previously specified
>> index property 'indisreplident' is set to false because of the rebuild.
>
> I suppose that the behavior is intended. A change of column type on the
> publisher side can break the agreement on replica identity with
> subscribers, so the replica identity setting cannot be restored
> unconditionally. For a (somewhat artificial :p) example:
>
> P=# create table t (c1 integer, c2 text unique not null);
> P=# alter table t replica identity using index t_c2_key;
> P=# create publication p1 for table t;
> P=# insert into t values (0, '00'), (1, '01');
>
> S=# create table t (c1 integer, c2 text unique not null);
> S=# alter table t replica identity using index t_c2_key;
> S=# create subscription s1 connection '...' publication p1;
>
> Your patch allows the type of c2 to be changed to integer:
>
> P=# alter table t alter column c2 type integer using c2::integer;
> P=# update t set c1 = c1 + 1 where c2 = '01';
>
> This change is not applied on the subscriber, perhaps contrary to
> expectations:
>
> S=# select * from t;
>  c1 | c2
> ----+----
>   0 | 00
>   1 | 01
> (2 rows)
>
>> So I developed a patch. If the user modifies the type of a field whose
>> associated index is the REPLICA IDENTITY, rebuild it and restore the
>> replication settings.
>
> Explicit setting of replica identity presumes that the user is sure the
> setting works correctly. Implicitly rebuilding it after a type change can
> silently break that. At the least we need to guarantee that the restored
> replica identity setting is truly compatible with all existing
> subscribers, and I'm not sure about potential subscribers. Anyway, I
> think it is a problem that the replica identity setting is dropped
> silently. Perhaps a message something like "REPLICA IDENTITY setting is
> lost, please redefine it after confirming compatibility with subscribers."
> is needed.

In fact, the scenario we encountered is this: a field of a user's table was of type smallint, and it turned out that this range was not sufficient, so they changed it to int. At that point the REPLICA IDENTITY was lost, and the user did not realize it. By the time they found out, logical replication had not been producing correct output for some time, and they had to find other ways to get the data back. This user's logical replication issues standard SQL statements to other relational databases through a plugin they developed themselves, and they have thousands of tables to replicate. So I think this patch is appropriate in that scenario. As for the matching problem between publishers and subscribers, I'm afraid it is hard to solve here. If this is not a suitable modification, I can withdraw it and see if there is a better way; if necessary, I'll modify it again.

Rebased to the master branch.

regards.
Restore replication settings when modifying a field type
When the user modifies the type of the REPLICA IDENTITY field, the logical replication settings are lost. For example:

postgres=# \d+ t1
                                    Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 col1   | integer |           |          |         | plain   |              |
 col2   | integer |           | not null |         | plain   |              |
Indexes:
    "t1_col2_key" UNIQUE CONSTRAINT, btree (col2) REPLICA IDENTITY

postgres=# alter table t1 alter col2 type smallint;
ALTER TABLE
postgres=# \d+ t1
                                    Table "public.t1"
 Column |   Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+----------+-----------+----------+---------+---------+--------------+-------------
 col1   | integer  |           |          |         | plain   |              |
 col2   | smallint |           | not null |         | plain   |              |
Indexes:
    "t1_col2_key" UNIQUE CONSTRAINT, btree (col2)

In fact, the replication property of the table has not been modified; it is still 'i' (REPLICA_IDENTITY_INDEX). But the previously specified index property 'indisreplident' is set to false because of the rebuild. So I developed a patch: if the user modifies the type of a field whose associated index is the REPLICA IDENTITY, rebuild it and restore the replication settings.
Regards, Quan Zongliang

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b53f6ed3ac..c21372fe51 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -175,6 +175,8 @@ typedef struct AlteredTableInfo
 	List	   *changedConstraintDefs;	/* string definitions of same */
 	List	   *changedIndexOids;	/* OIDs of indexes to rebuild */
 	List	   *changedIndexDefs;	/* string definitions of same */
+	Oid			changedReplIdentOid;	/* OID of index to reset REPLICA IDENTITY */
+	char	   *changedReplIdentDef;	/* string definition of same */
 } AlteredTableInfo;

 /* Struct describing one new constraint to check in Phase 3 scan */
@@ -428,6 +430,7 @@ static ObjectAddress ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 			AlterTableCmd *cmd, LOCKMODE lockmode);
 static void RememberConstraintForRebuilding(Oid conoid, AlteredTableInfo *tab);
 static void RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab);
+static void RememberReplicaIdentForRebuilding(Oid indoid, AlteredTableInfo *tab);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
 			List *options, LOCKMODE lockmode);
 static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
@@ -9991,6 +9994,9 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 				{
 					Assert(foundObject.objectSubId == 0);
 					RememberIndexForRebuilding(foundObject.objectId, tab);
+
+					if (RelationGetForm(rel)->relreplident == REPLICA_IDENTITY_INDEX)
+						RememberReplicaIdentForRebuilding(foundObject.objectId, tab);
 				}
 				else if (relKind == RELKIND_SEQUENCE)
 				{
@@ -10010,8 +10016,14 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 			}

 		case OCLASS_CONSTRAINT:
-			Assert(foundObject.objectSubId == 0);
-			RememberConstraintForRebuilding(foundObject.objectId, tab);
+			{
+				Oid			indId;
+
+				Assert(foundObject.objectSubId == 0);
+				RememberConstraintForRebuilding(foundObject.objectId, tab);
+				indId = get_constraint_index(foundObject.objectId);
+				if (OidIsValid(indId))
+					RememberReplicaIdentForRebuilding(indId, tab);
+			}
 			break;

 		case OCLASS_REWRITE:
@@ -10324,6 +10336,36 @@ RememberConstraintForRebuilding(Oid conoid, AlteredTableInfo *tab)
 	}
 }

+/*
+ * Subroutine for ATExecAlterColumnType: remember that a replica identity
+ * needs to be reset (which we might already know).
+ */
+static void
+RememberReplicaIdentForRebuilding(Oid indoid, AlteredTable
Re: Add a GUC variable that control logical replication
On 2019/9/18 17:11, Peter Eisentraut wrote:
> On 2019-09-18 10:39, Quan Zongliang wrote:
>> Sybase has a feature to turn off replication at the session level: set
>> replication = off, which can be temporarily turned off when there is a
>> maintenance action on the table. Our users also want this feature.
>
> These kinds of feature requests are always dubious, because just because
> Sybase behaves this way for some implementation or architectural reason
> doesn't necessarily mean it makes sense for PostgreSQL too.

Agreed.

> Why do you need to turn off replication when there is "maintenance" on a
> table? What does that even mean?

In one table, the user only keeps data for a period of time and deletes expired records every day, involving about 10 to 20 million records at a time. They do not want such bulk operations to be passed through logical replication. My English is poor, and I used the wrong word, "maintenance", in my description.
Add a GUC variable that control logical replication
Sybase has a feature to turn off replication at the session level: "set replication = off", which can be used to temporarily turn replication off when there is a maintenance action on a table. Our users also want this feature.

I added a new flag bit in xinfo, controlled by a session-level variable: when the variable is set to off, the flag is written when the transaction commits, and logical decoding then abandons the transaction just as it does aborted transactions. Since PostgreSQL has two types of replication, I call the variable "logical_replication" to avoid confusion; the default value is true.

Sample SQL:

insert into a values(100);
set logical_replication to off;
insert into a values(200);
reset logical_replication;
insert into a values(300);

pg_recvlogical output (the second insert is not output):

BEGIN 492
table public.a: INSERT: col1[integer]:100
COMMIT 492
BEGIN 494
table public.a: INSERT: col1[integer]:300
COMMIT 494

I'm not sure this is the most appropriate way. What do you think?

Regards, Quan Zongliang

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 9162286c98..d7a04539d6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -82,6 +82,8 @@
 bool		XactDeferrable;

 int			synchronous_commit = SYNCHRONOUS_COMMIT_ON;

+bool		logical_replication = true;
+
 /*
  * When running as a parallel worker, we place only a single
  * TransactionStateData on the parallel worker's state stack, and the XID
@@ -5535,6 +5537,9 @@ XactLogCommitRecord(TimestampTz commit_time,
 		xl_origin.origin_timestamp = replorigin_session_origin_timestamp;
 	}

+	if (!logical_replication)
+		xl_xinfo.xinfo |= XACT_XINFO_LOGIREPL_OFF;
+
 	if (xl_xinfo.xinfo != 0)
 		info |= XLOG_XACT_HAS_INFO;

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c53e7e2279..b11c89ff74 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -618,6 +618,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 */
 	if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
 		(parsed->dbId != InvalidOid && parsed->dbId != ctx->slot->data.database) ||
+		(parsed->xinfo & XACT_XINFO_LOGIREPL_OFF) ||
 		ctx->fast_forward || FilterByOrigin(ctx, origin_id))
 	{
 		for (i = 0; i < parsed->nsubxacts; i++)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 90ffd89339..0880a2765f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1952,6 +1952,16 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},

+	{
+		{"logical_replication", PGC_USERSET, REPLICATION_MASTER,
+			gettext_noop("Turns off logical replication of transactions in the session."),
+			NULL
+		},
+		&logical_replication,
+		true,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL

diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index d714551704..1a7f52ac50 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -81,6 +81,9 @@ typedef enum
 /* Synchronous commit level */
 extern int	synchronous_commit;

+/* turn on/off logical replication */
+extern bool	logical_replication;
+
 /*
  * Miscellaneous flag bits to record events which occur on the top level
  * transaction. These flags are only persisted in MyXactFlags and are intended
@@ -168,6 +171,12 @@ typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
 #define XACT_XINFO_HAS_AE_LOCKS		(1U << 6)
 #define XACT_XINFO_HAS_GID			(1U << 7)

+/*
+ * It indicates that the data affected by this transaction does not need
+ * to be included in the logical replication.
+ */
+#define XACT_XINFO_LOGIREPL_OFF		(1U << 28)
+
 /*
  * Also stored in xinfo, these indicating a variety of additional actions that
  * need to occur when emulating transaction effects during recovery.
Re: enhance SPI to support EXECUTE commands
On 2019/9/5 17:33, Pavel Stehule wrote:
> On Thu, 5 Sep 2019 at 10:57, Quan Zongliang <zongliang.q...@postgresdata.com> wrote:
>> [the earlier exchange, quoted in full in the messages below]
>>
>> Under psql, we can get this result:
>>
>>   flying=# EXECUTE smt_del(100);
>>   DELETE 1
>>
>> So I think this may be an oversight in SPI; it would be better to handle it.
>
> Personally, I would not support features that allow bad code.

My code actually follows the existing handling of the CREATE TABLE AS and COPY statements. In spi.c, they look like this:

    if (IsA(stmt->utilityStmt, CreateTableAsStmt))
        /* original code ... */
    else if (IsA(stmt->utilityStmt, CopyStmt))
        /* original code ... */
    else if (IsA(stmt->utilityStmt, ExecuteStmt))
        /* my code */

My patch was not developed for this PL/pgSQL approach; that is simply how the problem was found.
Re: enhance SPI to support EXECUTE commands
On 2019/9/5 16:31, Pavel Stehule wrote:
> On Thu, 5 Sep 2019 at 10:25, Quan Zongliang <zongliang.q...@postgresdata.com> wrote:
>> A user deletes data in PL/pgSQL using the above method, hoping to do more
>> processing according to the number of rows affected, and found that it
>> always gets 0. Sample code:
>>
>>   PREPARE smt_del(int) AS DELETE FROM t1 WHERE c=$1;
>>   EXECUTE 'EXECUTE smt_del(100)';
>>   GET DIAGNOSTICS j = ROW_COUNT;
>>
>>   IF j=1 THEN
>>     do something
>>   ELSIF j=0 THEN
>>     do something
>>
>> Here j is always equal to 0.
>
> This makes no sense in plpgsql. Why do you use a PREPARE statement explicitly?

Yes, I told him to do it another way, and the problem has been solved. But under psql we can get this result:

  flying=# EXECUTE smt_del(100);
  DELETE 1

So I think this may be an oversight in SPI; it would be better to handle it.
Re: enhance SPI to support EXECUTE commands
On 2019/9/5 15:09, Pavel Stehule wrote:
> On Thu, 5 Sep 2019 at 8:39, Quan Zongliang <zongliang.q...@postgresdata.com> wrote:
>> Dear hackers,
>>
>> I found that such a statement would get 0 in PL/pgSQL:
>>
>>   PREPARE smt_del(int) AS DELETE FROM t1;
>>   EXECUTE 'EXECUTE smt_del(100)';
>>   GET DIAGNOSTICS j = ROW_COUNT;
>>
>> In fact, this is a problem with SPI: it does not support getting the
>> result of the EXECUTE command. I made a small enhancement to support the
>> number of rows processed when INSERT/UPDATE/DELETE statements are
>> executed dynamically.
>
> Is there some use case for supporting this feature?

A user deletes data in PL/pgSQL using the above method, hoping to do more processing according to the number of rows affected, and found that it always gets 0. Sample code:

  PREPARE smt_del(int) AS DELETE FROM t1 WHERE c=$1;
  EXECUTE 'EXECUTE smt_del(100)';
  GET DIAGNOSTICS j = ROW_COUNT;

  IF j=1 THEN
    do something
  ELSIF j=0 THEN
    do something

Here j is always equal to 0.

Regards, Quan Zongliang
enhance SPI to support EXECUTE commands
Dear hackers,

I found that such a statement would get 0 in PL/pgSQL:

  PREPARE smt_del(int) AS DELETE FROM t1;
  EXECUTE 'EXECUTE smt_del(100)';
  GET DIAGNOSTICS j = ROW_COUNT;

In fact, this is a problem with SPI: it does not support getting the result of the EXECUTE command. I made a small enhancement to support the number of rows processed when INSERT/UPDATE/DELETE statements are executed dynamically.

Regards, Quan Zongliang

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 22dd55c378..b364492e0e 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2272,6 +2272,27 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 				_SPI_current->processed = pg_strtouint64(completionTag + 5,
 														 NULL, 10);
 			}
+			else if (IsA(stmt->utilityStmt, ExecuteStmt))
+			{
+				if (strncmp(completionTag, "INSERT ", 7) == 0)
+				{
+					char	   *p = completionTag + 7;
+
+					/* INSERT: skip oid and space */
+					while (*p && *p != ' ')
+						p++;
+					if (*p != 0)
+					{
+						_SPI_current->processed =
+							pg_strtouint64(p, NULL, 10);
+					}
+				}
+				else if (strncmp(completionTag, "UPDATE ", 7) == 0 ||
+						 strncmp(completionTag, "DELETE ", 7) == 0)
+				{
+					_SPI_current->processed =
+						pg_strtouint64(completionTag + 7, NULL, 10);
+				}
+			}
 		}

 /*