Re: CREATE INDEX CONCURRENTLY on partitioned index
Ilya Gladyshev wrote on 2024-05-28 02:52:
>> Also I'd like to note that the new patch version contains some strange wording in the documentation: "This can be very convenient as not only will all existing partitions be indexed, but any future partitions will be as well. CREATE INDEX ... CONCURRENTLY can incur long lock times on huge partitioned tables, to avoid that you can use CREATE INDEX ON ONLY the partitioned table, which creates the new index marked as invalid, preventing automatic application to existing partitions." The whole point of CIC is to avoid long lock times, so it seems this paragraph should be rewritten as follows: "To avoid long lock times, you can use CREATE INDEX CONCURRENTLY or CREATE INDEX ON ONLY the partitioned table..."
>
> True, the current wording doesn't look right. Right now CREATE INDEX ON ONLY is described as a workaround for the missing CIC. I think it rather makes sense to say that it gives more fine-grained control over partition locking than both CIC and ordinary CREATE INDEX. See the updated patch.

Hi. I'm not sure it's worth removing the mention of CIC in

   creates the new index marked as invalid, preventing automatic
   application to existing partitions. Instead, indexes can then be created
   individually
-  on each partition using CONCURRENTLY and
+  on each partition and
   attached to the partitioned index on the parent using ALTER INDEX ...
   ATTACH PARTITION. Once indexes for all the partitions are attached to
   the parent index, the parent index will

but at least now it looks better.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
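For readers following along, the fine-grained workflow that this documentation paragraph describes can be sketched as a statement sequence. This is an illustration only: the table and index names are hypothetical, but the statement order follows the quoted documentation text.

```sql
-- Create an invalid index on the parent only; partitions are not touched.
CREATE INDEX measurement_time_idx ON ONLY measurement (logdate);

-- Build the index on each partition with minimal locking.
CREATE INDEX CONCURRENTLY measurement_y2024_time_idx
    ON measurement_y2024 (logdate);

-- Attach it to the parent index; once an index for every partition is
-- attached, the parent index becomes valid automatically.
ALTER INDEX measurement_time_idx
    ATTACH PARTITION measurement_y2024_time_idx;
```

This is what gives finer control than a single CREATE INDEX CONCURRENTLY on the parent: each partition is handled in its own session, on its own schedule.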
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote on 2024-05-28 00:30:
> Hi Mr. Pyhalov. Sorry for the late reply. Thank you for your modification and detailed review. I attach a fixed patch; it has not been rebased yet.
>
>> Monday, 25 March 2024 16:01, Alexander Pyhalov:
>> Comment in nodeAgg.c seems to be strange:
>>
>>  1079 /*
>>  1080  * If the agg's finalfn is provided and PARTIAL_AGGREGATE keyword is
>>  1081  * not specified, apply the agg's finalfn.
>>  1082  * If PARTIAL_AGGREGATE keyword is specified and the transValue type
>>  1083  * is internal, apply the agg's serialfn. In this case, if the agg's
>>  1084  * serialfn must not be invalid. Otherwise return transValue.
>>  1085  */
>>
>> Likely, you mean: "... In this case the agg's serialfn must not be invalid ..."
>
> Fixed.
>
>> Lower in the same file, please correct the error message:
>>
>>  1136 if (!OidIsValid(peragg->serialfn_oid))
>>  1137     elog(ERROR, "serialfunc is note provided for partial aggregate");
>>
>> It should be "serialfunc is not provided for partial aggregate".
>
> Fixed.
>
>> Also something is wrong with the following test:
>>
>>  SELECT /* aggregate <> partial aggregate */ array_agg(c_int4array), array_agg(b), avg(b::int2), avg(b::int4), avg(b::int8), avg(c_interval), avg(b::float4), avg(b::float8), corr(b::float8, (b * b)::float8), covar_pop(b::float8, (b * b)::float8), covar_samp(b::float8, (b * b)::float8), regr_avgx((2 * b)::float8, b::float8), ...
>>
>> Its results have changed since the last patch. Do they depend on daylight saving time?
>
> You are right. In my environment, TimeZone is set to 'PST8PDT', with which timetz values depend on daylight saving time. I changed TimeZone to 'UTC' in this test.
>
>> You can see that the filter is applied before the append. The result is correct only by chance, as the sum in every partition is actually < 700. If you lower this bound, say, to 200, you'll start getting wrong results, as data is filtered prior to aggregation. It seems, however, that in the partial case you should just avoid pulling conditions from the HAVING qual at all; all filters will be applied on the upper level. Something like
>
> Thank you for your modification.
>
>> Found one more problem. You can fire a partial aggregate over a partitioned table, but convert_combining_aggrefs() will make a non-partial copy, which leads to a 'variable not found in subplan target list' error.
>
> Thanks for the correction as well. As you pointed out, the original patch certainly had the potential to cause problems. However, I could not actually reproduce the problem in cases such as the following.
> Settings: t(c1, c2) is a partitioned table whose partition key is c1. t1, t2 are partitions of t and are themselves partitioned tables. t11, t12: partitions of t1 and foreign tables of postgres_fdw. t21, t22: partitions of t2 and foreign tables of postgres_fdw.
> Query: select c2 / 2, sum(c1) from t group by c2 / 2 order by 1
> If you have a reproducible example, I would like to add it to the regression test. Do you have a reproducible example?
>
>> Also denied partial aggregates pushdown on server version mismatch. Should check_partial_aggregate_support be 'true' by default?
>
> Could we discuss this point after we determine how to transfer state values? Once we determine that, we can easily decide whether check_partial_aggregate_support should be 'true' by default.
>
>> I'm not sure what to do with the current grammar - it precludes partial distinct aggregates. I understand that it's currently impossible to have partial aggregation for distinct aggregates - but is it worth having such a restriction at the grammar level?
>
> If partial aggregation for distinct aggregates becomes possible in the future, I see no problem with the policy of accepting new SQL keywords, such as "PARTIAL_AGGREGATE DISTINCT".

BTW, I have an issue with the test results in the last version of the patch. Attaching regression diffs. I get a partial sum over c_interval instead of sum(c_interval).
--
Best regards,
Alexander Pyhalov,
Postgres Professional

diff -U3 /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/expected/postgres_fdw.out /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/results/postgres_fdw.out
--- /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/expected/postgres_fdw.out	2024-05-28 08:48:27.236520098 +0300
+++ /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/results/postgres_fdw.out	2024-05-28 08:51:24.395704846 +0300
@@ -10438,15 +10438,15 @@
          ->  Foreign Scan
                Output: (PARTIAL array_agg(pagg_tab.c_int4array)), (PARTIAL array_agg(pagg_tab.b)), (PARTIAL avg((pagg_tab.b)::smallint)), (PARTIAL avg(pagg_tab.b)), (PARTIAL avg((pagg_tab.b)::bigint)), (PARTIAL avg(pagg_tab.c_interval)), (PARTIAL avg((pagg_tab.b)::real)), (PARTIAL avg((pagg_tab.b)::double precision)), (PARTIAL avg((pagg_tab.b)::numeric)), (PARTIAL corr((pagg_tab.b)::double precision, ((pagg_tab.b * pagg_tab.b))::double precision)), (PARTIAL covar_pop((pagg_tab.b)::double p
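The HAVING hazard raised in this review (a qual applied per partition before the partial results are combined) can be shown with a small arithmetic sketch. The per-partition sums below are hypothetical numbers chosen around the bound of 700 mentioned in the thread; the functions model the two plan shapes, not any actual executor code.

```c
#include <stddef.h>

/* Hypothetical per-partition partial sums for a single group. */
static const int partial_sums[] = {300, 250, 300};
#define NPARTS (sizeof(partial_sums) / sizeof(partial_sums[0]))

/* Correct plan: combine the partial results first, apply HAVING last. */
int
correct_group_total(void)
{
	int			total = 0;

	for (size_t i = 0; i < NPARTS; i++)
		total += partial_sums[i];
	return total;				/* 850 > 700, so the group survives HAVING */
}

/* Broken plan: push "HAVING sum > 700" down onto each partial aggregate. */
int
broken_group_total(void)
{
	int			total = 0;

	for (size_t i = 0; i < NPARTS; i++)
		if (partial_sums[i] > 700)	/* no single partition exceeds 700 */
			total += partial_sums[i];
	return total;				/* 0: the group is wrongly filtered out */
}
```

With every per-partition sum below 700, the pushed-down filter discards all partial results even though the combined total (850) passes the HAVING condition - which is exactly why the qual must only be applied on the upper level.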
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote on 2024-05-28 00:30:
> Hi Mr. Pyhalov.

Hi.

>> Found one more problem. You can fire a partial aggregate over a partitioned table, but convert_combining_aggrefs() will make a non-partial copy, which leads to a 'variable not found in subplan target list' error.
>
> Thanks for the correction as well. As you pointed out, the original patch certainly had the potential to cause problems. However, I could not actually reproduce the problem in cases such as the following.
> Settings: t(c1, c2) is a partitioned table whose partition key is c1. t1, t2 are partitions of t and are themselves partitioned tables. t11, t12: partitions of t1 and foreign tables of postgres_fdw. t21, t22: partitions of t2 and foreign tables of postgres_fdw.
> Query: select c2 / 2, sum(c1) from t group by c2 / 2 order by 1
> If you have a reproducible example, I would like to add it to the regression test. Do you have a reproducible example?

The fix was to set child_agg->agg_partial to orig_agg->agg_partial in convert_combining_aggrefs(); it's already in the patch, as well as the example. Without this fix,

-- Check partial aggregate over partitioned table
EXPLAIN (VERBOSE, COSTS OFF)
SELECT avg(PARTIAL_AGGREGATE a), avg(a) FROM pagg_tab;

fails with

ERROR:  variable not found in subplan target list

--
Best regards,
Alexander Pyhalov,
Postgres Professional
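The two-level partitioned setup described in the quoted message could be sketched like this. The DDL is hypothetical: the thread gives only the table names and the partitioning key, so the range bounds, server name, and options here are invented for illustration.

```sql
CREATE TABLE t (c1 int, c2 int) PARTITION BY RANGE (c1);

-- Intermediate partitioned tables.
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (0) TO (100)
    PARTITION BY RANGE (c1);
CREATE TABLE t2 PARTITION OF t FOR VALUES FROM (100) TO (200)
    PARTITION BY RANGE (c1);

-- Leaf partitions are postgres_fdw foreign tables.
CREATE FOREIGN TABLE t11 PARTITION OF t1 FOR VALUES FROM (0) TO (50)
    SERVER loopback OPTIONS (table_name 't11_remote');
CREATE FOREIGN TABLE t12 PARTITION OF t1 FOR VALUES FROM (50) TO (100)
    SERVER loopback OPTIONS (table_name 't12_remote');
CREATE FOREIGN TABLE t21 PARTITION OF t2 FOR VALUES FROM (100) TO (150)
    SERVER loopback OPTIONS (table_name 't21_remote');
CREATE FOREIGN TABLE t22 PARTITION OF t2 FOR VALUES FROM (150) TO (200)
    SERVER loopback OPTIONS (table_name 't22_remote');

-- The query from the thread:
SELECT c2 / 2, sum(c1) FROM t GROUP BY c2 / 2 ORDER BY 1;
```

Whether this plan hits convert_combining_aggrefs() depends on whether the planner chooses partial aggregation below the Append, which is why the explicit PARTIAL_AGGREGATE reproducer above is the more reliable regression test.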
Re: CREATE INDEX CONCURRENTLY on partitioned index
Ilya Gladyshev wrote on 2024-05-24 00:14:
> Hi,

Hi.

> I think it's well worth the effort to revive the patch, so I rebased it on master, updated it and will return it to the commitfest. Alexander, Justin, feel free to add yourselves as authors.
>
> On 29.01.2024 12:43, Alexander Pyhalov wrote:
>> Hi. I've rebased the patch on master, and it seems to me there's one more issue - when we call DefineIndexConcurrentInternal() in the partitioned case, it waits for transactions locking tableId, not tabrelid - the heaprelid LockRelId is constructed for the parent index relation, not for the child index relation. Attaching a fixed version.
>>
>> Also I'm not sure what to do with locking of child relations. If we don't do anything, you can drop one of the partitioned table's children while CIC is in progress, and get the error
>>
>> ERROR: cache lookup failed for index 16399
>
> I agree that we need to do something about it; in particular, I think we should lock all the partitions inside the transaction that builds the catalog entries. Fixed this in the new version.
>
>> If you try to lock all child tables in the CIC session, you'll get deadlocks.
>
> Do you mean the deadlock between the transaction that drops a partition and the transaction doing CIC? I think this is unavoidable and can be reproduced even without partitioning.

Yes, it seems we trade this error for a possible deadlock between a transaction dropping a partition and CIC.

> Also, I'm not sure why the list of children relations was obtained with ShareLock, which CIC is supposed to avoid so as not to block writes; changed that to ShareUpdateExclusive.

I expect that it wasn't an issue due to the fact that the lock is held for a brief period, until DefineIndexConcurrentInternal() commits for the first time. But it seems it's more correct to use a ShareUpdateExclusive lock here.

Also I'd like to note that the new patch version contains some strange wording in the documentation: "This can be very convenient as not only will all existing partitions be indexed, but any future partitions will be as well. CREATE INDEX ... CONCURRENTLY can incur long lock times on huge partitioned tables, to avoid that you can use CREATE INDEX ON ONLY the partitioned table, which creates the new index marked as invalid, preventing automatic application to existing partitions." The whole point of CIC is to avoid long lock times, so it seems this paragraph should be rewritten as follows: "To avoid long lock times, you can use CREATE INDEX CONCURRENTLY or CREATE INDEX ON ONLY the partitioned table..."

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Partial aggregates pushdown
 							   grouped_rel->relids, NULL, NULL);
-		if (is_foreign_expr(root, grouped_rel, expr) && !partial)
+		if (is_foreign_expr(root, grouped_rel, expr))
 			fpinfo->remote_conds = lappend(fpinfo->remote_conds, rinfo);
 		else
 			fpinfo->local_conds = lappend(fpinfo->local_conds, rinfo);

>> From: Alexander Pyhalov Sent: Wednesday, February 28, 2024 10:43 PM
>> contrib/postgres_fdw/deparse.c: the comment before appendFunctionName() has gone; this seems to be wrong.
>
> Fixed.
>
>> From: Alexander Pyhalov Sent: Wednesday, February 28, 2024 10:43 PM
>> In finalize_aggregate():
>>
>>  1079 /*
>>  1080  * Apply the agg's finalfn if one is provided, else return transValue.
>>  1081  */
>>
>> The comment should be updated to note the behavior for agg_partial aggregates.
>
> Fixed.

Comment in nodeAgg.c seems to be strange:

 1079 /*
 1080  * If the agg's finalfn is provided and PARTIAL_AGGREGATE keyword is
 1081  * not specified, apply the agg's finalfn.
 1082  * If PARTIAL_AGGREGATE keyword is specified and the transValue type
 1083  * is internal, apply the agg's serialfn. In this case, if the agg's
 1084  * serialfn must not be invalid. Otherwise return transValue.
 1085  */

Likely, you mean: "... In this case the agg's serialfn must not be invalid ..."

Lower in the same file, please correct the error message:

 1136 if (!OidIsValid(peragg->serialfn_oid))
 1137 	elog(ERROR, "serialfunc is note provided for partial aggregate");

It should be "serialfunc is not provided for partial aggregate".

Also something is wrong with the following test:

 SELECT /* aggregate <> partial aggregate */ array_agg(c_int4array), array_agg(b), avg(b::int2), avg(b::int4), avg(b::int8), avg(c_interval), avg(b::float4), avg(b::float8), corr(b::float8, (b * b)::float8), covar_pop(b::float8, (b * b)::float8), covar_samp(b::float8, (b * b)::float8), regr_avgx((2 * b)::float8, b::float8), ...

Its results have changed since the last patch. Do they depend on daylight saving time?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: ExecAppendAsyncEventWait() in REL_14_STABLE can corrupt PG_exception_stack
Etsuro Fujita wrote on 2024-03-21 13:59:
> On Sun, Feb 25, 2024 at 6:34 PM Etsuro Fujita wrote:
>> On Fri, Feb 23, 2024 at 01:21:14PM +0300, Alexander Pyhalov wrote:
>>> Recent commit 555276f8594087ba15e0d58e38cd2186b9f39f6d introduced final cleanup of node->as_eventset in ExecAppendAsyncEventWait(). Unfortunately, now this function can return in the middle of a TRY/FINALLY block, without restoring PG_exception_stack.
>>>
>>> We found this while working on our FDW. Unfortunately, I couldn't reproduce the issue with postgres_fdw, but it seems it is also affected.
>>
>> I think this would happen when FDWs configure no events; IIRC, while the core allows them to do so, postgres_fdw does not, so this would never happen with it.
>
> I was wrong; as you pointed out, this would affect postgres_fdw as well. See commit 1ec7fca85, which is my commit, but I forgot it completely. :-(
>
> As I said before, the patch looks good to me. I tweaked the comments in ExecAppendAsyncEventWait() a bit. Attached is an updated patch. In the patch I also fixed a confusing comment in a related function in postgres_fdw.c about handling of the in-process request that might be useless to process.
>
> Sorry, it took more time than expected to get back to this thread.

Hi. The updated patch still looks good to me.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Partial aggregates pushdown
Hi.

fujii.y...@df.mitsubishielectric.co.jp wrote on 2024-02-22 10:20:
> Hi, Mr. Haas, hackers. I apologize for the significant delay since my last post. I have conducted investigations and considerations regarding the remaining tasks as follows. Would it be possible for you to review them? In particular, could you please confirm whether the approach mentioned in 1. is acceptable? If there are no issues with the direction outlined in 1., I plan to make a simple prototype based on this approach.
>
> 1. Transmitting state values safely between machines
>
>> From: Robert Haas Sent: Wednesday, December 6, 2023 10:25 PM
>> the problems around transmitting serialized bytea blobs between machines that can't be assumed to fully trust each other will need to be addressed in some way, which seems like it will require a good deal of design work, forming some kind of consensus, and then implementation work to follow.
>
> I have considered methods for safely transmitting state values between different machines. I have taken into account the version policy of PostgreSQL (5 years of support) and the major version release cycle over the past 10 years (1 year), and as a result, I have made the assumption that transmission is allowed only when the difference between the local version and the remote version is 5 or less. I believe that by adding new components, an "export function" and an "import function", to the aggregate functions, and further introducing a new SQL keyword into the query syntax of aggregate expressions, we can address this issue.
>
> If the version of the local server is higher than or equal to the version of the remote server, the proposed method can be simplified. The export version mentioned later in (1) would not be necessary. Furthermore, if the version of the local server matches the version of the remote server, the proposed method can be further simplified.
>
> I would appreciate your input on reasonable assumptions regarding the differences in versions between the local server and the remote server. I will explain the specifications of the export function, the import function, the new SQL keyword for aggregate expressions, and the behavior of query processing for partial aggregation separately.
>
> (1) Export function specification
>
> This function is another final function for partial aggregation. It converts the state value that represents the result of partial aggregation into a format that can be read by the local server. This function is called instead of the existing finalfunc during the final stage of aggregation when performing partial aggregation. The conversion process described above will be referred to as "export". The argument of an export function is the version of the server that will receive the return value. Hereafter, this version will be referred to as the export version. The concept of an export version is necessary to handle cases where the version of the local server is lower than the version of the remote server. The return value of the export function is the transformed state value, and its data type is bytea.
>
> For backward compatibility, the developer of the export function must ensure that the export can be performed for major versions up to five versions prior to the major version of PostgreSQL for which the export function is being developed. For built-in functions, I believe it is necessary to allow for the possibility of not developing the export functionality for specific versions in the future (due to reasons such as development burden) after the export function is developed for a certain version. To achieve this, for built-in functions, we will add a column to the pg_aggregate catalog that indicates the presence or absence of export functionality for each major version, including the major version being developed and the previous five major versions. This column will be named safety_export_versions and will have a data type of boolean[6]. For user-defined functions, we will refer to the extensions option and add an external server option called safety_export_extensions, which will maintain a list of extensions that include only the aggregate functions that can be exported to the local server version.
> ...

I honestly think that achieving cross-version compatibility in this way puts a significant burden on developers. Can we instead always use more or less universal export and import functions to fix possible issues with binary representations on different architectures, and simply refuse to push down partial aggregates on server version mismatch? At least as a first step?

> 3. Fixing the behavior when the HAVING clause is present
>
>> From: Robert Haas Sent: Tuesday, November 28, 2023 4:08 AM
>> On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov wrote:
>>> Hi. HAVING is also a problem. Consider the following query
>>>
>>> SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to foreign
ExecAppendAsyncEventWait() in REL_14_STABLE can corrupt PG_exception_stack
Hi.

Recent commit 555276f8594087ba15e0d58e38cd2186b9f39f6d introduced final cleanup of node->as_eventset in ExecAppendAsyncEventWait(). Unfortunately, now this function can return in the middle of a TRY/FINALLY block, without restoring PG_exception_stack.

We found this while working on our FDW. Unfortunately, I couldn't reproduce the issue with postgres_fdw, but it seems it is also affected. The following patch heals the issue.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 025f40894d6d8f499144f0f7c45c0a124a46c408 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Fri, 23 Feb 2024 13:06:40 +0300
Subject: [PATCH] Avoid corrupting PG_exception_stack in ExecAppendAsyncEventWait()

After commit 555276f8594087ba15e0d58e38cd2186b9f39f6d,
ExecAppendAsyncEventWait() could corrupt PG_exception_stack after
returning in the middle of a PG_TRY()/PG_END_TRY() block. It should exit
only after PG_END_TRY().
---
 src/backend/executor/nodeAppend.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index baba3ceea23..42962e80177 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -1052,21 +1052,22 @@ ExecAppendAsyncEventWait(AppendState *node)
 		 */
 		if (GetNumRegisteredWaitEvents(node->as_eventset) == 1)
 		{
-			FreeWaitEventSet(node->as_eventset);
-			node->as_eventset = NULL;
-			return;
+			/* Return after PG_TRY()/PG_END_TRY() block */
+			noccurred = 0;
 		}
+		else
+		{
+			/* Return at most EVENT_BUFFER_SIZE events in one call. */
+			if (nevents > EVENT_BUFFER_SIZE)
+				nevents = EVENT_BUFFER_SIZE;
-		/* Return at most EVENT_BUFFER_SIZE events in one call. */
-		if (nevents > EVENT_BUFFER_SIZE)
-			nevents = EVENT_BUFFER_SIZE;
-
-		/*
-		 * If the timeout is -1, wait until at least one event occurs. If the
-		 * timeout is 0, poll for events, but do not wait at all.
-		 */
-		noccurred = WaitEventSetWait(node->as_eventset, timeout, occurred_event,
-									 nevents, WAIT_EVENT_APPEND_READY);
+			/*
+			 * If the timeout is -1, wait until at least one event occurs. If
+			 * the timeout is 0, poll for events, but do not wait at all.
+			 */
+			noccurred = WaitEventSetWait(node->as_eventset, timeout, occurred_event,
+										 nevents, WAIT_EVENT_APPEND_READY);
+		}
 	}
 	PG_FINALLY();
 	{
-- 
2.34.1
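The failure mode this patch fixes can be modeled outside PostgreSQL in a few lines of C. This is a simplified sketch of the PG_TRY()/PG_FINALLY() discipline (the real macros in elog.h also save and restore error context state); the MY_* macro names and the exception_stack variable are invented for illustration.

```c
#include <setjmp.h>
#include <stddef.h>

/* Models PG_exception_stack: points at the innermost recovery point. */
jmp_buf    *exception_stack = NULL;

#define MY_TRY() \
	do { \
		jmp_buf    *save_stack = exception_stack; \
		jmp_buf		local_buf; \
		exception_stack = &local_buf; \
		if (setjmp(local_buf) == 0) \
		{

#define MY_FINALLY() \
		} \
		exception_stack = save_stack; \
		{

#define MY_END_TRY() \
		} \
	} while (0)

/*
 * Buggy: returns from inside the TRY block, skipping the restore, so
 * exception_stack keeps pointing into this function's dead stack frame.
 */
void
buggy(void)
{
	MY_TRY();
	{
		return;
	}
	MY_FINALLY();
	MY_END_TRY();
}

/*
 * Fixed: records the early-exit condition (cf. "noccurred = 0" in the
 * patch) and falls through, so the FINALLY part restores the saved
 * stack pointer before the function returns.
 */
void
fixed(void)
{
	MY_TRY();
	{
		/* set a flag instead of returning here */
	}
	MY_FINALLY();
	MY_END_TRY();
}
```

After calling fixed(), exception_stack is back to its previous value; after buggy(), it still points at a jmp_buf on a stack frame that no longer exists - exactly the corruption the patch avoids by exiting only after PG_END_TRY().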
Re: CREATE INDEX CONCURRENTLY on partitioned index
Hi.

I've rebased the patch on master, and it seems to me there's one more issue - when we call DefineIndexConcurrentInternal() in the partitioned case, it waits for transactions locking tableId, not tabrelid - the heaprelid LockRelId is constructed for the parent index relation, not for the child index relation. Attaching a fixed version.

Also I'm not sure what to do with locking of child relations. If we don't do anything, you can drop one of the partitioned table's children while CIC is in progress, and get the error

ERROR: cache lookup failed for index 16399

If you try to lock all child tables in the CIC session, you'll get deadlocks.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 37a13b7fa1c3277b9d038b7a0c75399ff05b28a7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Mon, 29 Jan 2024 10:41:01 +0300
Subject: [PATCH] Allow CREATE INDEX CONCURRENTLY on partitioned table
---
 doc/src/sgml/ddl.sgml                  |   4 +-
 doc/src/sgml/ref/create_index.sgml     |  14 +-
 src/backend/commands/indexcmds.c       | 200 ++++++++++++++++++--------
 src/test/regress/expected/indexing.out | 127 ++++++++++++++++-
 src/test/regress/sql/indexing.sql      |  26 +++-
 5 files changed, 296 insertions(+), 75 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 075ff329912..8ee80c40e3b 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4194,9 +4194,7 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
    so that they are applied automatically to the entire hierarchy.
    This is very convenient, as not only will the existing partitions become indexed, but
-   also any partitions that are created in the future will. One limitation is
-   that it's not possible to use the CONCURRENTLY
-   qualifier when creating such a partitioned index. To avoid long lock
+   also any partitions that are created in the future will. To avoid long lock
    times, it is possible to use CREATE INDEX ON ONLY
    the partitioned table; such an index is marked invalid, and the partitions
    do not get the index applied automatically. The indexes on partitions can

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 40986aa502f..b05102efdaf 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -645,7 +645,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ]
    If a problem arises while scanning the table, such as a deadlock or a
    uniqueness violation in a unique index, the CREATE INDEX
-   command will fail but leave behind an invalid index. This index
+   command will fail but leave behind an invalid index.
+   If this happens while building an index concurrently on a partitioned
+   table, the command can also leave behind valid or
+   invalid indexes on table partitions. The invalid index
    will be ignored for querying purposes because it
    might be incomplete; however it will still consume update overhead.
    The psql \d command will report such an index as INVALID:
@@ -692,15 +695,6 @@ Indexes:
    cannot.

-
-   Concurrent builds for indexes on partitioned tables are currently not
-   supported. However, you may concurrently build the index on each
-   partition individually and then finally create the partitioned index
-   non-concurrently in order to reduce the time where writes to the
-   partitioned table will be locked out. In this case, building the
-   partitioned index is a metadata only operation.
-
-

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ab8b81b3020..65477aeb3a8 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -93,6 +93,11 @@ static char *ChooseIndexName(const char *tabname, Oid namespaceId,
 							 bool primary, bool isconstraint);
 static char *ChooseIndexNameAddition(const List *colnames);
 static List *ChooseIndexColumnNames(const List *indexElems);
+static void DefineIndexConcurrentInternal(Oid relationId,
+										  Oid indexRelationId,
+										  IndexInfo *indexInfo,
+										  LOCKTAG heaplocktag,
+										  LockRelId heaprelid);
 static void ReindexIndex(const RangeVar *indexRelation,
 						 const ReindexParams *params, bool isTopLevel);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -554,7 +559,6 @@ DefineIndex(Oid tableId,
 	bool		amissummarizing;
 	amoptions_function amoptions;
 	bool		partitioned;
-	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -562,12 +566,10 @@ DefineIndex(Oid tableId,
 	bits16		constr_flags;
 	int			numberOfAttributes;
 	int			numberOfKeyAttributes;
-	TransactionId limitXmin;
 	ObjectAddress address;
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
@@ -697,20 +699,6 @@ DefineIndex(Oid tableId
Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
Andrei Lepikhov wrote on 2023-12-08 07:37:
> On 28/11/2023 01:37, Alexander Korotkov wrote:
>> On Mon, Nov 27, 2023 at 8:07 PM Andres Freund wrote:
>>> Sorry for the late answer, I missed this thread because of vacation.
>>>
>>> On 2023-11-27 11:29:48 +0530, Ashutosh Bapat wrote:
>>>> How do we ensure that we are not making unnecessary copies of Bitmapsets?
>>>
>>> We don't - but that's not specific to this patch. Bitmapsets typically aren't very large; I doubt that they're a significant proportion of the memory usage. Adding refcounts or such would likely add more overhead than it'd save, both in time and memory.
>
> I'd already clashed with Tom on copying the required_relids field and voluntarily made unnecessary copies in the project [1]. And ... ran into huge memory consumption. The reason was in Bitmapsets: when we have 1E3-1E4 partitions and try to reparameterize a join, one bitmapset field can have a size of about 1 kB. Having a bitmapset referencing a relation with a large index value, we had a lot of (for example, 1E4 * 1 kB) copies on each reparameterization of such a field. Alexander Pyhalov should remember that case.

Yes. If it matters, this happened during reparameterization when 2 partitioned tables with 1000 partitions each were joined. Then asymmetric partition-wise join managed to eat lots of memory for bitmapsets (by lots of memory I mean all available memory on the test VM).

[1] Asymmetric partition-wise JOIN https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com

--
Best regards,
Alexander Pyhalov,
Postgres Professional
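The sizes quoted in the thread can be checked with a quick back-of-envelope calculation: a bitmapset is sized by its highest member, so referencing a relation with a large range-table index makes every copy large. The sketch below assumes 64-bit bitmap words and illustrative figures (a highest member around 8000 and 1E4 copies) consistent with the numbers quoted above; it is not actual bitmapset code.

```c
#include <stddef.h>

#define BITS_PER_BITMAPWORD 64	/* assumption: 64-bit bitmap words */

/* Approximate payload of a bitmapset whose highest member is max_member. */
size_t
bms_payload_bytes(int max_member)
{
	size_t		nwords = (size_t) max_member / BITS_PER_BITMAPWORD + 1;

	return nwords * (BITS_PER_BITMAPWORD / 8);
}

/* Total bytes spent if every reparameterization copies such a set. */
size_t
copy_cost_bytes(int max_member, size_t ncopies)
{
	return bms_payload_bytes(max_member) * ncopies;
}
```

With a highest member around 8000, the payload is roughly 1 kB, and 1E4 copies of it already cost about 10 MB for a single field - and required_relids is only one of several relid fields copied per path, which is how the test VM ran out of memory.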
Re: Add semi-join pushdown to postgres_fdw
Alexander Korotkov wrote on 2023-12-03 23:52:
> Hi, Alexander!
>
> On Mon, Nov 27, 2023 at 5:11 PM Alexander Pyhalov wrote:
>> Alexander Korotkov wrote on 2023-11-27 03:49:
>>> Thank you for the revision.
>>>
>>> I've revised the patch myself. I've replaced the StringInfo holding additional conds with a list of strings, as I proposed before. I think the code became much clearer. Also, it gets rid of some unnecessary allocations.
>>>
>>> I think the code itself is not in bad shape. But the patch lacks a high-level description of semi-join processing as well as comments on each manipulation with additional conds. Could you please add this?
>>
>> Hi. The updated patch looks better. It seems I had failed to fix the logic in deparseFromExprForRel() when I tried to convert StringInfos to Lists. I've added some comments. The most complete description of how SEMI-JOIN is processed is located in deparseFromExprForRel(). Unfortunately, there seems to be no single place describing the current JOIN deparsing logic.
>
> Looks good to me. I've made some grammar and formatting adjustments. Also, I've written the commit message. Now, I think this looks good. I'm going to push this if there are no objections.
>
> --
> Regards,
> Alexander Korotkov

Hi. No objections from my side. Perhaps some rephrasing is needed in the comment in semijoin_target_ok(): "The planner can create semi-joins, which refer to inner rel vars in its target list." Perhaps change "semi-joins, which refer" to "a semi-join, which refers ...", as later we speak about "its" target list.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Add semi-join pushdown to postgres_fdw
Alexander Korotkov wrote on 2023-11-27 03:49:
> Thank you for the revision.
>
> I've revised the patch myself. I've replaced the StringInfo holding additional conds with a list of strings, as I proposed before. I think the code became much clearer. Also, it gets rid of some unnecessary allocations.
>
> I think the code itself is not in bad shape. But the patch lacks a high-level description of semi-join processing as well as comments on each manipulation with additional conds. Could you please add this?

Hi. The updated patch looks better. It seems I had failed to fix the logic in deparseFromExprForRel() when I tried to convert StringInfos to Lists. I've added some comments. The most complete description of how SEMI-JOIN is processed is located in deparseFromExprForRel(). Unfortunately, there seems to be no single place describing the current JOIN deparsing logic.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From c17e05d09d5721d22785ed11bed053162d67d967 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Mon, 27 Nov 2023 14:35:29 +0300
Subject: [PATCH] postgres_fdw: add support for deparsing semi joins
---
 contrib/postgres_fdw/deparse.c                 | 234 +++++++++---
 contrib/postgres_fdw/expected/postgres_fdw.out | 320 ++++++++++++--
 contrib/postgres_fdw/postgres_fdw.c            |  94 ++++-
 contrib/postgres_fdw/postgres_fdw.h            |   3 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 126 +++++-
 5 files changed, 696 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 09fd489a901..8670524578b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -180,11 +180,15 @@ static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
 								  RelOptInfo *foreignrel, bool use_alias,
 								  Index ignore_rel, List **ignore_conds,
+								  List **additional_conds,
 								  List **params_list);
+static void appendWhereClause(List *exprs, List *additional_conds,
+							  deparse_expr_cxt *context);
 static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
 static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 							   RelOptInfo *foreignrel, bool make_subquery,
-							   Index ignore_rel, List **ignore_conds, List **params_list);
+							   Index ignore_rel, List **ignore_conds,
+							   List **additional_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
 static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first,
@@ -1370,6 +1374,7 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 	RelOptInfo *scanrel = context->scanrel;
+	List	   *additional_conds = NIL;

 	/* For upper relations, scanrel must be either a joinrel or a baserel */
 	Assert(!IS_UPPER_REL(context->foreignrel) ||
@@ -1379,14 +1384,11 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 	appendStringInfoString(buf, " FROM ");
 	deparseFromExprForRel(buf, context->root, scanrel,
 						  (bms_membership(scanrel->relids) == BMS_MULTIPLE),
-						  (Index) 0, NULL, context->params_list);
-
-	/* Construct WHERE clause */
-	if (quals != NIL)
-	{
-		appendStringInfoString(buf, " WHERE ");
-		appendConditions(quals, context);
-	}
+						  (Index) 0, NULL, &additional_conds,
+						  context->params_list);
+	appendWhereClause(quals, additional_conds, context);
+	if (additional_conds != NIL)
+		list_free_deep(additional_conds);
 }

 /*
@@ -1598,6 +1600,42 @@ appendConditions(List *exprs, deparse_expr_cxt *context)
 	reset_transmission_modes(nestlevel);
 }

+/*
+ * Append WHERE clause, containing conditions
+ * from exprs and additional_conds, to context->buf.
+ */
+static void
+appendWhereClause(List *exprs, List *additional_conds, deparse_expr_cxt *context)
+{
+	StringInfo	buf = context->buf;
+	bool		need_and = false;
+	ListCell   *lc;
+
+	if (exprs != NIL || additional_conds != NIL)
+		appendStringInfoString(buf, " WHERE ");
+
+	/*
+	 * If there are some filters, append them.
+	 */
+	if (exprs != NIL)
+	{
+		appendConditions(exprs, context);
+		need_and = true;
+	}
+
+	/*
+	 * If there are some EXISTS conditions, coming from SEMI-JOINs, append
+	 * them.
+	 */
+	foreach(lc, additional_conds)
+	{
+		if (need_and)
+			appendStringInfoString(buf, " AND ");
+		appendStringInfoString(buf, (char *) lfirst(lc));
+		need_and = true;
+	}
+}
+
 /* Output join name for given join type */
 const char *
 get_jointype_name(JoinType jointype)
@@ -1616,6 +1654,9 @@ get_jointype_name(JoinType jointype)
 		case JOIN_FULL:
 			return "FULL";

+		case JOIN_SEMI:
+			return "SEMI";
+
 		default:
 			/* Shouldn't come here, but protect from buggy code. */
 			elog(ERROR, &
Re: Partial aggregates pushdown
Robert Haas писал 2023-11-21 20:16: > I don't think the patch does a good job explaining why HAVING, > DISTINCT, and ORDER BY are a problem. It seems to me that HAVING > shouldn't really be a problem, because HAVING is basically a WHERE > clause that occurs after aggregation is complete, and whether or not > the aggregation is safe shouldn't depend on what we're going to do > with the value afterward. The HAVING clause can't necessarily be > pushed to the remote side, but I don't see how or why it could make > the aggregate itself unsafe to push down. DISTINCT and ORDER BY are a > little trickier: if we pushed down DISTINCT, we'd still have to > re-DISTINCT-ify when combining locally, and if we pushed down ORDER > BY, we'd have to do a merge pass to combine the returned values unless > we could prove that the partitions were non-overlapping ranges that > would be visited in the correct order. Although that all sounds > doable, I think it's probably a good thing that the current patch > doesn't try to handle it -- this is complicated already. But it should > explain why it's not handling it and maybe even a bit about how it > could be handling in the future, rather than just saying "well, this > kind of thing is not safe." The trouble with that explanation is that > it does nothing to help the reader understand whether the thing in > question is *fundamentally* unsafe or whether we just don't have the > right code to make it work. Makes sense. Actually, I think I was wrong about this. We can't handle ORDER BY or DISTINCT because we can't distinct-ify or order after we've already partially aggregated. At least not in general, and not without additional aggregate support functions. So what I said above was wrong with respect to those. Or so I believe, anyway. But I still don't see why HAVING should be a problem. Hi. HAVING is also a problem. 
Consider the following query: SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to the foreign server, as HAVING needs the full aggregate result, but the foreign server doesn't know it. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Add semi-join pushdown to postgres_fdw
Alexander Korotkov писал 2023-10-30 19:05: Hi, Alexander! Thank you for working on this. I believe this is a very interesting patch, which significantly improves our FDW-based distributed facilities. This is why I decided to review this. Hi. Thanks for reviewing. + /* +* We can't push down join if its reltarget is not safe +*/ + if (!joinrel_target_ok(root, joinrel, jointype, outerrel, innerrel)) return false; As I get it, the joinrel_target_ok() function does meaningful checks only for semi-joins and always returns false for all other kinds of joins. I think we should call it only for semi-joins and name the function accordingly. Done. + fpinfo->unknown_subquery_rels = bms_union(fpinfo_o->unknown_subquery_rels, + fpinfo_i->unknown_subquery_rels); Should the comment before this code block be revised? Updated comment. + case JOIN_SEMI: + fpinfo->joinclauses = list_concat(fpinfo->joinclauses, + fpinfo_i->remote_conds); + fpinfo->joinclauses = list_concat(fpinfo->joinclauses, + fpinfo->remote_conds); + fpinfo->remote_conds = list_copy(fpinfo_o->remote_conds); + fpinfo->unknown_subquery_rels = bms_union(fpinfo->unknown_subquery_rels, + innerrel->relids); + break; I think that the comment before the switch() should definitely be revised. + Relids hidden_subquery_rels; /* relids, which can't be referred to + * from upper relations */ Could this definition contain the positive part? Can't be referred to from upper relations, but used internally for semi-joins (or something like that)? Made the comment a bit more verbose. Also, I think the machinery around the append_conds could be somewhat simpler if we turn them into a list (list of strings). I think that should make the code clearer and also save us some memory allocations. I've tried to rewrite it using lists, only to find out that these are not really lists. I mean, in deparseFromExprForRel() we replace the lists from both sides with one condition. This allows us to preserve the hierarchy of conditions.
We should merge these conditions at the end of the IS_JOIN_REL(foreignrel) branch, or we'll push them too high. And if we deparse them in this place as StringInfo, I see no benefit in converting them to lists. In [1] you've referenced the cases when your patch can't push down semi-joins. It doesn't seem impossible to handle these cases, but that would make the patch much more complicated. I'm OK with continuing with a simpler patch that handles the majority of cases. Could you please add the cases which can't be pushed down with the current patch to the test suite? There are several cases when we can't push down a semi-join in the current patch. 1) When the target list has attributes from the inner relation which are equivalent to some attributes of the outer relation, we fail to notice this. 2) When we examine A join B and decide that we can't push it down, this decision is final - we state it in fdw_private of the joinrel, and so if we consider joining these relations in another order, we don't reconsider. This means that if we later examine B join A, we don't try to push it down. As a semi-join can be executed as JOIN_UNIQUE_INNER or JOIN_UNIQUE_OUTER, this can be a problem - we look at some of these paths and remember that we can't push down such a join. -- Best regards, Alexander Pyhalov, Postgres Professional From 91ae85ac735c9f109cd3ab3603693177011e5b94 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 7 Nov 2022 10:23:32 +0300 Subject: [PATCH] postgres_fdw: add support for deparsing semi joins We deparse semi-joins as EXISTS subqueries. So, deparsing a semi-join generates an addl_conds condition, which is then added to the uppermost JOIN's WHERE clause.
--- contrib/postgres_fdw/deparse.c| 206 --- .../postgres_fdw/expected/postgres_fdw.out| 320 -- contrib/postgres_fdw/postgres_fdw.c | 94 - contrib/postgres_fdw/postgres_fdw.h | 3 + contrib/postgres_fdw/sql/postgres_fdw.sql | 126 ++- 5 files changed, 668 insertions(+), 81 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 09fd489a901..cb0e373055d 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -180,11 +180,14 @@ static void appendConditions(List *exprs, deparse_expr_cxt *context); static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, bool use_alias, Index ignore_rel, List **ignore_conds, + StringInfo additional_conds, List **params_list); +static void appendWhereClause(List *exprs, StringInfo additional_conds, deparse_expr_cxt *context); static void deparseFromExpr(List *quals, deparse_expr_cxt *context); sta
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-09-28 07:40: I'm not sure that I like these mechanics of adding sort group clauses - it seems we do additional work in core, which is of use only for one extension, but at least it seems to be working. We cannot deparse the original sort group clauses and pathtarget when performing partial aggregate pushdown by any FDWs. So I think the additional sort group clauses and pathtarget are needed by any FDW, not only postgres_fdw. Hi. It seems to me that *fdw postfixes don't clarify things, but just make naming more ugly. + * Adding these Vars and PlaceHolderVars to PathTarget, + * FDW cannot deparse this by the original List of SortGroupClauses. + * So, before this adding process, + * setGroupClausePartial generates another Pathtarget and another + * List of SortGroupClauses for FDW. It seems that something like: /* * Modified PathTarget cannot be used by FDW as-is to deparse this statement. * So, before modifying PathTarget, setGroupClausePartial generates * another PathTarget and another List of SortGroupClauses * to make deparsing possible. */ sounds better. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-09-27 01:35: Hi Mr.Momjian, Mr.Pyhalov. Tuesday, 26 September 2023 22:15 Alexander Pyhalov : Do you mean that extra->partial_target->sortgrouprefs is not replaced, and so we preserve tleSortGroupRef numbers? Yes, that is correct. I'm suspicious about rewriting extra->partial_target->exprs with partial_target->exprs - I'm still not sure why we don't lose the information added by add_column_to_pathtarget() to extra->partial_target->exprs? Hi. In postgres_fdw.sql "Partial aggregates are unsafe to push down having clause when there are partial aggregates" - this comment likely should be fixed. Some comments should be added to setGroupClausePartial() and to make_partial_grouping_target() - especially why setGroupClausePartial() is called prior to add_new_columns_to_pathtarget(). I'm not sure that I like these mechanics of adding sort group clauses - it seems we do additional work in core, which is of use only for one extension, but at least it seems to be working. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-09-25 06:18: Hi Mr.Bruce, Mr.Pyhalov, Mr.Finnerty, hackers. Thank you for your valuable comments. I sincerely apologize for the very late reply. Here is a response to your comments or a fix to the patch. Tuesday, August 8, 2023 at 3:31 Bruce Momjian > I have modified the program except for the point "if the version of the remote server is less than PG17". > Instead, we have addressed the following. > "If check_partial_aggregate_support is true and the remote server version is older than the local server > version, postgres_fdw does not assume that the partial aggregate function is on the remote server unless > the partial aggregate function and the aggregate function match." > The reason for this is to maintain compatibility with any aggregate function that does not support partial > aggregate in one version of V1 (V1 is PG17 or higher), even if the next version supports partial aggregate. > For example, string_agg does not support partial aggregation in PG15, but it will support partial aggregation > in PG16. Just to clarify, I think you are saying: If check_partial_aggregate_support is true and the remote server version is older than the local server version, postgres_fdw checks if the partial aggregate function exists on the remote server during planning and only uses it if it does. I tried to phrase it in a positive way, and mentioned the plan time distinction. Also, I am sorry I was away for most of July and am just getting to this. Thanks for your comment. In the documentation, the description of check_partial_aggregate_support is as follows (please see postgres-fdw.sgml). -- check_partial_aggregate_support (boolean) Only if this option is true, during query planning, postgres_fdw connects to the remote server and check if the remote server version is older than the local server version. 
If so, postgres_fdw assumes that for each built-in aggregate function, the partial aggregate function is not defined on the remote server unless the partial aggregate function and the aggregate function match. The default is false. -- Thursday, 20 July 2023 19:23 Alexander Pyhalov : fujii.y...@df.mitsubishielectric.co.jp писал 2023-07-19 03:43: > Hi Mr.Pyhalov, hackers. > 3) > I modified the patch to safely do a partial aggregate pushdown for > queries which contain having clauses. > Hi. Sorry, but I don't see how it could work. We apologize for any inconvenience caused. Thanks to Pyhalov's and Jim's comments, I have realized that I made a fundamental mistake regarding the pushdown of the HAVING clause and the difficulty of achieving it when performing partial aggregate pushdown. So, I removed the code for pushdown of the HAVING clause when performing partial aggregate pushdown. Thursday, 20 July 2023 19:23 Alexander Pyhalov : As for changes in planner.c (setGroupClausePartial()) I have several questions. 1) Why don't we add non_group_exprs to pathtarget->exprs when partial_target->exprs is not set? 2) We replace extra->partial_target->exprs with partial_target->exprs after processing. Why are we sure that after this tleSortGroupRef is correct? Response to 1) The code you pointed out was unnecessary. I have removed this code. Also, the process of adding a PlaceHolderVar's expr to partial_target was missing. So I fixed this. Response to 2) The procedure for building extra->groupClausePartial and extra->partial_target in make_partial_grouping_target in this patch is as follows. STEP1. From grouping_target->exprs, extract the Aggrefs, and the Vars and PlaceHolderVars that are not contained in any Aggref. STEP2. setGroupClausePartial sets a copy of the original groupClause into extra->groupClausePartial and a copy of the original partial_target into extra->partial_target. STEP3. setGroupClausePartial adds the Vars and PlaceHolderVars from STEP1 to partial_target.
The sortgroupref value to be added to partial_target->sortgrouprefs is set to (the maximum value of the existing sortgrouprefs) + 1. setGroupClausePartial adds a SortGroupClause sgc, whose sgc->tleSortGroupRef matches that sortgroupref, to the GroupClause list. STEP4. add_new_columns_to_pathtarget adds STEP1's Aggrefs to partial_target. Due to STEP2, the tleSortGroupRefs set in extra->groupClausePartial are not duplicated. Do you mean that extra->partial_target->sortgrouprefs is not replaced, and so we preserve tleSortGroupRef numbers? I'm suspicious about rewriting extra->partial_target->exprs with partial_target->exprs - I'm still not sure why we don't lose the information added by add_column_to_pathtarget() to extra->partial_target->exprs? Also look at the following example. EXPLAIN VERBOSE SELECT count(*) , (b/2)::numeric FROM pagg_tab GROUP BY b/2 ORDER BY 1; QUERY PLAN --
postgres_fdw could support row comparison pushdown
Hi. postgres_fdw currently doesn't handle RowCompareExpr, which doesn't allow keyset pagination queries to be efficiently executed over sharded table. Attached patch adds handling of RowCompareExpr in deparse.c, so that we could push down conditions like WHERE (created, id) > ('2023-01-01 00:00:00'::timestamp, 12345) to the foreign server. I'm not sure about conditions when it's possible for RowCompareExpr to have opnos with different names or namespaces, but comment in ruleutils.c suggests that this is possible, so I've added check for this in foreign_expr_walker(). -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 655148c85768afbbfc034e6f5dc5a5a6d72139b8 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 7 Aug 2023 11:47:31 +0300 Subject: [PATCH] postgres_fdw: support RowCompareExpr pushdown --- contrib/postgres_fdw/deparse.c| 139 ++ .../postgres_fdw/expected/postgres_fdw.out| 49 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 9 ++ 3 files changed, 197 insertions(+) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 09d6dd60ddc..5eed3ac981c 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -166,6 +166,7 @@ static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context); static void deparseNullTest(NullTest *node, deparse_expr_cxt *context); static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context); +static void deparseRowCompareExpr(RowCompareExpr *node, deparse_expr_cxt *context); static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod, deparse_expr_cxt *context); static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod, @@ -287,6 +288,51 @@ is_foreign_expr(PlannerInfo *root, return true; } +/* + * Determines if the names and namespaces of operations match + */ +static bool +opnames_match(List *opnos) +{ + Oid oprns = InvalidOid; + char *oprname = NULL; 
+ ListCell *lc; + bool match = true; + + foreach(lc, opnos) + { + HeapTuple opertup; + Form_pg_operator operform; + Oid opno = lfirst_oid(lc); + + opertup = SearchSysCache1(OPEROID, ObjectIdGetDatum(opno)); + if (!HeapTupleIsValid(opertup)) + elog(ERROR, "cache lookup failed for operator %u", opno); + operform = (Form_pg_operator) GETSTRUCT(opertup); + /* First op */ + if (oprname == NULL) + { + oprname = pstrdup(NameStr(operform->oprname)); + oprns = operform->oprnamespace; + } + else + { + Assert(OidIsValid(oprns)); + if (oprns != operform->oprnamespace || (strcmp(oprname, NameStr(operform->oprname)) != 0)) +match = false; + } + ReleaseSysCache(opertup); + + if (!match) + break; + } + + if (oprname) + pfree(oprname); + + return match; +} + /* * Check if expression is safe to execute remotely, and return true if so. * @@ -872,6 +918,44 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_UNSAFE; } break; + case T_RowCompareExpr: + { +RowCompareExpr *rce = (RowCompareExpr *) node; +ListCell *lc; + +if (list_length(rce->opnos) == 0) + return false; + +/* + * Only shippable operators can be sent to remote. + */ +foreach(lc, rce->opnos) +{ + if (!is_shippable(lfirst_oid(lc), OperatorRelationId, fpinfo)) + return false; +} + +/* If opnos names do not match, can't deparse such expression */ +if (!opnames_match(rce->opnos)) + return false; + +/* + * Recurse to arguments + */ +if (!foreign_expr_walker((Node *) rce->largs, + glob_cxt, &inner_cxt, case_arg_cxt)) + return false; + +if (!foreign_expr_walker((Node *) rce->rargs, + glob_cxt, &inner_cxt, case_arg_cxt)) + return false; + +/* Output is always boolean and so noncollatable.
*/ +collation = InvalidOid; +state = FDW_COLLATE_NONE; + + } + break; case T_List: { List *l = (List *) node; @@ -2785,6 +2869,9 @@ deparseExpr(Expr *node, deparse_expr_cxt *context) case T_ArrayExpr: deparseArrayExpr((ArrayExpr *) node, context); break; + case T_RowCompareExpr: + deparseRowCompareExpr((RowCompareExpr *) node, context); + break; case T_Aggref: deparseAggref((Aggref *) node, context); break; @@ -3508,6 +3595,58 @@ deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context) deparse_type_name(node->array_typeid, -1)); } +/* + * Deparse RowCompareExpr + */ +static void +deparseRowCompareExpr(RowCompareExpr *node, deparse_expr_cxt *context) +{ + StringInfo buf = context->buf; + bool first; + ListCell *arg; + HeapTuple opertup; + Form_pg_operator operform; + Oid opno = linitial_oid(node->opnos); + + /* Deparse the first argument */ + appendStringInfoString(buf, "(ROW("); + + first = true; + foreach(arg, node->largs) + { + if (!firs
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-07-19 03:43: Hi Mr.Pyhalov, hackers. 3) I modified the patch to safely do a partial aggregate pushdown for queries which contain having clauses. Hi. Sorry, but I don't see how it could work. For example, the attached test returns wrong result: CREATE FUNCTION f() RETURNS INT AS $$ begin return 10; end $$ LANGUAGE PLPGSQL; SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 1; b | sum +- 0 | 0 10 | 0 20 | 0 30 | 0 40 | 0 +(5 rows) In fact the above query should have returned 0 rows, as SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1; b | sum +-- 0 | 600 1 | 660 2 | 720 3 | 780 4 | 840 5 | 900 6 | 960 7 | 1020 8 | 1080 9 | 1140 10 | 600 11 | 660 12 | 720 shows no such rows. Or, on the same data SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) > 660 ORDER BY 1; You'll get 0 rows. But SELECT b, sum(a) FROM pagg_tab GROUP BY b; b | sum +-- 42 | 720 29 | 1140 4 | 840 34 | 840 41 | 660 0 | 600 40 | 600 gives. The issue is that you can't calculate "partial" having. You should compare full aggregate in filter, but it's not possible on the level of one partition. And you have this in plans Finalize GroupAggregate Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*) Group Key: pagg_tab.b Filter: (sum(pagg_tab.a) < 700) -> Sort Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a)) Sort Key: pagg_tab.b -> Append -> Foreign Scan Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a)) Filter: ((PARTIAL sum(pagg_tab.a)) < 700) <--- here we can't compare anything yet, sum is incomplete. 
Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p1 GROUP BY 1 -> Foreign Scan Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)), (PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab_1.a)) Filter: ((PARTIAL sum(pagg_tab_1.a)) < 700) Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab_1) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p2 GROUP BY 1 -> Foreign Scan Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)), (PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab_2.a)) Filter: ((PARTIAL sum(pagg_tab_2.a)) < 700) Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab_2) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p3 GROUP BY 1 In foreign_grouping_ok() 6586 if (IsA(expr, Aggref)) 6587 { 6588 if (partial) 6589 { 6590 mark_partial_aggref((Aggref *) expr, AGGSPLIT_INITIAL_SERIAL); 6591 continue; 6592 } 6593 else if (!is_foreign_expr(root, grouped_rel, expr)) 6594 return false; 6595 6596 tlist = add_to_flat_tlist(tlist, list_make1(expr)); 6597 } at least you shouldn't do anything with expr, if is_foreign_expr() returned false. If we restrict pushing down queries with havingQuals, I'm not quite sure how Aggref can appear in local_conds. As for changes in planner.c (setGroupClausePartial()) I have several questions. 1) Why don't we add non_group_exprs to pathtarget->exprs when partial_target->exprs is not set? 2) We replace extra->partial_target->exprs with partial_target->exprs after processing. Why are we sure that after this tleSortGroupRef is correct? 
-- Best regards, Alexander Pyhalov, Postgres Professionaldiff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 1824cb67fe9..c6f613019c3 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -3025,6 +3025,22 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1; SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1; +CREATE FUNCTION f() RETURNS INT AS $$ +begin + return 10; +end +$$ LANGUAGE PLPGSQL; + +EXPLAIN (VERBOSE, COSTS OFF) +SELECT b, sum(a) FROM
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-07-10 10:35: I have modified the program except for the point "if the version of the remote server is less than PG17". Instead, we have addressed the following. "If check_partial_aggregate_support is true and the remote server version is older than the local server version, postgres_fdw does not assume that the partial aggregate function is on the remote server unless the partial aggregate function and the aggregate function match." The reason for this is to maintain compatibility with any aggregate function that does not support partial aggregate in one version of V1 (V1 is PG17 or higher), even if the next version supports partial aggregate. For example, string_agg does not support partial aggregation in PG15, but it will support partial aggregation in PG16. Hi. 1) In foreign_join_ok(), should we set fpinfo->user if fpinfo->check_partial_aggregate_support is set, like it's done for fpinfo->use_remote_estimate? It seems we can end up with fpinfo->user = NULL if use_remote_estimate is not set. 2) It seems we found an additional issue with the original patch, which is also present in the current one. I'm attaching a patch which seems to fix it, but I'm not quite sure about it. We have not been able to add a test for the case where the remote server version is older than the local server version to the regression test. Is there any way to add such tests to the existing regression tests?
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 187d15185200aabc22c5219bbe636bc950670a78 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Fri, 14 Jul 2023 16:34:02 +0300 Subject: [PATCH] For partial aggregation we can't rely on the fact that every var is a part of some GROUP BY expression --- .../postgres_fdw/expected/postgres_fdw.out| 120 ++ contrib/postgres_fdw/postgres_fdw.c | 31 +++-- contrib/postgres_fdw/sql/postgres_fdw.sql | 9 ++ 3 files changed, 152 insertions(+), 8 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 8d80ba0a6be..31ee461045c 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9955,6 +9955,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1; 49 | 19. | 29 |60 (50 rows) +EXPLAIN (VERBOSE, COSTS OFF) +SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1; + QUERY PLAN + + Sort + Output: ((pagg_tab.b / 2)), (avg(pagg_tab.a)), (max(pagg_tab.a)), (count(*)) + Sort Key: ((pagg_tab.b / 2)) + -> Finalize HashAggregate + Output: ((pagg_tab.b / 2)), avg(pagg_tab.a), max(pagg_tab.a), count(*) + Group Key: ((pagg_tab.b / 2)) + -> Append + -> Foreign Scan + Output: ((pagg_tab.b / 2)), (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)) + Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab) + Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p1 GROUP BY 1 + -> Foreign Scan + Output: ((pagg_tab_1.b / 2)), (PARTIAL avg(pagg_tab_1.a)), (PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)) + Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab_1) + Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p2 GROUP BY 1 + -> Foreign Scan + Output: ((pagg_tab_2.b / 2)), (PARTIAL avg(pagg_tab_2.a)), (PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)) + Relations: Aggregate on 
(public.fpagg_tab_p3 pagg_tab_2) + Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p3 GROUP BY 1 +(19 rows) + +SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1; + ?column? | avg | max | count +--+-+-+--- +0 | 10.5000 | 21 | 120 +1 | 12.5000 | 23 | 120 +2 | 14.5000 | 25 | 120 +3 | 16.5000 | 27 | 120 +4 | 18.5000 | 29 | 120 +5 | 10.5000 | 21 | 120 +6 | 12.5000 | 23 | 120 +7 | 14.5000 | 25 | 120 +8 | 16.5000 | 27 | 120 +9 | 18.5000 | 29 | 120 + 10 | 10.5000 | 21 | 120 + 11 | 12.5000 | 23 | 120 + 12 | 14.5000 | 25 | 120
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby писал 2023-07-13 05:27: On Mon, Mar 27, 2023 at 01:28:24PM +0300, Alexander Pyhalov wrote: Justin Pryzby писал 2023-03-26 17:51: > On Sun, Dec 04, 2022 at 01:09:35PM -0600, Justin Pryzby wrote: > > This currently handles partitions with a loop around the whole CIC > > implementation, which means that things like WaitForLockers() happen > > once for each index, the same as REINDEX CONCURRENTLY on a partitioned > > table. Contrast that with ReindexRelationConcurrently(), which handles > > all the indexes on a table in one pass by looping around indexes within > > each phase. > > Rebased over the progress reporting fix (27f5c712b). > > I added a list of (intermediate) partitioned tables, rather than looping > over the list of inheritors again, to save calling rel_get_relkind(). > > I think this patch is done. Overall looks good to me. However, I think that using 'partitioned' as a list of partitioned index oids in DefineIndex() is a bit misleading - we've just used it as a boolean, specifying if we are dealing with a partitioned relation. Right. This is also rebased on 8c852ba9a4 (Allow some exclusion constraints on partitions). Hi. I have some more questions. In the following code (indexcmds.c:1640 and later) 1640 rel = table_open(relationId, ShareUpdateExclusiveLock); 1641 heaprelid = rel->rd_lockInfo.lockRelId; 1642 table_close(rel, ShareUpdateExclusiveLock); 1643 SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId); should we release ShareUpdateExclusiveLock before getting the session lock in DefineIndexConcurrentInternal()? Also, we unlock the parent table between reindexing children at the end of DefineIndexConcurrentInternal(): 1875 /* 1876 * Last thing to do is release the session-level lock on the parent table. 1877 */ 1878 UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock); 1879 } Is it safe? Shouldn't we hold the session lock on the parent table while rebuilding child indexes? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: memory leak in trigger handling (since PG12)
Tomas Vondra писал 2023-06-22 17:16: On 6/22/23 13:46, Tomas Vondra wrote: ... I haven't tried the reproducer, but I think I see the issue - we store the bitmap as part of the event to be executed later, but the bitmap is in per-tuple context and gets reset. So I guess we need to copy it into the proper long-lived context (e.g. AfterTriggerEvents). I'll get that fixed. Alexander, can you try if this fixes the issue for you? regards Hi. The patch fixes the problem and looks good to me. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: memory leak in trigger handling (since PG12)
Tomas Vondra писал 2023-05-25 17:41: The attached patch does this - I realized we actually have estate in ExecGetAllUpdatedCols(), so we don't even need a variant with a different signature. That makes the patch much simpler. The question is whether we need the signature anyway. There might be a caller expecting the result to be in CurrentMemoryContext (be it ExecutorState or something else), and this change might break it. But I'm not aware of any callers, so maybe that's fine. Hi. Unfortunately, this patch has broken trigdata->tg_updatedcols usage in AFTER UPDATE triggers. Should it be, if not fixed, then at least mentioned in the documentation? Attaching sample code. After creating the trigger, the issue can be reproduced like this: create table test (i int, j int); create function report_update_fields() RETURNS TRIGGER AS '/location/to/trig_test.so' language C; create trigger test_update after update ON test FOR EACH ROW EXECUTE FUNCTION report_update_fields(); insert into test values (1, 12); update test set j=2; -- Best regards, Alexander Pyhalov, Postgres Professional #include "postgres.h" #include "fmgr.h" #include "access/sysattr.h" #include "commands/trigger.h" #include "nodes/bitmapset.h" #ifdef PG_MODULE_MAGIC PG_MODULE_MAGIC; #endif void _PG_init() { } PG_FUNCTION_INFO_V1(report_update_fields); Datum report_update_fields(PG_FUNCTION_ARGS) { TriggerData *trigdata = (TriggerData *) fcinfo->context; HeapTuple rettuple = NULL; if (!CALLED_AS_TRIGGER(fcinfo) || /*!TRIGGER_FIRED_AFTER(trigdata->tg_event) || */ TRIGGER_FIRED_FOR_STATEMENT(trigdata->tg_event) || !TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event)) elog(ERROR, "usage unsupported"); if ((trigdata->tg_trigtuple != NULL) && (trigdata->tg_newtuple != NULL)) { int attnum = -1; int num_upd_attrs = 0; while ((attnum = bms_next_member(trigdata->tg_updatedcols, attnum)) >= 0) if (attnum + FirstLowInvalidHeapAttributeNumber > 0) num_upd_attrs++; elog(INFO, "updated %d attrs", num_upd_attrs); } return PointerGetDatum(rettuple); }
Re: Partial aggregates pushdown
Bruce Momjian писал 2023-06-20 03:42: Apologies for the delay in my reply to this email. I looked into the existing code and I found three things: 1) PQserverVersion() just pulls the conn->sversion value from the existing connection because pqSaveParameterStatus() pulls the server_version sent by the backend; no need to issue SELECT version(). 2) postgres_fdw already has nine calls to GetConnection(), and only opens a connection if it doesn't already have one. Here is an example: /* Get the remote estimate */ conn = GetConnection(fpinfo->user, false, NULL); get_remote_estimate(sql.data, conn, &rows, &width, &startup_cost, &total_cost); ReleaseConnection(conn); Therefore, it seems like it would be near-zero cost to just call conn = GetConnection() and then PQserverVersion(conn), and ReleaseConnection(). You can then use the return value of PQserverVersion() to determine if you can push down partial aggregates. Hi. Currently we don't acquire a remote connection during planning if use_remote_estimate is not set. Such a change would require getting a remote connection in the planner, not in the executor. This can lead to a change of behavior (like errors in EXPLAIN when the user mapping is wrong - e.g. a bad password is specified). Also, this can potentially lead to establishing connections even when a plan node is not actually used (an extreme example - select sum(score) from t limit 0). I'm not saying we shouldn't do it - just hinting at possible consequences. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Hi. + An aggregate function, called the partial aggregate function for partial aggregate + that corresponding to the aggregate function, is defined on the primary node and + the postgres_fdw node. Something is clearly wrong here. + When using built-in sharding feature in PostgreSQL is used, And here. Overall the code looks good to me, but I suppose that documentation needs further review from some native speaker. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2023-06-08 02:08: From: Alexander Pyhalov Sent: Wednesday, June 7, 2023 6:47 PM This seems to be more robust, but the interface became more strange. I'm not sure what to do with it. Some ideas I had to avoid introducing this parameter. Not sure I like any of them. 1) You can use QualifiedNameGetCreationNamespace() for aggpartialfnName and still compare namespace and function name for it and aggName, aggNamespace. Seems to be not ideal, but avoids introducing new parameters. 2) You can look up the partial aggregate function after ProcedureCreate() in AggregateCreate(), if it wasn't found at earlier stages. If it is the aggregate itself - check it. If it's still not found, error out. Also seems to be a bit ugly - you leave uncommitted garbage for vacuum in the catalogue. Thank you for suggesting alternatives. The disadvantages of alternative 2) appear to be undesirable, so I have modified it according to alternative 1). Another issue - the patch misses recording the dependency between aggpartialfn and the aggregate procedure. I added code to record dependencies between aggpartialfn and the aggregate procedure, similar to the code for functions such as combinefunc. Hi. Looks better. The only question I have is whether we should record a dependency between procOid and aggpartialfn if aggpartialfn == procOid. Also, it seems the new code should be run through pgindent.

doc/src/sgml/postgres-fdw.sgml:

+ For WHERE clauses,
+ JOIN clauses, this sending is active if
+ conditions in <xref linkend="postgres-fdw-remote-query-optimization"/>
+ hold and enable_partitionwise_join is true(this condition
+ is need for only JOIN clauses).
+ For aggregate expressions, this sending is active if conditions in

No space between "true" and "(" in "is true(this condition". Some sentences in the documentation, like the one starting with "For aggregate expressions, this sending is active if conditions in...", seem to be too long, but I'm not the best person to review documentation wording.
In "Built-in sharding in PostgreSQL" term "shard" doesn't have a definition. By the way, I'm not sure that "sharding" documentation belongs to this patch, at least it needs a review from native speaker. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2023-06-06 15:31: Thanks for the explanation. I understand that the method of comparing two function name strings is incorrect. Instead, I added the parameter isaggpartialfunc indicating whether the aggregate function and its aggpartialfunc are the same or different. Hi. This seems to be more robust, but the interface became more strange. I'm not sure what to do with it. Some ideas I had to avoid introducing this parameter. Not sure I like any of them. 1) You can use QualifiedNameGetCreationNamespace() for aggpartialfnName and still compare namespace and function name for it and aggName, aggNamespace. Seems to be not ideal, but avoids introducing new parameters. 2) You can look up the partial aggregate function after ProcedureCreate() in AggregateCreate(), if it wasn't found at earlier stages. If it is the aggregate itself - check it. If it's still not found, error out. Also seems to be a bit ugly - you leave uncommitted garbage for vacuum in the catalogue. Another issue - the patch misses recording the dependency between aggpartialfn and the aggregate procedure. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2023-06-06 06:08: Hi Mr.Pyhalov. Thank you for your always thoughtful review. From: Alexander Pyhalov Sent: Monday, June 5, 2023 6:00 PM Have found one issue - src/backend/catalog/pg_aggregate.c

585 if(strcmp(strVal(linitial(aggpartialfnName)), aggName) == 0){
586     if(((aggTransType != INTERNALOID) && (finalfn != InvalidOid))
587         || ((aggTransType == INTERNALOID) && (finalfn != serialfn)))
588         elog(ERROR, "%s is not its own aggpartialfunc", aggName);
589 } else {

Here string comparison of aggName and aggpartialfnName looks very suspicious - it seems you should compare oids, not names (in this case, likely oids of the transition function and the partial aggregation function). The fact that the aggregate name matches the partial aggregation function name is not enough to make any decisions. I see no problem with this string comparison. Here is the reason. The intent of this code is to determine whether the user specifies a new aggregate function whose aggpartialfunc is itself. For two aggregate functions, if the argument list and function name match, then the two aggregate functions must match. By definition of aggpartialfunc, every aggregate function and its aggpartialfn must have the same argument list. Thus, if aggpartialfnName and aggName are equal as strings, I think it is correct to determine that the user is specifying a new aggregate function whose aggpartialfunc is itself. However, since the document does not state these intentions, I think your suspicions are valid. Therefore, I have added a specification to the document reflecting the above intentions. Hi. Let me explain. Look at this example, taken from the test.
CREATE AGGREGATE udf_avg_p_int4(int4) (
    sfunc = int4_avg_accum,
    stype = _int8,
    combinefunc = int4_avg_combine,
    initcond = '{0,0}'
);
CREATE AGGREGATE udf_sum(int4) (
    sfunc = int4_avg_accum,
    stype = _int8,
    finalfunc = int8_avg,
    combinefunc = int4_avg_combine,
    initcond = '{0,0}',
    aggpartialfunc = udf_avg_p_int4
);

Now, let's create another aggregate.

# create schema test;
# create aggregate test.udf_avg_p_int4(int4) (
    sfunc = int4_avg_accum,
    stype = _int8,
    finalfunc = int8_avg,
    combinefunc = int4_avg_combine,
    initcond = '{0,0}',
    aggpartialfunc = udf_avg_p_int4
);
ERROR: udf_avg_p_int4 is not its own aggpartialfunc

What's the difference between the test.udf_avg_p_int4(int4) aggregate and udf_sum(int4)? They are essentially the same, yet test.udf_avg_p_int4 can't be defined. Also note the difference:

# CREATE AGGREGATE udf_sum(int4) (
    sfunc = int4_avg_accum,
    stype = _int8,
    finalfunc = int8_avg,
    combinefunc = pg_catalog.int4_avg_combine,
    initcond = '{0,0}',
    aggpartialfunc = udf_avg_p_int4
);
CREATE AGGREGATE
# CREATE AGGREGATE udf_sum(int4) (
    sfunc = int4_avg_accum,
    stype = _int8,
    finalfunc = int8_avg,
    combinefunc = pg_catalog.int4_avg_combine,
    initcond = '{0,0}',
    aggpartialfunc = public.udf_avg_p_int4
);
ERROR: aggpartialfnName is invalid

It seems that the assumption about aggpartialfnName - that it's a non-qualified name - is incorrect. And if we use qualified names, we can't compare just names; likely we should compare oids. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Bruce Momjian wrote 2023-06-05 19:26: On Mon, Jun 5, 2023 at 12:00:27PM +0300, Alexander Pyhalov wrote: Note that after these changes "select sum()" will fail in certain cases, when the remote server version is not the latest. In other cases we tried to preserve compatibility. Should we have a switch for a foreign server to turn this optimization off? Or do we just state that users should use different workarounds if the remote server version doesn't match the local one? We covered this in April in this and previous emails: https://www.postgresql.org/message-id/ZDGTza4rovCa%2BN3d%40momjian.us We don't check the version number for _any_ builtin functions so why would we need to check for aggregate pushdown? Yes, these will be new functions in PG 17, we have added functions regularly in major releases and have never heard reports of problems about that. Hi. I've seen this message. But the introduction of a new built-in function will break requests to old servers only if this new function is used in the request (which means the query changes). However, this patch changes the behavior of old queries, which worked prior to the update. This seems different to me. Also, I see that in connection.c (configure_remote_session()) we care about old PostgreSQL versions. And now we make querying them more tricky. Is it consistent? Do you think that enable_partitionwise_aggregate is good enough protection in these cases? In the documentation I see "F.38.7. Cross-Version Compatibility postgres_fdw can be used with remote servers dating back to PostgreSQL 8.3. Read-only capability is available back to 8.1. A limitation however is that postgres_fdw generally assumes that immutable built-in functions and operators are safe to send to the remote server for execution, if they appear in a WHERE clause for a foreign table. Thus, a built-in function that was added since the remote server's release might be sent to it for execution, resulting in “function does not exist” or a similar error.
This type of failure can be worked around by rewriting the query, for example by embedding the foreign table reference in a sub-SELECT with OFFSET 0 as an optimization fence, and placing the problematic function or operator outside the sub-SELECT." Likely, this paragraph should be expanded to state that partition-wise aggregation for many functions can fail to work with old foreign servers. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2023-06-02 06:54: Hi Mr.Bruce, hackers. I updated the patch. The following is a list of comments received on the previous version of the patch and my update to them in this version of the patch. Hi. I've looked through the last version of the patch. Have found one issue - src/backend/catalog/pg_aggregate.c

585 if(strcmp(strVal(linitial(aggpartialfnName)), aggName) == 0){
586     if(((aggTransType != INTERNALOID) && (finalfn != InvalidOid))
587         || ((aggTransType == INTERNALOID) && (finalfn != serialfn)))
588         elog(ERROR, "%s is not its own aggpartialfunc", aggName);
589 } else {

Here string comparison of aggName and aggpartialfnName looks very suspicious - it seems you should compare oids, not names (in this case, likely oids of the transition function and the partial aggregation function). The fact that the aggregate name matches the partial aggregation function name is not enough to make any decisions.

In documentation doc/src/sgml/postgres-fdw.sgml:

930 postgres_fdw attempts to optimize remote queries to reduce
931 the amount of data transferred from foreign servers. This is done by
932 sending query WHERE clauses and ggregate expressions
933 to the remote server for execution, and by not retrieving table columns that
934 are not needed for the current query.
935 To reduce the risk of misexecution of queries,
936 WHERE clauses and ggregate expressions are not sent to
937 the remote server unless they use only data types, operators, and functions
938 that are built-in or belong to an extension that's listed in the foreign
939 server's extensions option.
940 Operators and functions in such clauses must
941 be IMMUTABLE as well.

there are misprints in lines 932 and 936 - a missing "a" in "aggregate" expressions. Note that after these changes "select sum()" will fail in certain cases, when the remote server version is not the latest. In other cases we tried to preserve compatibility. Should we have a switch for a foreign server to turn this optimization off?
Or do we just state that users should use different workarounds if remote server version doesn't match local one? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: buffer refcount leak in foreign batch insert code
Michael Paquier wrote 2023-04-25 04:30: On Mon, Apr 24, 2023 at 09:57:10AM +0900, Michael Paquier wrote: The attached is what I am finishing with, with a minimal regression test added to postgres_fdw. Two partitions are enough. Well, I have gone through that again this morning, and applied the fix down to 14. The buildfarm is digesting it fine. -- Michael Thank you. Sorry for the late response, was on vacation. -- Best regards, Alexander Pyhalov, Postgres Professional
buffer refcount leak in foreign batch insert code
Hi. We've found that in cases like the one attached, when we insert into a foreign partition with batch_size set, a buffer refcount leak is detected. In the above example we see a dozen similar messages:

repro_small.sql:31: WARNING: buffer refcount leak: [14621] (rel=base/16718/16732, blockNum=54, flags=0x9380

The issue was introduced in the following commit:

commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
Author: Tomas Vondra
Date: Fri Jun 11 20:19:48 2021 +0200

    Optimize creation of slots for FDW bulk inserts

In this commit we avoid recreating slots for each batch. But it seems that the created slots should still be cleared at the end of ExecBatchInsert(). At least the attached patch seems to fix the issue. -- Best regards, Alexander Pyhalov, Postgres Professional

CREATE EXTENSION postgres_fdw;
DO $d$ BEGIN EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname '$$||current_database()||$$', port '$$||current_setting('port')||$$' )$$; END $d$;
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
CREATE TABLE local_data (id int, data text);
INSERT INTO local_data SELECT i, 'test'|| i FROM generate_series(1, 1) i;
CREATE TABLE sharded_data (id int, data text) PARTITION BY HASH(id);
CREATE TABLE sharded_data_p0 PARTITION OF sharded_data FOR VALUES WITH (modulus 4, remainder 0);
CREATE TABLE sharded_data_p1_remote (id int, data text);
CREATE FOREIGN TABLE sharded_data_p1 PARTITION OF sharded_data FOR VALUES WITH (modulus 4, remainder 1) SERVER loopback OPTIONS (table_name 'sharded_data_p1_remote');
CREATE TABLE sharded_data_p2_remote (id int, data text);
CREATE FOREIGN TABLE sharded_data_p2 PARTITION OF sharded_data FOR VALUES WITH (modulus 4, remainder 2) SERVER loopback OPTIONS (table_name 'sharded_data_p2_remote');
CREATE TABLE sharded_data_p3_remote (id int, data text);
CREATE FOREIGN TABLE sharded_data_p3 PARTITION OF sharded_data FOR VALUES WITH (modulus 4, remainder 3) SERVER loopback OPTIONS (table_name 'sharded_data_p3_remote');
insert into
sharded_data select * from local_data;
delete from sharded_data;
alter server loopback options (add batch_size '100');
insert into sharded_data select * from local_data;

From 12d1fbf56c1c82d4659da484c2207539f396aac9 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Fri, 21 Apr 2023 12:54:41 +0300
Subject: [PATCH] Fix buffer refcount leak

The buffer refcount leak was introduced in
b676ac443b6a83558d4701b2dd9491c0b37e17c4
---
 src/backend/executor/nodeModifyTable.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 6aa8c03defb..14ab8382183 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1261,6 +1261,12 @@ ExecBatchInsert(ModifyTableState *mtstate,
 	if (canSetTag && numInserted > 0)
 		estate->es_processed += numInserted;
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecClearTuple(slots[i]);
+		ExecClearTuple(planSlots[i]);
+	}
 }

 /*
--
2.34.1
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby wrote 2023-03-26 17:51: On Sun, Dec 04, 2022 at 01:09:35PM -0600, Justin Pryzby wrote: This currently handles partitions with a loop around the whole CIC implementation, which means that things like WaitForLockers() happen once for each index, the same as REINDEX CONCURRENTLY on a partitioned table. Contrast that with ReindexRelationConcurrently(), which handles all the indexes on a table in one pass by looping around indexes within each phase. Rebased over the progress reporting fix (27f5c712b). I added a list of (intermediate) partitioned tables, rather than looping over the list of inheritors again, to save calling get_rel_relkind(). I think this patch is done. Hi. Overall looks good to me. However, I think that using 'partitioned' as a list of partitioned index oids in DefineIndex() is a bit misleading - we've previously used it as a boolean, specifying whether we are dealing with a partitioned relation. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Add semi-join pushdown to postgres_fdw
Hi. Tomas Vondra wrote 2023-01-19 20:49: I took a quick look at the patch. It needs a rebase, although it applies fine using patch. A couple of minor comments: 1) addl_conds seems a bit hard to understand, I'd use either the full wording (additional_conds) or maybe extra_conds Renamed to additional_conds. 2) some of the lines got quite long, and need a wrap Split some of them. Not sure if it's enough. 3) unknown_subquery_rels name is a bit misleading - AFAIK it's the rels that can't be referenced from upper rels (per what the .h says). So they are known, but hidden. Is there a better name? Renamed to hidden_subquery_rels. These are rels which can't be referred to from upper join levels. 4) joinrel_target_ok() needs a better comment, explaining *when* the reltarget is safe for pushdown. The conditions are on the same row, but the project style is to break after '&&'. Added a comment. It seems to be a rephrasing of the lower comment in joinrel_target_ok(). -- Best regards, Alexander Pyhalov, Postgres Professional

From f37d26d9b622767f94e89034fa8e4fccc69e358d Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Mon, 7 Nov 2022 10:23:32 +0300
Subject: [PATCH v4] postgres_fdw: add support for deparsing semi joins

We deparse semi-joins as EXISTS subqueries. So, deparsing semi-join
leads to generating addl_conds condition, which is then added to the
uppermost JOIN's WHERE clause.
---
 contrib/postgres_fdw/deparse.c                | 206 +---
 .../postgres_fdw/expected/postgres_fdw.out    | 297 -
 contrib/postgres_fdw/postgres_fdw.c           |  89 +-
 contrib/postgres_fdw/postgres_fdw.h           |   2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 119 ++-
 5 files changed, 632 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 473fa45bd43..1217d47050b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -180,11 +180,14 @@ static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
 								  RelOptInfo *foreignrel, bool use_alias,
 								  Index ignore_rel, List **ignore_conds,
+								  StringInfo additional_conds,
 								  List **params_list);
+static void appendWhereClause(List *exprs, StringInfo additional_conds, deparse_expr_cxt *context);
 static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
 static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 							   RelOptInfo *foreignrel, bool make_subquery,
-							   Index ignore_rel, List **ignore_conds, List **params_list);
+							   Index ignore_rel, List **ignore_conds,
+							   StringInfo additional_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
 static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first,
@@ -1370,23 +1373,21 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 	RelOptInfo *scanrel = context->scanrel;
+	StringInfoData additional_conds;

 	/* For upper relations, scanrel must be either a joinrel or a baserel */
 	Assert(!IS_UPPER_REL(context->foreignrel) ||
 		   IS_JOIN_REL(scanrel) || IS_SIMPLE_REL(scanrel));

+	initStringInfo(&additional_conds);
 	/* Construct FROM clause */
 	appendStringInfoString(buf, " FROM ");
 	deparseFromExprForRel(buf, context->root, scanrel,
 						  (bms_membership(scanrel->relids) == BMS_MULTIPLE),
-						  (Index) 0, NULL, context->params_list);
-
-	/* Construct WHERE clause */
-	if (quals != NIL)
-	{
-		appendStringInfoString(buf, " WHERE ");
-		appendConditions(quals, context);
-	}
+						  (Index) 0, NULL, &additional_conds,
+						  context->params_list);
+	appendWhereClause(quals, &additional_conds, context);
+	pfree(additional_conds.data);
 }

 /*
@@ -1598,6 +1599,33 @@ appendConditions(List *exprs, deparse_expr_cxt *context)
 	reset_transmission_modes(nestlevel);
 }

+/*
+ * Append WHERE clause, containing conditions
+ * from exprs and additional_conds, to context->buf.
+ */
+static void
+appendWhereClause(List *exprs, StringInfo additional_conds, deparse_expr_cxt *context)
+{
+	StringInfo	buf = context->buf;
+	bool		need_and = false;
+
+	if (exprs != NIL || additional_conds->len > 0)
+		appendStringInfoString(buf, " WHERE ");
+
+	if (exprs != NIL)
+	{
+		appendConditions(exprs, context);
+		need_and = true;
+	}
+
+	if (additional_conds->len > 0)
+	{
+		if (need_and)
+			appendStringInfoString(buf, " AND ");
+		appendStringInfo(buf, "(%s)", additional_conds->data);
+	}
+}
+
 /* Output join name for given join type */
 const char *
 get_jointype_name(JoinType jointype)
@@ -1616,6 +1644,9 @@ get_jointype_name(JoinType jointype)
 		case JOIN_FULL:
 			return "FULL";
Re: Inconsistency in vacuum behavior
Justin Pryzby wrote 2023-01-19 04:49: On Mon, Jan 16, 2023 at 08:12:18PM +0300, Nikita Malakhov wrote: Hi, Currently there is no error in this case, so an additional thrown error would require a new test. Besides, throwing an error here does not make sense - it is just a check for vacuum permission; I think the right way is to just skip a relation that is not suitable for vacuum. Any thoughts or objections? Could you check if this is consistent between the behavior of VACUUM FULL and CLUSTER? See also Nathan's patches. Hi. CLUSTER behaves in a different way - it errors out immediately if the relation is not owned by the user. For a partitioned rel it would raise an error later anyway. VACUUM and VACUUM FULL behave consistently after applying Nikita's patch (for partitioned and regular tables) - they issue the warning "skipping TABLE_NAME --- only table or database owner can vacuum it" and return success status. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Inconsistency in vacuum behavior
Nikita Malakhov wrote 2023-01-16 20:12: Hi, Currently there is no error in this case, so an additional thrown error would require a new test. Besides, throwing an error here does not make sense - it is just a check for vacuum permission; I think the right way is to just skip a relation that is not suitable for vacuum. Any thoughts or objections? No objections to not throwing an error. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Inconsistency in vacuum behavior
Nikita Malakhov wrote 2023-01-16 17:26: Hi! Here's the patch that fixes this case, please check it out. The patch adds a vacuum_is_permitted_for_relation() check before adding a partition relation to the vacuum list; if permission is denied the relation is not added, so it is not passed to vacuum_rel() and there is no attempt to acquire the lock. Cheers! Hi. The patch seems to solve the issue. Two minor questions I have: 1) should we error out if HeapTupleIsValid(part_tuple) is false? 2) the comment "Check partition relations for vacuum permit" seems to be broken in some way. -- Best regards, Alexander Pyhalov, Postgres Professional
Inconsistency in vacuum behavior
Hi. We've run regress isolation tests on partitioned tables and found interesting VACUUM behavior. I'm not sure if it's intended. In the following example, partitioned tables and regular tables behave differently:

CREATE TABLE vacuum_tab (a int) PARTITION BY HASH (a);
CREATE TABLE vacuum_tab_1 PARTITION OF vacuum_tab FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE vacuum_tab_2 PARTITION OF vacuum_tab FOR VALUES WITH (MODULUS 2, REMAINDER 1);
CREATE ROLE regress_vacuum_conflict;

In the first session:

begin;
LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;

In the second:

SET ROLE regress_vacuum_conflict;
VACUUM vacuum_tab;
WARNING: permission denied to vacuum "vacuum_tab", skipping it
<- hangs here, trying to lock vacuum_tab_1

In the non-partitioned case the second session exits after emitting the warning. In the partitioned case, it hangs, trying to get locks. This is due to the fact that in expand_vacuum_rel() we skip the parent table if vacuum_is_permitted_for_relation() fails, but don't perform such a check for its children. The check will be performed later in vacuum_rel(), but only after vacuum_open_relation(), which leads to the hang in the second session. Is it intended? Why don't we perform the vacuum_is_permitted_for_relation() check for inheritors in expand_vacuum_rel()? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Add semi-join pushdown to postgres_fdw
Hi, Yuki. Thanks for looking at this patch. fujii.y...@df.mitsubishielectric.co.jp wrote 2022-12-03 06:02: question1) > + if (jointype == JOIN_SEMI && bms_is_member(var->varno, innerrel->relids) && !bms_is_member(var->varno, outerrel->relids)) It takes time for me to find in what case this condition is true. There are cases in which this condition is true for a semi-join of two baserels when running a query which joins more than two relations, such as query2 and query3. Running queries such as query2, you may want to push down only the semi-join path of joinrel(outerrel) defined by (f_t1 a1 join f_t3 a2 on a1.c1 = a2.c1) and baserel(innerrel) f_t3, because of deparse safety. So you add this condition. Because of this limitation, your patch can't push down the subquery expression "exists (select null from f_t2 where c1 = a1.c1)" in query3. I think it is one of the difficult points for semi-join pushdown. This is my understanding of the intent of this condition and the restrictions imposed by it. Is my understanding right? IIRC, the planner can create a semi-join whose targetlist references Vars from the inner join relation. However, it's deparsed as EXISTS, so we can't reference it from SQL. Hence this check - if a Var is referenced in the semi-join target list, it can't be pushed down. You can see this if you comment out this check.
EXPLAIN (verbose, costs off)
SELECT ft2.*, ft4.* FROM ft2
  INNER JOIN (SELECT * FROM ft4 WHERE EXISTS (SELECT 1 FROM ft2 WHERE ft2.c2=ft4.c2)) ft4 ON ft2.c2 = ft4.c1
  INNER JOIN (SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)) ft21 ON ft2.c2 = ft21.c2
  WHERE ft2.c1 > 900 ORDER BY ft2.c1 LIMIT 10;

will fail with

EXPLAIN SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2

Here you can see that

SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)

was transformed to

SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2

where our EXISTS subquery is referenced from the tlist. It's fine for the plan (relations participating in a semi-join can be referenced in the tlist), but it is not going to work with an EXISTS subquery. BTW, there's a comment in joinrel_target_ok(). It tells exactly that -

5535 if (jointype == JOIN_SEMI && bms_is_member(var->varno, innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
5536 {
5537     /* We deparse semi-join as exists() subquery, and so can't deparse references to inner rel in join target list. */
5538     ok = false;
5539     break;
5540 }

Expanded the comment. question2) In foreign_join_ok > * Constructing queries representing ANTI joins is hard, hence Is this true? Is it hard to expand your approach to ANTI join pushdown? I haven't tried, so I don't know. question3) You use variables whose name is "addl_condXXX" in the following code. > appendStringInfo(addl_conds, "EXISTS (SELECT NULL FROM %s", join_sql_i.data); Does this naming mean additional literal? Is there a more comprehensible naming, such as "subquery_exprXXX"? The naming means additional conditions (for the WHERE clause, by analogy with ignore_conds and remote_conds). Not sure if subquery_expr sounds better, but if you come up with a better idea, I'm fine with renaming them.
question4) Although really a detail, there are expressions with spaces such as "ft4.c2 = ft2.c2" and ones without, such as "c1=ftupper.c1". Is there a reason for this difference? If not, should we use the same spacing policy? Fixed. -- Best regards, Alexander Pyhalov, Postgres Professional

From 0f26b0841cb095b4e114984deac2b1b001368c15 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Mon, 7 Nov 2022 10:23:32 +0300
Subject: [PATCH v3] postgres_fdw: add support for deparsing semi joins

We deparse semi-joins as EXISTS subqueries. So, deparsing semi-join
leads to generating addl_conds condition, which is then added to the
uppermost JOIN's WHERE clause.
---
 contrib/postgres_fdw/deparse.c                | 201 +---
 .../postgres_fdw/expected/postgres_fdw.out    | 297 -
 contrib/postgres_fdw/postgres_fdw.c           |  82 -
 contrib/postgres_fdw/postgres_fdw.h           |   3 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 119 ++-
 5 files changed, 620 insertions(+), 82 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 95247656504..10d82d9f2ab 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -179,12 +179,13 @@ static void appendLimitClause(deparse_ex
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2022-12-01 05:23: Hi Mr.Pyhalov. Hi. Attaching minor fixes. I haven't proof-read all comments (perhaps they need attention from some native speaker). Tested it with queries from https://github.com/swarm64/s64da-benchmark-toolkit, works as expected. -- Best regards, Alexander Pyhalov, Postgres Professional

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 35f2d102374..bd8a4acc112 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -3472,9 +3472,9 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context)
 	if ((aggform->aggtranstype != INTERNALOID) && (aggform->aggfinalfn == InvalidOid)) {
 		appendFunctionName(node->aggfnoid, context);
 	} else if(aggform->partialaggfn) {
-		appendFunctionName((Oid)(aggform->partialaggfn), context);
+		appendFunctionName(aggform->partialaggfn, context);
 	} else {
-		elog(ERROR, "there in no partialaggfn %u", node->aggfnoid);
+		elog(ERROR, "there is no partialaggfn %u", node->aggfnoid);
 	}
 	ReleaseSysCache(aggtup);
 }
@@ -3986,7 +3986,8 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel,
 }

 /*
- * Check that partial aggregate function of aggform exsits in remote
+ * Check that partial aggregate function, described by aggform,
+ * exists on remote server, described by fpinfo.
  */
 static bool
 partial_agg_compatible(Form_pg_aggregate aggform, PgFdwRelationInfo *fpinfo)
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2022-11-30 13:01: 2) Do we really have to look at pg_proc in partial_agg_ok() and deparseAggref()? Perhaps looking at aggtranstype is enough? You are right. I fixed it according to your comment. partial_agg_ok() still looks at pg_proc. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2022-11-30 13:01: Hi Mr.Pyhalov. 1) In the previous version of the patch, aggregates which had partialaggfn were ok to push down. And it was a definite sign that an aggregate can be pushed down. Now we allow pushing down an aggregate whose prorettype is not internal and whose aggfinalfn is not defined. Is it safe for all user-defined (or builtin) aggregates, even if they are generally shippable? Aggcombinefn is executed locally and we check that the aggregate function itself is shippable. Is it enough? Perhaps we could use partialagg_minversion (like aggregates with partialagg_minversion == -1 should not be pushed down) or introduce a separate explicit flag? In what case is partial aggregate pushdown unsafe for an aggregate which has a non-internal aggtranstype and no aggfinalfn? By reading [1], I believe that if the aggcombinefn of such an aggregate receives the return values of the original aggregate function from each remote, it must produce the same value that would have resulted from scanning all the input in a single operation. One more issue I started to think about - now we don't check partialagg_minversion for "simple" aggregates at all. Is it correct? It seems that, for example, we could try to push down bit_or(int8) to old servers, but it didn't exist, for example, in 8.4. I think it's a broader issue (it would already be the case if we push down aggregates) and shouldn't be fixed here. But there is an issue - is_shippable() is too optimistic. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Hi, Yuki. 1) In the previous version of the patch, aggregates which had partialaggfn were OK to push down, and that was a definite sign that an aggregate can be pushed down. Now we allow pushing down an aggregate whose prorettype is not internal and whose aggfinalfn is not defined. Is it safe for all user-defined (or builtin) aggregates, even if they are generally shippable? Aggcombinefn is executed locally and we check that the aggregate function itself is shippable. Is that enough? Perhaps we could use partialagg_minversion (e.g. aggregates with partialagg_minversion == -1 should not be pushed down) or introduce a separate explicit flag? 2) Do we really have to look at pg_proc in partial_agg_ok() and deparseAggref()? Perhaps looking at aggtranstype is enough? 3) I'm not sure whether the CREATE AGGREGATE tests with invalid PARTIALAGGFUNC/PARTIALAGG_MINVERSION should be in the postgres_fdw tests or would better be moved to src/test/regress/sql/create_aggregate.sql, as they are not specific to postgres_fdw. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
qc=0x7ffd1a4053c0) at utility.c:1074 I'll look at it again later. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby wrote 2022-11-21 06:00: I finally found time to digest and integrate your changes into my local branch. This fixes the three issues you reported: FORCE_RELEASE, the issue with INVALID partitions (for which I adapted your patch into an earlier patch in my series), and progress reporting. And rebased. Hi. Thank you for the effort. I've looked through and tested the new patch a bit. Overall it looks good to me. The question I have is whether we should update pg_stat_progress_create_index in reindex_invalid_child_indexes() when we skip valid indexes? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Add semi-join pushdown to postgres_fdw
Ian Lawrence Barwick писал 2022-11-04 02:21: This entry was marked as "Needs review" in the CommitFest app but cfbot reports the patch no longer applies. We've marked it as "Waiting on Author". As CommitFest 2022-11 is currently underway, this would be an excellent time update the patch. Once you think the patchset is ready for review again, you (or any interested party) can move the patch entry forward by visiting https://commitfest.postgresql.org/40/3838/ and changing the status to "Needs review". Hi. I've rebased the patch. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 1ac2a9e3611f716da688c04a4ec36888f62078ce Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 7 Nov 2022 10:23:32 +0300 Subject: [PATCH] postgres_fdw: add support for deparsing semi joins We deparse semi-joins as EXISTS subqueries. So, deparsing semi-join leads to generating addl_conds condition, which is then added to the uppermost JOIN's WHERE clause. --- contrib/postgres_fdw/deparse.c| 198 +--- .../postgres_fdw/expected/postgres_fdw.out| 297 -- contrib/postgres_fdw/postgres_fdw.c | 78 - contrib/postgres_fdw/postgres_fdw.h | 3 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 119 ++- 5 files changed, 613 insertions(+), 82 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 95247656504..45885442418 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -179,12 +179,13 @@ static void appendLimitClause(deparse_expr_cxt *context); static void appendConditions(List *exprs, deparse_expr_cxt *context); static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, bool use_alias, - Index ignore_rel, List **ignore_conds, + Index ignore_rel, List **ignore_conds, StringInfo addl_conds, List **params_list); +static void appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context); static void deparseFromExpr(List *quals, deparse_expr_cxt *context); static void 
deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, bool make_subquery, - Index ignore_rel, List **ignore_conds, List **params_list); + Index ignore_rel, List **ignore_conds, StringInfo addl_conds, List **params_list); static void deparseAggref(Aggref *node, deparse_expr_cxt *context); static void appendGroupByClause(List *tlist, deparse_expr_cxt *context); static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first, @@ -1370,23 +1371,20 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context) { StringInfo buf = context->buf; RelOptInfo *scanrel = context->scanrel; + StringInfoData addl_conds; /* For upper relations, scanrel must be either a joinrel or a baserel */ Assert(!IS_UPPER_REL(context->foreignrel) || IS_JOIN_REL(scanrel) || IS_SIMPLE_REL(scanrel)); + initStringInfo(_conds); /* Construct FROM clause */ appendStringInfoString(buf, " FROM "); deparseFromExprForRel(buf, context->root, scanrel, (bms_membership(scanrel->relids) == BMS_MULTIPLE), - (Index) 0, NULL, context->params_list); - - /* Construct WHERE clause */ - if (quals != NIL) - { - appendStringInfoString(buf, " WHERE "); - appendConditions(quals, context); - } + (Index) 0, NULL, _conds, context->params_list); + appendWhereClause(quals, _conds, context); + pfree(addl_conds.data); } /* @@ -1598,6 +1596,33 @@ appendConditions(List *exprs, deparse_expr_cxt *context) reset_transmission_modes(nestlevel); } +/* + * Append WHERE clause, containing conditions + * from exprs and addl_conds, to context->buf. 
+ */ +static void +appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context) +{ + StringInfo buf = context->buf; + bool need_and = false; + + if (exprs != NIL || addl_conds->len > 0) + appendStringInfoString(buf, " WHERE "); + + if (exprs != NIL) + { + appendConditions(exprs, context); + need_and = true; + } + + if (addl_conds->len > 0) + { + if (need_and) + appendStringInfoString(buf, " AND "); + appendStringInfo(buf, "(%s)", addl_conds->data); + } +} + /* Output join name for given join type */ const char * get_jointype_name(JoinType jointype) @@ -1616,6 +1641,9 @@ get_jointype_name(JoinType jointype) case JOIN_FULL: return "FULL"; + case JOIN_SEMI: + return "SEMI"; + default: /* Shouldn't come here, but protect from buggy code. */ elog(ERROR, "unsupported join type %d", jointype); @@ -1715,7 +1743,7 @@ deparseSubqueryTargetList(deparse_expr_cxt *context) */ static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, - bool use_alias, Index ignore_rel, List
Re: Add semi-join pushdown to postgres_fdw
Ashutosh Bapat писал 2022-08-29 17:12: Hi Alexander, Thanks for working on this. It's great to see FDW join pushdown scope being expanded to more complex cases. I am still figuring out the implementation. It's been a while I have looked at join push down code. But following change strikes me odd -- subquery using immutable function (can be sent to remote) PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1; EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st3(10, 20); - QUERY PLAN - Sort + QUERY PLAN +--- + Foreign Scan Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Sort Key: t1.c1 - -> Nested Loop Semi Join - Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Join Filter: (t1.c3 = t2.c3) - -> Foreign Scan on public.ft1 t1 - Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) - -> Materialize - Output: t2.c3 - -> Foreign Scan on public.ft2 t2 - Output: t2.c3 - Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date)) -(14 rows) + Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2) + Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND (EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r1.c3 = r3.c3 ORDER BY r1."C 1" ASC NULLS LAST +(4 rows) date_in | s |1 | [0:0]={cstring} date_in which will be used to cast a test to date is not immutable. So the query should't be pushed down. May not be a problem with your patch. Can you please check? Hi. It is not related to my change and works as expected. As I see, we have expression FuncExprdate(oid = 2029, args=Var ) = Const(type date) (date(r3.c5) = '1970-01-17'::date). 
The function is:

# select proname, provolatile from pg_proc where oid=2029;
 proname | provolatile
---------+-------------
 date    | i

So it's shippable. -- Best regards, Alexander Pyhalov, Postgres Professional
Add semi-join pushdown to postgres_fdw
Hi. It's possible to extend deparsing in postgres_fdw so that we can push down semi-joins which don't refer to the inner reltarget. This allows us to push down joins in queries like SELECT * FROM ft1 t1 WHERE t1.c1 < 10 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE date(c5) = '1970-01-17'::date); EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 < 10 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE date(c5) = '1970-01-17'::date); QUERY PLAN --- Foreign Scan Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2) Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 10)) AND (EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((date(r3.c5) = '1970-01-17'::date)) AND ((r1.c3 = r3.c3 Deparsing semi-joins leads to generating (text) conditions like 'EXISTS (SELECT NULL FROM inner_rel WHERE join_conds)'. Such conditions are generated in deparseFromExprForRel() and distributed to the nearest WHERE clause, where they are added to the list of AND clauses. -- Best regards, Alexander Pyhalov, Postgres Professional
From 7833d67f69287648c4594a5508feed376427f95d Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Fri, 12 Aug 2022 15:02:24 +0300 Subject: [PATCH] postgres_fdw: add support for deparsing semi joins We deparse semi-joins as EXISTS subqueries. Deparsing a semi-join leads to generating an addl_conds condition, which is then added to the uppermost JOIN's WHERE clause.
--- contrib/postgres_fdw/deparse.c| 198 +--- .../postgres_fdw/expected/postgres_fdw.out| 297 -- contrib/postgres_fdw/postgres_fdw.c | 78 - contrib/postgres_fdw/postgres_fdw.h | 3 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 119 ++- 5 files changed, 613 insertions(+), 82 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index a9766f97346..fcdc679d51f 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -179,12 +179,13 @@ static void appendLimitClause(deparse_expr_cxt *context); static void appendConditions(List *exprs, deparse_expr_cxt *context); static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, bool use_alias, - Index ignore_rel, List **ignore_conds, + Index ignore_rel, List **ignore_conds, StringInfo addl_conds, List **params_list); +static void appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context); static void deparseFromExpr(List *quals, deparse_expr_cxt *context); static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, bool make_subquery, - Index ignore_rel, List **ignore_conds, List **params_list); + Index ignore_rel, List **ignore_conds, StringInfo addl_conds, List **params_list); static void deparseAggref(Aggref *node, deparse_expr_cxt *context); static void appendGroupByClause(List *tlist, deparse_expr_cxt *context); static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first, @@ -1372,23 +1373,20 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context) { StringInfo buf = context->buf; RelOptInfo *scanrel = context->scanrel; + StringInfoData addl_conds; /* For upper relations, scanrel must be either a joinrel or a baserel */ Assert(!IS_UPPER_REL(context->foreignrel) || IS_JOIN_REL(scanrel) || IS_SIMPLE_REL(scanrel)); + initStringInfo(_conds); /* Construct FROM clause */ appendStringInfoString(buf, " FROM "); deparseFromExprForRel(buf, context->root, scanrel, 
(bms_membership(scanrel->relids) == BMS_MULTIPLE), - (Index) 0, NULL, context->params_list); - - /* Construct WHERE clause */ - if (quals != NIL) - { - appendStringInfoString(buf, " WHERE "); - appendConditions(quals, context); - } + (Index) 0, NULL, _conds, context->params_list); + appendWhereClause(quals, _conds, context); + pfree(addl_conds.data); } /* @@ -1600,6 +1598,33 @@ appendConditions(List *exprs, deparse_expr_cxt *context) reset_transmission_modes(nestlevel); } +/* + * Append WHERE clause, containing conditions + * from exprs and addl_conds, to context->buf. + */ +static void +appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context) +{ + StringInfo buf = context->buf; + bool need_and = false; + + if (exprs != NIL || addl_conds->len >
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby писал 2022-06-28 21:33: Hi, On Thu, Feb 10, 2022 at 06:07:08PM +0300, Alexander Pyhalov wrote: I've rebased patches and tried to fix issues I've seen. I've fixed reference after table_close() in the first patch (can be seen while building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). Rebased patches on the current master. They still require proper review. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 5c11849ceb2a1feb0e44dbdf30cc27de0282a659 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 28 Feb 2022 10:50:58 +0300 Subject: [PATCH 5/5] Mark intermediate partitioned indexes as valid --- src/backend/commands/indexcmds.c | 33 ++- src/test/regress/expected/indexing.out | 80 +- src/test/regress/sql/indexing.sql | 8 +++ 3 files changed, 118 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index d09f0390413..d3ced6265b6 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -3139,6 +3139,7 @@ static void ReindexPartitions(Oid relid, ReindexParams *params, bool isTopLevel) { List *partitions = NIL; + List *inhpartindexes = NIL; char relkind = get_rel_relkind(relid); char *relname = get_rel_name(relid); char *relnamespace = get_namespace_name(get_rel_namespace(relid)); @@ -3193,6 +3194,17 @@ ReindexPartitions(Oid relid, ReindexParams *params, bool isTopLevel) char partkind = get_rel_relkind(partoid); MemoryContext old_context; + /* Create a list of invalid inherited partitioned indexes */ + if (partkind == RELKIND_PARTITIONED_INDEX) + { + if (partoid == relid || get_index_isvalid(partoid)) +continue; + + old_context = MemoryContextSwitchTo(reindex_context); + inhpartindexes = lappend_oid(inhpartindexes, partoid); + MemoryContextSwitchTo(old_context); + } + /* * This discards partitioned tables, partitioned indexes and foreign * tables. 
@@ -3237,9 +3249,28 @@ ReindexPartitions(Oid relid, ReindexParams *params, bool isTopLevel) Oid tableoid = IndexGetRelation(relid, false); List *child_tables = find_all_inheritors(tableoid, ShareLock, NULL); - /* Both lists include their parent relation as well as any intermediate partitioned rels */ + /* + * Both lists include their parent relation as well as any + * intermediate partitioned rels + */ if (list_length(inhoids) == list_length(child_tables)) + { index_set_state_flags(relid, INDEX_CREATE_SET_VALID); + + /* Mark any intermediate partitioned index as valid */ + foreach(lc, inhpartindexes) + { +Oid partoid = lfirst_oid(lc); + +Assert(get_rel_relkind(partoid) == RELKIND_PARTITIONED_INDEX); +Assert(!get_index_isvalid(partoid)); + +/* Can't mark an index valid without marking it ready */ +index_set_state_flags(partoid, INDEX_CREATE_SET_READY); +CommandCounterIncrement(); +index_set_state_flags(partoid, INDEX_CREATE_SET_VALID); + } + } } /* diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out index a4ccae50de3..b4f1aea6fca 100644 --- a/src/test/regress/expected/indexing.out +++ b/src/test/regress/expected/indexing.out @@ -57,6 +57,8 @@ create table idxpart11 partition of idxpart1 for values from (0) to (10) partiti create table idxpart111 partition of idxpart11 default partition by range(a); create table idxpart partition of idxpart111 default partition by range(a); create table idxpart2 partition of idxpart for values from (10) to (20); +create table idxpart3 partition of idxpart for values from (30) to (40) partition by range(a); +create table idxpart31 partition of idxpart3 default; insert into idxpart2 values(10),(10); -- not unique create index concurrently on idxpart (a); -- partitioned create index concurrently on idxpart1 (a); -- partitioned and partition @@ -76,7 +78,7 @@ Partition key: RANGE (a) Indexes: "idxpart_a_idx" btree (a) "idxpart_a_idx1" UNIQUE, btree (a) INVALID -Number of partitions: 2 (Use \d+ to 
list them.) +Number of partitions: 3 (Use \d+ to list them.) \d idxpart1 Partitioned table "public.idxpart1" @@ -88,11 +90,59 @@ Number of partitions: 2 (Use \d+ to list them.) Partition of: idxpart FOR VALUES FROM (0) TO (10) Partition key: RANGE (a) Indexes: -"idxpart1_a_idx" btree (a) INVALID +"idxpart1_a_idx" btree (a) "idxpart1_a_idx1" btree (a) "idxpart1_a_idx2" UNIQUE, btree (a) INVALID Number of partitions: 1 (Use \d+ to list them.) +\d idxpart11 + Partitioned table "public.idxpart11" + Column | Type | Collation | Nullable | Default ++-+---+--+- + a | integer | | | + b | integer | | | + c | text| | | +Partition of: idxpart1 FOR V
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby wrote 2022-06-28 21:33: Hi, On Thu, Feb 10, 2022 at 06:07:08PM +0300, Alexander Pyhalov wrote: I've rebased patches and tried to fix issues I've seen. I've fixed reference after table_close() in the first patch (can be seen while building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). Thanks for finding that. The patches other than 0001 are more experimental, and need someone to check if it's even a good approach to use, so I kept them separate from the essential patch. Your latest 0005 patch (mark intermediate partitioned indexes as valid) is probably fixing a bug in my SKIPVALID patch, right? I'm not sure whether the SKIPVALID patch should be merged into 0001, and I've been awaiting feedback on the main patch before handling progress reporting. Hi. I think it's more about fixing the ReindexPartitions-to-set-indisvalid patch, as we should also mark intermediate indexes as valid when reindex succeeds. Sorry for not responding sooner. The patch saw no activity for ~11 months so I wasn't prepared to pick it up in March, at least not without guidance from a committer. Would you want to take over this patch? I wrote it following someone's question, but don't expect that I'd use the feature myself. I can help review it or try to clarify the organization of my existing patches (but still haven't managed to work my way through your amendments to my patches). Yes, I'm glad to work on the patches, as this is a very important feature for us. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Tomas Vondra wrote 2022-03-22 15:28: On 3/22/22 01:49, Andres Freund wrote: On 2022-01-17 15:27:53 +0300, Alexander Pyhalov wrote: Alexander Pyhalov wrote 2022-01-17 15:26: Updated patch. Sorry, missed attachment. Needs another update: http://cfbot.cputube.org/patch_37_3369.log Marked as waiting on author. TBH I'm still not convinced this is the right approach. I've voiced this opinion before, but to reiterate the main arguments: 1) It's not clear to me how this could get extended to aggregates with more complex aggregate states, to support e.g. avg() and similar fairly common aggregates. Hi. Yes, I'm also not sure how to proceed with aggregates with complex state. Likely it needs a separate function to export their state, but then we should somehow ensure that this function exists and that our 'importer' can handle its result. Note that for now we have no mechanism in postgres_fdw to find out the remote server version at the planning stage. 2) I'm not sure relying on aggpartialpushdownsafe without any version checks etc. is sufficient. I mean, how would we know the remote node has the same idea of representing the aggregate state. I wonder how this aligns with assumptions we make e.g. for functions etc. This doesn't seem to be a problem to me, as for now we don't care about the remote node's internal aggregate state representation. We currently get just the aggregate result from the remote node. For aggregates with 'internal' stype we call the converter locally, and it converts the external result from the aggregate return type to the local node's internal representation. Aside from that, there's a couple of review comments: 1) should not remove the comment in foreign_expr_walker Fixed. 2) comment in deparseAggref is obsolete/inaccurate Fixed. 3) comment for partial_agg_ok should probably explain when we consider an aggregate OK to be pushed down Expanded comment. 4) I'm not sure why the get_rcvd_attinmeta comment talks about "return type bytea" and "real input type". Expanded comment.
Tupdesc can be retrieved from node->ss.ss_ScanTupleSlot, and so we expect to see bytea (as should be produced by partial aggregation). But when we scan data, we get the aggregate output type (which matches the converter input type), so attinmeta should be fixed. If we deal with an aggregate which doesn't have a converter, partial_agg_ok() ensures that the agg->aggfnoid return type matches agg->aggtranstype. 5) Talking about "partial" aggregates is a bit confusing, because that suggests this is related to actual "partial aggregates". But it's not. What should we call them? It's about pushing "Partial count()" or "Partial sum()" to the remote server, so why is it not related to partial aggregates? Do you mean that it's not about parallel aggregate processing? 6) Can add_foreign_grouping_paths do without the new 'partial' parameter? Clearly, it can be deduced from extra->patype, no? Fixed this. 7) There are no docs for PARTIALCONVERTERFUNC / PARTIAL_PUSHDOWN_SAFE in the CREATE AGGREGATE sgml docs. Added documentation. I'd appreciate advice on how it should be extended. 8) I don't think "serialize" in the converter functions is the right term, considering those functions are not "serializing" anything. If anything, it's the remote node that is serializing the agg state and the local node is deserializing it. Or maybe I just misunderstand where the converter functions are executed? The converter function transforms the aggregate result into the serialized internal representation which is expected from a partial aggregate. I mean, it converts the aggregate result type to the internal representation and then effectively executes the serialization code (i.e. converter(x) == serialize(to_internal(x))).
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 7ad4eacf017a4fd3793f0949ef43ccc8292bf3f6 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 63 +- .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 187 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- doc/src/sgml/ref/create_aggregate.sgml| 27 +++ src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 ++- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 20 +- src/include/catalog/pg_aggregate.dat | 110 ++- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 702 insertions(+), 81 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..272e2391d7f 10064
Re: CREATE INDEX CONCURRENTLY on partitioned index
Hi. I've added 0005-Mark-intermediate-partitioned-indexes-as-valid.patch which fixed the following issues - when partitioned index is created, indexes on intermediate partitioned tables were preserved in invalid state. Also added some more tests. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 18fa3c27a3311294a7abfdc0674ef6143c65423b Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 7 Feb 2022 10:28:42 +0300 Subject: [PATCH 1/5] Allow CREATE INDEX CONCURRENTLY on partitioned table 0001-Allow-CREATE-INDEX-CONCURRENTLY-on-partitioned-table.patch from https://www.postgresql.org/message-id/20210226182019.gu20...@telsasoft.com Fixes: - rel was used after table_close(); - it seems childidxs shouldn't live in ind_context; - updated doc. --- doc/src/sgml/ref/create_index.sgml | 14 +-- src/backend/commands/indexcmds.c | 151 ++--- src/test/regress/expected/indexing.out | 60 +- src/test/regress/sql/indexing.sql | 18 ++- 4 files changed, 186 insertions(+), 57 deletions(-) diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 91eaaabc90f..ffa98692430 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -641,7 +641,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] If a problem arises while scanning the table, such as a deadlock or a uniqueness violation in a unique index, the CREATE INDEX -command will fail but leave behind an invalid index. This index +command will fail but leave behind an invalid index. +If this happens while creating index concurrently on a partitioned +table, the command can also leave behind valid or +invalid indexes on table partitions. The invalid index will be ignored for querying purposes because it might be incomplete; however it will still consume update overhead. The psql \d command will report such an index as INVALID: @@ -688,15 +691,6 @@ Indexes: cannot. 
- -Concurrent builds for indexes on partitioned tables are currently not -supported. However, you may concurrently build the index on each -partition individually and then finally create the partitioned index -non-concurrently in order to reduce the time where writes to the -partitioned table will be locked out. In this case, building the -partitioned index is a metadata only operation. - - diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index cd30f15eba6..a34a1b133a0 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -68,6 +68,7 @@ /* non-export function prototypes */ +static void reindex_invalid_child_indexes(Oid indexRelationId); static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts); static void CheckPredicate(Expr *predicate); static void ComputeIndexAttrs(IndexInfo *indexInfo, @@ -670,17 +671,6 @@ DefineIndex(Oid relationId, partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE; if (partitioned) { - /* - * Note: we check 'stmt->concurrent' rather than 'concurrent', so that - * the error is thrown also for temporary tables. Seems better to be - * consistent, even though we could do it on temporary table because - * we're not actually doing it concurrently. 
- */ - if (stmt->concurrent) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot create index on partitioned table \"%s\" concurrently", - RelationGetRelationName(rel; if (stmt->excludeOpNames) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), @@ -1119,6 +1109,11 @@ DefineIndex(Oid relationId, if (pd->nparts != 0) flags |= INDEX_CREATE_INVALID; } + else if (concurrent && OidIsValid(parentIndexId)) + { + /* If concurrent, initially build index partitions as "invalid" */ + flags |= INDEX_CREATE_INVALID; + } if (stmt->deferrable) constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE; @@ -1174,18 +1169,30 @@ DefineIndex(Oid relationId, partdesc = RelationGetPartitionDesc(rel, true); if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0) { + /* + * Need to close the relation before recursing into children, so + * copy needed data into a longlived context. + */ + + MemoryContext ind_context = AllocSetContextCreate(PortalContext, "CREATE INDEX", + ALLOCSET_DEFAULT_SIZES); + MemoryContext oldcontext = MemoryContextSwitchTo(ind_context); int nparts = partdesc->nparts; Oid *part_oids = palloc(sizeof(Oid) * nparts); bool invalidate_parent = false; TupleDesc parentDesc; Oid *opfamOids; + char *relname; pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL, nparts); memcpy(part_oids, partdesc->oids, sizeof(
Re: postgres_fdw and skip locked
Ashutosh Bapat писал 2022-02-17 16:30: On Wed, Feb 16, 2022 at 8:38 PM Alexander Pyhalov wrote: Ashutosh Bapat писал 2022-02-16 16:40: > On Mon, Feb 14, 2022 at 4:23 PM Alexander Pyhalov > wrote: >> > I see that these options will work for all kinds of relations. So no > problem if foreign table is pointing to something other than a table. Hi. The issue is that we perform deparsing while planing, we haven't connected to server yet. Are there any ideas how to find out its version without specifying it, for example, in server options? Yes, you are right. I wish foreign servers, esp. postgresql, could be more integrated and if we could defer deparsing till execution phase. But that's out of scope for this patch. Hi. I've updated patch to provide server-level "deparse_wait_policy" option, which controls if we deparse SKIP LOCKED or SELECT FOR UPDATE (true by default). This will make possible for people with old servers to revert to old behavior. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 5f5a0434227debd8ca7ee406e8cb1997edff14a3 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 14 Feb 2022 12:01:26 +0300 Subject: [PATCH] postgres_fdw could pass lock wait policy to foreign server --- contrib/postgres_fdw/deparse.c| 12 +- .../postgres_fdw/expected/postgres_fdw.out| 110 +- contrib/postgres_fdw/option.c | 4 +- contrib/postgres_fdw/postgres_fdw.c | 4 + contrib/postgres_fdw/postgres_fdw.h | 1 + contrib/postgres_fdw/sql/postgres_fdw.sql | 18 +++ doc/src/sgml/postgres-fdw.sgml| 16 +++ 7 files changed, 161 insertions(+), 4 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..f55afd461f0 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -1429,8 +1429,7 @@ deparseLockingClause(deparse_expr_cxt *context) * For now, just ignore any [NO] KEY specification, since (a) * it's not clear what that means for a remote table that we * don't have complete information about, 
and (b) it wouldn't - * work anyway on older remote servers. Likewise, we don't - * worry about NOWAIT. + * work anyway on older remote servers. */ switch (rc->strength) { @@ -1451,6 +1450,15 @@ deparseLockingClause(deparse_expr_cxt *context) if (bms_membership(rel->relids) == BMS_MULTIPLE && rc->strength != LCS_NONE) appendStringInfo(buf, " OF %s%d", REL_ALIAS_PREFIX, relid); + +/* Lock behavior */ +if (fpinfo->deparse_wait_policy && rc->strength != LCS_NONE) +{ + if (rc->waitPolicy == LockWaitSkip) + appendStringInfoString(buf, " SKIP LOCKED"); + else if (rc->waitPolicy == LockWaitError) + appendStringInfoString(buf, " NOWAIT"); +} } } } diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index b2e02caefe4..b33d61b8656 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -389,6 +389,35 @@ SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE; 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo (1 row) +-- test wait policy specification +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE NOWAIT; + QUERY PLAN +- + Foreign Scan on public.ft1 t1 + Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.* + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 101)) FOR UPDATE NOWAIT +(3 rows) + +SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE NOWAIT; + c1 | c2 | c3 | c4 |c5| c6 | c7 | c8 +-++---+--+--+++- + 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo +(1 row) + +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE SKIP LOCKED; + QUERY PLAN +- + Foreign Scan on public.ft1 t1 + Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.* + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 102)) FOR SHARE SKIP LOC
Re: postgres_fdw and skip locked
Ashutosh Bapat wrote 2022-02-16 16:40: On Mon, Feb 14, 2022 at 4:23 PM Alexander Pyhalov wrote: Hi. Currently select ... for update ... [skip locked|nowait] options are not pushed down to remote servers. The only reason I see is that we can be speaking to a pre-9.5 server, which doesn't understand the skip locked option. Are there any other possible issues? Should we add a foreign table option to control this behavior? Should we always push these clauses down if the remote server's version is 9.5 or newer? There are quite a few options already. It would be good not to add one more. I see that these options will work for all kinds of relations, so there is no problem if the foreign table points to something other than a table. Hi. The issue is that we perform deparsing while planning, when we haven't connected to the server yet. Are there any ideas on how to find out its version without specifying it, for example, in server options? -- Best regards, Alexander Pyhalov, Postgres Professional
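The behavior the patch adds — emit a wait-policy clause only when a locking clause is emitted at all, and only when the server-level option allows it (so users of pre-9.5 remotes can keep the old behavior) — can be sketched outside the C code. This is an illustrative Python model, not PostgreSQL source; the names mirror deparseLockingClause() and the proposed deparse_wait_policy option.

```python
# Illustrative sketch of the patched deparseLockingClause() logic.
LCS_NONE, LCS_FORSHARE, LCS_FORUPDATE = range(3)
WAIT_BLOCK, WAIT_SKIP, WAIT_ERROR = range(3)

def deparse_locking_clause(strength, wait_policy, deparse_wait_policy=True):
    parts = []
    if strength == LCS_FORSHARE:
        parts.append(" FOR SHARE")
    elif strength == LCS_FORUPDATE:
        parts.append(" FOR UPDATE")
    # A wait policy is only meaningful when some lock strength is requested,
    # and is only shipped when the server option opts in.
    if deparse_wait_policy and strength != LCS_NONE:
        if wait_policy == WAIT_SKIP:
            parts.append(" SKIP LOCKED")
        elif wait_policy == WAIT_ERROR:
            parts.append(" NOWAIT")
    return "".join(parts)

print(deparse_locking_clause(LCS_FORUPDATE, WAIT_ERROR))       # " FOR UPDATE NOWAIT"
print(deparse_locking_clause(LCS_FORSHARE, WAIT_SKIP))         # " FOR SHARE SKIP LOCKED"
print(deparse_locking_clause(LCS_FORSHARE, WAIT_SKIP, False))  # " FOR SHARE"
```

With deparse_wait_policy disabled, the remote query degrades to the pre-patch form, which is exactly the compatibility escape hatch discussed above.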
postgres_fdw and skip locked
Hi. Now select ... for update ... [skip locked|nowait] options are not pushed down to remote servers. I see the only reason is that we can speak to pre-9.5 server, which doesn't understand skip locked option. Are there any other possible issues? Should we add foreign table option to control this behavior? -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom f416cb0afdf42b8ab5375e7a3ccab6e41ebb16ab Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 14 Feb 2022 12:01:26 +0300 Subject: [PATCH] postgres_fdw could pass lock wait policy to foreign server --- contrib/postgres_fdw/deparse.c| 12 ++- .../postgres_fdw/expected/postgres_fdw.out| 81 +++ contrib/postgres_fdw/sql/postgres_fdw.sql | 13 +++ 3 files changed, 104 insertions(+), 2 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..7e320be5e6a 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -1429,8 +1429,7 @@ deparseLockingClause(deparse_expr_cxt *context) * For now, just ignore any [NO] KEY specification, since (a) * it's not clear what that means for a remote table that we * don't have complete information about, and (b) it wouldn't - * work anyway on older remote servers. Likewise, we don't - * worry about NOWAIT. + * work anyway on older remote servers. 
*/ switch (rc->strength) { @@ -1451,6 +1450,15 @@ deparseLockingClause(deparse_expr_cxt *context) if (bms_membership(rel->relids) == BMS_MULTIPLE && rc->strength != LCS_NONE) appendStringInfo(buf, " OF %s%d", REL_ALIAS_PREFIX, relid); + +/* Lock behavior */ +if (rc->strength != LCS_NONE) +{ + if (rc->waitPolicy == LockWaitSkip) + appendStringInfoString(buf, " SKIP LOCKED"); + else if (rc->waitPolicy == LockWaitError) + appendStringInfoString(buf, " NOWAIT"); +} } } } diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index b2e02caefe4..89564d9aef5 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -389,6 +389,35 @@ SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE; 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo (1 row) +-- test wait policy specification +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE NOWAIT; + QUERY PLAN +- + Foreign Scan on public.ft1 t1 + Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.* + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 101)) FOR UPDATE NOWAIT +(3 rows) + +SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE NOWAIT; + c1 | c2 | c3 | c4 |c5| c6 | c7 | c8 +-++---+--+--+++- + 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo +(1 row) + +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE SKIP LOCKED; + QUERY PLAN +- + Foreign Scan on public.ft1 t1 + Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.* + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 102)) FOR SHARE SKIP LOCKED +(3 rows) + +SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE SKIP LOCKED; + c1 | c2 | c3 | c4 |c5| c6 | c7 | c8 +-++---+--+--+++- + 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo +(1 row) + -- aggregate SELECT COUNT(*) 
FROM ft1 t1; count @@ -1862,6 +1891,32 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t 110 | 110 (10 rows) +-- test wait policy specification +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE SKIP LOCKED; +
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby wrote 2021-02-26 21:20: On Mon, Feb 15, 2021 at 10:07:05PM +0300, Anastasia Lubennikova wrote: 5) Speaking of documentation, I think we need to add a paragraph about CIC on partitioned indexes which will explain that invalid indexes may appear and what the user should do to fix them. I'm not sure about that - it's already documented in general, for nonpartitioned indexes. Hi. I've rebased the patches and tried to fix the issues I've seen. I've fixed a reference after table_close() in the first patch (it can be seen while building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). It seems childidxs shouldn't live in ind_context, so I moved it out. Updated the documentation to state that CIC can leave valid or invalid indexes on partitions if it does not succeed. Also merged the old 0002-f-progress-reporting.patch and 0003-WIP-Add-SKIPVALID-flag-for-more-integration.patch. It seems the first one didn't really fix the issue with progress reporting (as ReindexRelationConcurrently() uses pgstat_progress_start_command(), which seems to mess up the effect of this command in DefineIndex()). Note that the third patch completely removes attempts to report CREATE INDEX progress correctly (reindex reports on individual commands, not the whole CREATE INDEX). So I've added 0003-Try-to-fix-create-index-progress-report.patch, which tries to fix the mess with CREATE INDEX progress reporting. It introduces a new flag, REINDEXOPT_REPORT_CREATE_PART, in ReindexParams->options. Given this flag, ReindexRelationConcurrently() will not report individual operation start/stop, but ReindexMultipleInternal() will report on reindexed partitions. To make the issue worse, some partitions can be handled in ReindexPartitions(), and ReindexMultipleInternal() should know how many in order to correctly update the PROGRESS_CREATEIDX_PARTITIONS_DONE counter. It also needs IndexOid to correctly generate the pg_stat_progress_create_index record, so we pass these parameters to it.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom eaad7c3ed2fda93fdb91aea60294f60489444bf7 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 7 Feb 2022 10:28:42 +0300 Subject: [PATCH 1/4] Allow CREATE INDEX CONCURRENTLY on partitioned table 0001-Allow-CREATE-INDEX-CONCURRENTLY-on-partitioned-table.patch from https://www.postgresql.org/message-id/20210226182019.gu20...@telsasoft.com Fixes: - rel was used after table_close(); - it seems childidxs shouldn't live in ind_context; - updated doc. --- doc/src/sgml/ref/create_index.sgml | 14 +-- src/backend/commands/indexcmds.c | 151 ++--- src/test/regress/expected/indexing.out | 60 +- src/test/regress/sql/indexing.sql | 18 ++- 4 files changed, 186 insertions(+), 57 deletions(-) diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 91eaaabc90f..ffa98692430 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -641,7 +641,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] If a problem arises while scanning the table, such as a deadlock or a uniqueness violation in a unique index, the CREATE INDEX -command will fail but leave behind an invalid index. This index +command will fail but leave behind an invalid index. +If this happens while creating index concurrently on a partitioned +table, the command can also leave behind valid or +invalid indexes on table partitions. The invalid index will be ignored for querying purposes because it might be incomplete; however it will still consume update overhead. The psql \d command will report such an index as INVALID: @@ -688,15 +691,6 @@ Indexes: cannot. - -Concurrent builds for indexes on partitioned tables are currently not -supported. However, you may concurrently build the index on each -partition individually and then finally create the partitioned index -non-concurrently in order to reduce the time where writes to the -partitioned table will be locked out. 
In this case, building the -partitioned index is a metadata only operation. - - diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 560dcc87a2c..666ced8e1d7 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -68,6 +68,7 @@ /* non-export function prototypes */ +static void reindex_invalid_child_indexes(Oid indexRelationId); static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts); static void CheckPredicate(Expr *predicate); static void ComputeIndexAttrs(IndexInfo *indexInfo, @@ -670,17 +671,6 @@ DefineIndex(Oid relationId, partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE; if (partitioned) { - /* - * Note: we check 'stmt->concurrent' rather than 'concurrent', so that - * the error is thrown also for temporary tables.
CREATE INDEX CONCURRENTLY on partitioned index
Alexander Pyhalov wrote 2022-02-09 15:18: Hi. I've looked at the patches introducing CREATE INDEX CONCURRENTLY for partitioned tables - https://www.postgresql.org/message-id/flat/20210226182019.GU20769%40telsasoft.com#da169a0a518bf8121604437d9ab053b3 . The thread hadn't had any activity for a year. I've rebased the patches and tried to fix the issues I've seen. I've fixed a reference after table_close() in the first patch (it can be seen while building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). Also merged the old 0002-f-progress-reporting.patch and 0003-WIP-Add-SKIPVALID-flag-for-more-integration.patch. It seems the first one didn't really fix the issue with progress reporting (as ReindexRelationConcurrently() uses pgstat_progress_start_command(), which seems to mess up the effect of this command in DefineIndex()). Also, the third patch completely removes attempts to report CREATE INDEX progress correctly (reindex reports on individual commands, not the whole CREATE INDEX). So I've added 0003-Try-to-fix-create-index-progress-report.patch, which tries to fix the mess with CREATE INDEX progress reporting. It introduces a new flag, REINDEXOPT_REPORT_PART, in ReindexParams->options. Given this flag, ReindexRelationConcurrently() will not report individual operations, but ReindexMultipleInternal() will report on reindexed partitions. To make the issue worse, some partitions can be handled in ReindexPartitions(), and ReindexMultipleInternal() should know how many in order to correctly update the PROGRESS_CREATEIDX_PARTITIONS_DONE counter, so we pass the number of handled partitions to it.
I also have a question about whether childidxs, in src/backend/commands/indexcmds.c:1239:
oldcontext = MemoryContextSwitchTo(ind_context);
childidxs = RelationGetIndexList(childrel);
attmap = build_attrmap_by_name(RelationGetDescr(childrel), parentDesc);
MemoryContextSwitchTo(oldcontext);
should live in ind_context, given that we iterate over this list of oids and immediately free it; but at least it shouldn't do much harm. Sorry, I messed up the topic. -- Best regards, Alexander Pyhalov, Postgres Professional
Justin Pryzby
Hi. I've looked at the patches introducing CREATE INDEX CONCURRENTLY for partitioned tables - https://www.postgresql.org/message-id/flat/20210226182019.GU20769%40telsasoft.com#da169a0a518bf8121604437d9ab053b3 . The thread hadn't had any activity for a year. I've rebased the patches and tried to fix the issues I've seen. I've fixed a reference after table_close() in the first patch (it can be seen while building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). Also merged the old 0002-f-progress-reporting.patch and 0003-WIP-Add-SKIPVALID-flag-for-more-integration.patch. It seems the first one didn't really fix the issue with progress reporting (as ReindexRelationConcurrently() uses pgstat_progress_start_command(), which seems to mess up the effect of this command in DefineIndex()). Also, the third patch completely removes attempts to report CREATE INDEX progress correctly (reindex reports on individual commands, not the whole CREATE INDEX). So I've added 0003-Try-to-fix-create-index-progress-report.patch, which tries to fix the mess with CREATE INDEX progress reporting. It introduces a new flag, REINDEXOPT_REPORT_PART, in ReindexParams->options. Given this flag, ReindexRelationConcurrently() will not report individual operations, but ReindexMultipleInternal() will report on reindexed partitions. To make the issue worse, some partitions can be handled in ReindexPartitions(), and ReindexMultipleInternal() should know how many in order to correctly update the PROGRESS_CREATEIDX_PARTITIONS_DONE counter, so we pass the number of handled partitions to it. I also have a question about whether childidxs, in src/backend/commands/indexcmds.c:1239:
oldcontext = MemoryContextSwitchTo(ind_context);
childidxs = RelationGetIndexList(childrel);
attmap = build_attrmap_by_name(RelationGetDescr(childrel), parentDesc);
MemoryContextSwitchTo(oldcontext);
should live in ind_context, given that we iterate over this list of oids and immediately free it; but at least it shouldn't do much harm.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom adf4242708c2d86092f7c942c7bbb6ef12e6891b Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 7 Feb 2022 10:28:42 +0300 Subject: [PATCH 1/4] Allow CREATE INDEX CONCURRENTLY on partitioned table 0001-Allow-CREATE-INDEX-CONCURRENTLY-on-partitioned-table.patch from https://www.postgresql.org/message-id/20210226182019.gu20...@telsasoft.com Added fix - rel was used after table_close() --- doc/src/sgml/ref/create_index.sgml | 9 -- src/backend/commands/indexcmds.c | 150 ++--- src/test/regress/expected/indexing.out | 60 +- src/test/regress/sql/indexing.sql | 18 ++- 4 files changed, 181 insertions(+), 56 deletions(-) diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 91eaaabc90f..20c8af3e996 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -688,15 +688,6 @@ Indexes: cannot. - -Concurrent builds for indexes on partitioned tables are currently not -supported. However, you may concurrently build the index on each -partition individually and then finally create the partitioned index -non-concurrently in order to reduce the time where writes to the -partitioned table will be locked out. In this case, building the -partitioned index is a metadata only operation. 
- - diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 560dcc87a2c..39b11aebc03 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -68,6 +68,7 @@ /* non-export function prototypes */ +static void reindex_invalid_child_indexes(Oid indexRelationId); static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts); static void CheckPredicate(Expr *predicate); static void ComputeIndexAttrs(IndexInfo *indexInfo, @@ -670,17 +671,6 @@ DefineIndex(Oid relationId, partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE; if (partitioned) { - /* - * Note: we check 'stmt->concurrent' rather than 'concurrent', so that - * the error is thrown also for temporary tables. Seems better to be - * consistent, even though we could do it on temporary table because - * we're not actually doing it concurrently. - */ - if (stmt->concurrent) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot create index on partitioned table \"%s\" concurrently", - RelationGetRelationName(rel; if (stmt->excludeOpNames) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), @@ -1119,6 +1109,11 @@ DefineIndex(Oid relationId, if (pd->nparts != 0) flags |= INDEX_CREATE_INVALID; } + else if (concurrent && OidIsValid(parentIndexId)) + { + /
Re: Foreign join search stops on the first try
Ashutosh Bapat wrote 2022-01-25 17:08: This code was written long ago, so I may have some recollection errors. But AFAIR, the reasons we wanted to avoid repeated estimation/planning for the same foreign join rel were: 1. If use_remote_estimate = true, we fetch EXPLAIN output from the foreign server for various pathkeys. Fetching EXPLAIN output is expensive. Irrespective of the join order being considered locally, we expect the foreign server to give us the same cost since the join is the same. So we avoid running EXPLAIN again and again. 2. If use_remote_estimate = false, the logic to estimate a foreign join locally is independent of the join order, so it should yield the same cost again and again. For some reason that doesn't seem to be the case here. Hi. use_remote_estimate was set to false in our case, and yes, it fixed this issue. The problem is that with use_remote_estimate = false, the logic to estimate a foreign join locally is not independent of the join order. In the above example, without the patch we see a plan with cost: cost=382.31..966.86 rows=2 width=37. If we avoid exiting on (joinrel->fdw_private), we can see in gdb the following cases when joining all 3 relations:
case 1:
outerrel: relids (stock, order_line), startup_cost = 100, total_cost = 2415.92001, rel_startup_cost = 0, rel_total_cost = 2315.5, retrieved_rows = 21
innerrel: relid (district), startup_cost = 100, total_cost = 101.145001, rel_startup_cost = 0, rel_total_cost = 1.125, retrieved_rows = 1
joinrel: startup_cost = 100, total_cost = 2416.875, retrieved_rows = 2
case 2:
outerrel: relids (district, order_line), startup_cost = 100, total_cost = 281.419996, rel_total_cost = 180, retrieved_rows = 71
innerrel: relid (stock), startup_cost = 100, total_cost = 683.285008, rel_startup_cost = 0, rel_total_cost = 576.625, retrieved_rows = 333
joinrel: startup_cost = 100, total_cost = 974.88, retrieved_rows = 2
So (stock join order_line) join district has a different cost from (district join order_line) join stock.
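The effect described above reduces to a small model: if postgresGetForeignJoinPaths() returns early once joinrel->fdw_private is set, only the first join order's locally computed cost survives, even when a later order would be cheaper. The following Python sketch is purely illustrative (the cost numbers are the ones quoted from gdb above); it is not the postgres_fdw implementation.

```python
# Toy model: cost the same 3-way join {stock, order_line, district}
# through two different join orders, with and without the early return.
def add_foreign_join_path(joinrel, cost, early_return=True):
    """Mimic postgresGetForeignJoinPaths() considering one join order."""
    if early_return and "total_cost" in joinrel:
        return  # "already considered" -- later, cheaper orders are ignored
    if "total_cost" not in joinrel or cost < joinrel["total_cost"]:
        joinrel["total_cost"] = cost  # keep the cheapest order seen so far

orders = [
    ("(stock JOIN order_line) JOIN district", 2416.875),
    ("(district JOIN order_line) JOIN stock", 974.88),
]

current, patched = {}, {}
for _, cost in orders:
    add_foreign_join_path(current, cost, early_return=True)
    add_foreign_join_path(patched, cost, early_return=False)

print(current["total_cost"])  # 2416.875 -- first order considered wins
print(patched["total_cost"])  # 974.88   -- full search keeps the cheaper one
```

This is also why the patch still keeps the early exit when use_remote_estimate is true: there the extra orders would each trigger an expensive remote EXPLAIN for what the foreign server considers the same join.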
On Tue, Jan 25, 2022 at 1:26 PM Alexander Pyhalov wrote: It is surprising that the planning time halves with the patch. I expected it to increase slightly since we will compute estimates thrice instead of once. I wouldn't look at estimate times here precisely (and would looked at costs). Real example where we found it had 100 times more data, but effect was the same. Here some differences in planing time could be related to restarting instances with or without patches. What is use_remote_estimate? Is it ON/OFF? Yes, it was off. If we want to proceed along this line, we should take care not to fire more EXPLAIN queries on the foreign server. You are correct. Fixed patch to avoid extensive join search when use_remote_estimate is true. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 732e89ef198c0bad713b1a18446902b0132aa72c Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 24 Jan 2022 18:28:12 +0300 Subject: [PATCH] Look through all possible foreign join orders --- contrib/postgres_fdw/postgres_fdw.c | 50 + 1 file changed, 44 insertions(+), 6 deletions(-) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index bf3f3d9e26e..703e5df1753 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -5950,7 +5950,8 @@ postgresGetForeignJoinPaths(PlannerInfo *root, JoinType jointype, JoinPathExtraData *extra) { - PgFdwRelationInfo *fpinfo; + PgFdwRelationInfo *fpinfo, + *oldfpinfo; ForeignPath *joinpath; double rows; int width; @@ -5960,9 +5961,11 @@ postgresGetForeignJoinPaths(PlannerInfo *root, * EvalPlanQual gets triggered. */ /* - * Skip if this join combination has been considered already. + * Skip if this join combination has been considered already and rejected + * or if this join uses remote estimates. 
*/ - if (joinrel->fdw_private) + oldfpinfo = (PgFdwRelationInfo *) joinrel->fdw_private; + if (oldfpinfo && (!oldfpinfo->pushdown_safe || oldfpinfo->use_remote_estimate)) return; /* @@ -6002,6 +6005,12 @@ postgresGetForeignJoinPaths(PlannerInfo *root, epq_path = GetExistingLocalJoinPath(joinrel); if (!epq_path) { + if (oldfpinfo) + { +joinrel->fdw_private = oldfpinfo; +pfree(fpinfo); + } + elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found"); return; } @@ -6011,6 +6020,12 @@ postgresGetForeignJoinPaths(PlannerInfo *root, if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra)) { + if (oldfpinfo) + { + joinrel->fdw_private = oldfpinfo; + pfree(fpinfo); + } + /* Free path required for EPQ if we copied one; we don't need it now */ if (epq_path) pfree(epq_path); @@ -6044,14 +6059,37 @@ postgresGetForeignJoinPaths(PlannerI
Foreign join search stops on the first try
QUERY PLAN -- Foreign Scan (cost=100.00..974.88 rows=2 width=37) (actual time=3.399..3.400 rows=0 loops=1) Output: order_line.ol_o_id, order_line.ol_i_id, order_line.ol_d_id, stock.s_i_id, stock.s_quantity, stock.s_data, district.d_id, district.d_next_o_id, district.d_name Relations: ((public.order_line) INNER JOIN (public.district)) INNER JOIN (public.stock) Remote SQL: SELECT r1.ol_o_id, r1.ol_i_id, r1.ol_d_id, r2.s_i_id, r2.s_quantity, r2.s_data, r3.d_id, r3.d_next_o_id, r3.d_name FROM ((test.order_line r1 INNER JOIN test.district r3 ON (((r1.ol_o_id < r3.d_next_o_id)) AND ((r1.ol_o_id >= (r3.d_next_o_id - 20))) AND ((r3.d_id = 1)) AND ((r1.ol_d_id = 1 INNER JOIN test.stock r2 ON (((r1.ol_i_id = r2.s_i_id)) AND ((r2.s_quantity < 11 Planning Time: 0.928 ms Execution Time: 4.511 ms -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom e0c21e620804adbdeb6bc7d11ad606d263e63249 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 24 Jan 2022 18:28:12 +0300 Subject: [PATCH] Look through all possible foreign join orders --- .../postgres_fdw/expected/postgres_fdw.out| 10 ++-- contrib/postgres_fdw/postgres_fdw.c | 47 --- 2 files changed, 46 insertions(+), 11 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 7d6f7d9e3df..1547049108c 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -1203,8 +1203,8 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t - Foreign Scan Output: t1.c1, t2.c2, t3.c3, t1.c3 - Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3) - Remote SQL: SELECT r1."C 1", r2.c2, r4.c3, r1.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1 ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 10::bigint + Relations: ((public.ft1 t1) INNER JOIN 
(public.ft4 t3)) INNER JOIN (public.ft2 t2) + Remote SQL: SELECT r1."C 1", r2.c2, r4.c3, r1.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 10::bigint (4 rows) SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10; @@ -5585,12 +5585,12 @@ UPDATE ft2 SET c3 = 'foo' FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1) WHERE ft2.c1 > 1200 AND ft2.c2 = ft4.c1 RETURNING ft2, ft2.*, ft4, ft4.*; -- can be pushed down - QUERY PLAN + QUERY PLAN +
Re: Push down time-related SQLValue functions to foreign server
Tom Lane wrote 2022-01-18 23:01: I wrote: Alexander Pyhalov writes: This means we'll translate something like explain select * from t where d > now() - '1 day'::interval; to select * from t where d > $1; Right. After thinking about that a bit more, I see that this will result in a major redefinition of what is "shippable". Right now, we do not consider the above WHERE clause to be shippable, not only because of now() but because the timestamptz-minus-interval operator is dependent on the timezone setting, which might be different at the remote. But if we evaluate that operator locally and send its result as a parameter, the objection vanishes. In fact, I don't think we even need to require the subexpression to contain only built-in functions. Its result still has to be of a built-in type, but that's a much weaker restriction. Hi. So far I have the following prototype. It seems to be working, but I think it can be enhanced. At the least, some sort of caching seems to be necessary for is_stable_expr(). 1) Now an expression can be either 'stable shippable' or 'shippable according to the old rules'. We check whether it's 'stable shippable' in foreign_expr_walker(), is_foreign_param() and deparseExpr(). All such exprs are replaced by params while deparsing. 2) contain_mutable_functions() is now calculated only for the current node, if the node is not considered 'stable shippable'. Is this a step in the right direction, or am I missing something? -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 60e6e0bf98326cb557c70a365797026e9925b7a3 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 29 Jul 2021 11:45:28 +0300 Subject: [PATCH] Push down stable expressions Stable expressions can be computed locally and sent to the remote side as parameter values.
--- contrib/postgres_fdw/deparse.c| 1162 + .../postgres_fdw/expected/postgres_fdw.out| 87 ++ contrib/postgres_fdw/postgres_fdw.c |9 +- contrib/postgres_fdw/postgres_fdw.h |1 + contrib/postgres_fdw/shippable.c | 74 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 22 + 6 files changed, 811 insertions(+), 544 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..33a79026574 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -158,6 +158,7 @@ static void deparseDistinctExpr(DistinctExpr *node, deparse_expr_cxt *context); static void deparseScalarArrayOpExpr(ScalarArrayOpExpr *node, deparse_expr_cxt *context); static void deparseRelabelType(RelabelType *node, deparse_expr_cxt *context); +static void deparseStableExpr(Expr *node, deparse_expr_cxt *context); static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context); static void deparseNullTest(NullTest *node, deparse_expr_cxt *context); static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); @@ -267,20 +268,17 @@ is_foreign_expr(PlannerInfo *root, if (loc_cxt.state == FDW_COLLATE_UNSAFE) return false; - /* - * An expression which includes any mutable functions can't be sent over - * because its result is not stable. For example, sending now() remote - * side could cause confusion from clock offsets. Future versions might - * be able to make this choice with more granularity. (We check this last - * because it requires a lot of expensive catalog lookups.) - */ - if (contain_mutable_functions((Node *) expr)) - return false; /* OK to evaluate on the remote server */ return true; } +static bool +contain_mutable_functions_checker(Oid func_id, void *context) +{ + return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE); +} + /* * Check if expression is safe to execute remotely, and return true if so. 
* @@ -321,616 +319,650 @@ foreign_expr_walker(Node *node, inner_cxt.collation = InvalidOid; inner_cxt.state = FDW_COLLATE_NONE; - switch (nodeTag(node)) + if (is_stable_expr(node)) + { + collation = exprCollation(node); + if (collation == InvalidOid || collation == DEFAULT_COLLATION_OID) + state = FDW_COLLATE_NONE; + else + state = FDW_COLLATE_UNSAFE; + } + else { - case T_Var: - { -Var *var = (Var *) node; -/* - * If the Var is from the foreign table, we consider its - * collation (if any) safe to use. If it is from another - * table, we treat its collation the same way as we would a - * Param's collation, ie it's not safe for it to have a - * non-default collation. - */ -if (bms_is_member(var->varno, glob_cxt->relids) && - var->varlevelsup == 0) + /* + * An expression which includes any mutable functions can't be sent + * over because its result is not stable. + */ + if (check_functions_in_node(node, contain_mutable_functions_checker, NULL)) + return false; + + switch (nodeTag(node)) + { + case T_Var: { - /* Var belongs to foreign table */
Re: Push down time-related SQLValue functions to foreign server
Tom Lane wrote 2022-01-18 19:53: Alexander Pyhalov writes: [ updated patch ] Thanks for updating the patch. (BTW, please attach version numbers to new patch versions, to avoid confusion.) However, before we proceed any further with this patch, I think we really ought to stop and think about the question I raised last night: why are we building a one-off feature for SQLValueFunction? Wouldn't the same parameter-substitution mechanism work for *any* stable expression that doesn't contain remote Vars? That would subsume the now() case as well as plenty of others. Hi. I think I can extend it to allow any stable function (not just immutable/sqlvalue functions) in is_foreign_expr(), but I'm not so sure about "expression". Perhaps, at the top of deparseExpr(), we can check that an expression doesn't contain vars or params but does contain a stable function, and deparse it as a param. This means we'll translate something like explain select * from t where d > now() - '1 day'::interval; to select * from t where d > $1; The question is how we will reliably determine its typmod, given that the exprTypmod() comment says "returns the type-specific modifier of the expression's result type, if it can be determined. In many cases, it can't". What do we save if we don't ship the whole expression as a param, but only stable functions? Allowing whole expressions seems more straightforward. -- Best regards, Alexander Pyhalov, Postgres Professional
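The classification being proposed — evaluate locally, and ship as a query parameter, any sub-expression that is at most stable and contains no remote Vars — can be sketched as follows. This is a hedged illustration with made-up helper names (classify, ship_as_param) and a toy expression encoding; it is not the actual postgres_fdw code, and it deliberately ignores the typmod question raised above.

```python
# Toy expression trees: ('func', name, *args) | ('var', name) | ('const', value).
IMMUTABLE, STABLE, VOLATILE = "i", "s", "v"
_ORDER = [IMMUTABLE, STABLE, VOLATILE]

def classify(expr, funcs):
    """Return (worst volatility in the tree, whether it references a Var)."""
    kind = expr[0]
    if kind == "const":
        return IMMUTABLE, False
    if kind == "var":
        return IMMUTABLE, True
    vol, has_var = funcs[expr[1]], False
    for arg in expr[2:]:
        v, hv = classify(arg, funcs)
        vol = max(vol, v, key=_ORDER.index)  # keep the "most volatile" label
        has_var = has_var or hv
    return vol, has_var

def ship_as_param(expr, funcs):
    """Stable (or immutable) and Var-free => evaluate locally, send as $n."""
    vol, has_var = classify(expr, funcs)
    return vol in (IMMUTABLE, STABLE) and not has_var

funcs = {"now": STABLE, "timestamptz_mi_interval": STABLE, "random": VOLATILE}
# Right-hand side of "d > now() - interval '1 day'": qualifies as a param.
rhs = ("func", "timestamptz_mi_interval", ("func", "now"), ("const", "1 day"))
print(ship_as_param(rhs, funcs))                 # True  -> deparse as $1
print(ship_as_param(("var", "d"), funcs))        # False -> remote column reference
print(ship_as_param(("func", "random"), funcs))  # False -> volatile, not shippable
```

In the real patch this check has to be woven into foreign_expr_walker()'s collation tracking as well, which is why the prototype treats 'stable shippable' as a separate state rather than a plain boolean.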
Re: Push down time-related SQLValue functions to foreign server
Hi. Tom Lane писал 2022-01-18 02:08: Alexander Pyhalov writes: Perhaps in a MACRO? Changed this check to a macro, also fixed condition in is_foreign_param() and added test for it. Also fixed comment in prepare_query_params(). I took a quick look at this. I'm unconvinced that you need the TIME_RELATED_SQLVALUE_FUNCTION macro, mainly because I think testing that in is_foreign_param() is pointless. The only way we'll be seeing a SQLValueFunction in is_foreign_param() is if we decided it was shippable, so you really don't need two duplicate tests. (In the same vein, I would not bother with including a switch in deparseSQLValueFunction that knows about these opcodes explicitly. Just use the typmod field; exprTypmod() does.) Yes, sure, is_foreign_param() is called only when is_foreign_expr() is true. Simplified this part. I also find it pretty bizarre that contain_unsafe_functions isn't placed beside its one caller. Maybe that indicates that is_foreign_expr is mis-located and should be in shippable.c. More generally, it's annoying that you had to copy-and-paste all of contain_mutable_functions to make this. That creates a rather nasty maintenance hazard for future hackers, who will need to keep these widely-separated functions in sync. Not sure what to do about that though. Do we want to extend contain_mutable_functions itself to cover this use-case? I've moved logic to contain_mutable_functions_skip_sqlvalues(), it uses the same subroutines as contain_mutable_functions(). Should we instead just add one more parameter to contain_mutable_functions()? I'm not sure that it's a good idea given that contain_mutable_functions() seems to be an external interface. The test cases seem a bit overkill --- what is the point of the two nigh-identical PREPARE tests, or the GROUP BY test? If anything is broken about GROUP BY, surely it's not specific to this patch. 
I've removed PREPARE tests, but GROUP BY test checks foreign_grouping_ok()/is_foreign_param() path, so I think it's useful. I'm not really convinced by the premise of 0002, particularly this bit: static bool -contain_mutable_functions_checker(Oid func_id, void *context) +contain_unsafe_functions_checker(Oid func_id, void *context) { - return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE); + /* now() is stable, but we can ship it as it's replaced by parameter */ + return !(func_volatile(func_id) == PROVOLATILE_IMMUTABLE || func_id == F_NOW); } The point of the check_functions_in_node callback API is to verify the mutability of functions that are embedded in various sorts of expression nodes ... but they might not be in a plain FuncExpr node, which is the only case you'll deparse correctly. It might be that now() cannot legally appear in any of the other node types that check_functions_in_node knows about, but I'm not quite convinced of that, and even less convinced that that'll stay true as we add more expression node types. Also, if we commit this, for sure some poor soul will try to expand the logic to some other stable function that *can* appear in those contexts, and that'll be broken. The implementation of converting now() to CURRENT_TIMESTAMP seems like an underdocumented kluge, too. On the whole I'm a bit inclined to drop 0002; I'm not sure it's worth the trouble. OK. Let's drop it for now. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 9a67c52b57e0b50a3702598aa0b3e8af89569a9c Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 29 Jul 2021 11:45:28 +0300 Subject: [PATCH] SQLValue functions pushdown current_timestamp, localtimestamp and similar SQLValue functions can be computed locally and sent to remote side as parameters values. 
--- contrib/postgres_fdw/deparse.c| 83 - .../postgres_fdw/expected/postgres_fdw.out| 88 +++ contrib/postgres_fdw/postgres_fdw.c | 9 +- contrib/postgres_fdw/postgres_fdw.h | 1 + contrib/postgres_fdw/shippable.c | 3 + contrib/postgres_fdw/sql/postgres_fdw.sql | 22 + src/backend/optimizer/util/clauses.c | 27 +- src/include/optimizer/optimizer.h | 1 + 8 files changed, 226 insertions(+), 8 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..a398e1b2174 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -109,6 +109,15 @@ typedef struct deparse_expr_cxt appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno)) #define SUBQUERY_REL_ALIAS_PREFIX "s" #define SUBQUERY_COL_ALIAS_PREFIX "c" +#define TIME_RELATED_SQLVALUE_FUNCTION(s) \ + (s->op == SVFOP_CURRENT_TIMESTAMP || \ + s->op == SVFOP_CURRENT_TIMESTAMP_N || \ + s->op == SVFOP_CURRENT_TIME || \ + s->op == SVFOP_CURRENT_TIME_N || \ + s->op == SVFOP_LOCALTIMESTAMP || \ + s->op == SVFOP_LOCALTIMESTAMP_N
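To make the idea behind the patch above concrete: time-related SQLValue functions (now(), CURRENT_TIMESTAMP and friends) are stable, so their value is fixed for the duration of a statement; that is what makes it safe to evaluate them once on the local side and ship the result to the remote server as a query parameter. Below is a minimal sketch of that deparse-and-parameterize step — in Python rather than the FDW's C, with a made-up plan_remote_query() helper, so illustrative only and not the patch's actual code:

```python
import datetime

# Time-related SQLValue functions, loosely mirroring the patch's
# TIME_RELATED_SQLVALUE_FUNCTION macro (names here are illustrative).
TIME_RELATED = {"now()", "current_timestamp", "current_time",
                "localtimestamp", "localtime"}

def plan_remote_query(tokens):
    """Deparse a token list for remote execution, replacing time-related
    SQLValue functions with $n placeholders whose values are computed
    once on the local side and sent as parameters."""
    params = []
    sql = []
    for tok in tokens:
        if tok.lower() in TIME_RELATED:
            params.append(datetime.datetime.now())  # evaluated locally, once
            sql.append(f"${len(params)}")
        else:
            sql.append(tok)
    return " ".join(sql), params

sql, params = plan_remote_query(
    ["SELECT", "*", "FROM", "t", "WHERE", "ts", "<", "now()"])
print(sql)  # SELECT * FROM t WHERE ts < $1
```

Because the value is captured once at plan-parameter setup, every remote evaluation of $1 sees the same timestamp, matching the local semantics of a stable function within one statement.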
Re: Partial aggregates pushdown
Alexander Pyhalov писал 2022-01-17 15:26: Zhihong Yu писал 2022-01-17 11:43: Hi, + FdwScanPrivateConvertors + * Generate attinmeta if there are some converters: I think it would be better if converter is spelled the same way across the patch. For build_conv_list(): + if (IS_UPPER_REL(foreignrel)) You can return NIL for !IS_UPPER_REL(foreignrel) - this would save indentation for the body of the func. Hi. Updated patch. Sorry, missed attachment. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 4408f98a67872efd3c09a3bf89e7cbf88db2a8b2 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 20 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 12 files changed, 672 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..6b12b7bf76b 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -197,6 +197,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -832,8 +833,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. 
*/ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3349,7 +3352,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3819,3 +3822,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this case we currently don't use converter */ + if (agg->aggtranstype != INTERNALOID && get_func_rettype(agg->aggfnoid) != agg->aggtranstype) + { + 
ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + return true; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 7d6f7d9e3df..549bb9bae61 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9336,13 +9336,13 @@ RESET enable_partitionwise_join; -- === -- test partitionwise aggregates -- === -CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a); +C
Re: Partial aggregates pushdown
Zhihong Yu wrote 2022-01-17 11:43: Hi, + FdwScanPrivateConvertors + * Generate attinmeta if there are some converters: I think it would be better if converter is spelled the same way across the patch. For build_conv_list(): + if (IS_UPPER_REL(foreignrel)) You can return NIL for !IS_UPPER_REL(foreignrel) - this would save indentation for the body of the func. Hi. Updated patch. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Asymmetric partition-wise JOIN
Andrey Lepikhov wrote 2021-09-15 09:31: On 14/9/21 11:37, Andrey V. Lepikhov wrote: Thank you for this good catch! The problem was in the adjust_child_relids_multilevel routine. The tmp_result variable sometimes points to the original required_outer. This patch adds new ways in which the optimizer can generate plans. One possibility is that the optimizer reparameterizes an inner path by a plain relation from the outer side (maybe as a result of a join of the plain relation and a partitioned relation). In this case we have to compare tmp_result with the original pointer to see whether it was changed. The patch in attachment fixes this problem. An additional regression test is added. I thought about it more and realized it isn't necessary to recurse in the adjust_child_relids_multilevel() routine if required_outer contains only normal_relids. Also, regression tests were improved a bit. Hi. The patch no longer applies cleanly, so I rebased it. Attaching the rebased version. I've looked through it once again and have several questions. 1) In adjust_appendrel_attrs_multilevel(), can it happen that child_relids is a zero-length list (in this case the pfree's will fail)? It seems not, but should we at least assert this? Note that in adjust_appendrel_attrs() we add logic for nappinfos being 0. 2) In try_asymmetric_partitionwise_join() we state that 'Asymmetric join isn't needed if the append node has only one child'. This is not completely correct. Asymmetric join with one partition can be advantageous when JOIN(A, UNION(B)) is more expensive than UNION(JOIN(A, B)). The latter is true, for example, when we join a partitioned table having foreign partitions with another foreign table and only one partition is left. Let's take the attached case (foreign_join.sql).
With the list_length(append_path->subpaths) > 1 check in place, we get the following plan:

set enable_partitionwise_join = on;
explain SELECT t1.a,t2.b FROM fprt1 t1 INNER JOIN ftprt2_p1 t2 ON (t1.a = t2.b) WHERE t1.a < 250 AND t2.c like '%0004' ORDER BY 1,2;
                                   QUERY PLAN
---
 Sort  (cost=208.65..208.69 rows=17 width=8)
   Sort Key: t1.a
   ->  Hash Join  (cost=202.60..208.30 rows=17 width=8)
         Hash Cond: (t1.a = t2.b)
         ->  Foreign Scan on ftprt1_p1 t1  (cost=100.00..105.06 rows=125 width=4)
         ->  Hash  (cost=102.39..102.39 rows=17 width=4)
               ->  Foreign Scan on ftprt2_p1 t2  (cost=100.00..102.39 rows=17 width=4)

If we change it to list_length(append_path->subpaths) > 0, we get a foreign join and a cheaper plan:

explain verbose SELECT t1.a,t2.b FROM fprt1 t1 INNER JOIN ftprt2_p1 t2 ON (t1.a = t2.b) WHERE t1.a < 250 AND t2.c like '%0004' ORDER BY 1,2;
                                   QUERY PLAN
---
 Sort  (cost=106.15..106.19 rows=17 width=8)
   Output: t1.a, t2.b
   Sort Key: t1.a
   ->  Foreign Scan  (cost=102.26..105.80 rows=17 width=8)
         Output: t1.a, t2.b
         Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
         Remote SQL: SELECT r4.a, r2.b FROM (public.fprt1_p1 r4 INNER JOIN public.fprt2_p1 r2 ON (((r4.a = r2.b)) AND ((r2.c ~~ '%0004')) AND ((r4.a < 250

-- Best regards, Alexander Pyhalov, Postgres Professional

From 029f0662f5401e79468a315a658b05b2f2a4e7a6 Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Mon, 17 Jan 2022 11:33:03 +0300 Subject: [PATCH] Asymmetric partitionwise join. Teach optimizer to consider partitionwise join of non-partitioned table with each partition of partitioned table. Disallow asymmetric machinery for joining of two partitioned (or appended) relations because it could cause huge consumption of CPU and memory during reparameterization of NestLoop path. Change logic of the multilevel child relids adjustment, because this feature allows the optimizer to plan in new way.
--- src/backend/optimizer/path/joinpath.c| 9 + src/backend/optimizer/path/joinrels.c| 187 src/backend/optimizer/plan/setrefs.c | 17 +- src/backend/optimizer/util/appendinfo.c | 45 +- src/backend/optimizer/util/pathnode.c| 9 +- src/backend/optimizer/util/relnode.c | 19 +- src/include/optimizer/paths.h| 7 +- src/test/regress/expected/partition_join.out | 425 +++ src/test/regress/sql/partition_join.sql | 180 9 files changed, 867 insertions(+), 31 deletions(-) diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index f96fc9fd282..6531981d0d8 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -335,6 +335,15 @@ add_paths_to_joinrel(PlannerInfo
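The semantic claim behind the asymmetric plan shape can be checked independently of the planner: for an inner join, joining the non-partitioned relation against each partition and appending the results yields the same rows as joining it against the appended whole. A small self-contained sketch (Python with toy helper names; the real work is of course in joinrels.c):

```python
def inner_join(outer, inner):
    """Naive nested-loop inner join on equal values, standing in for a join node."""
    return [(a, b) for a in outer for b in inner if a == b]

outer = [1, 2, 3, 250]
partitions = [[1, 2], [3, 4], [250, 251]]   # disjoint partitions of the inner rel

# JOIN(A, Append(B1..Bn)): join against the appended relation.
appended = [row for part in partitions for row in part]
plan_join_append = inner_join(outer, appended)

# Append(JOIN(A, B1) .. JOIN(A, Bn)): the asymmetric partition-wise shape.
plan_append_join = [row for part in partitions
                    for row in inner_join(outer, part)]

assert sorted(plan_join_append) == sorted(plan_append_join)
print(sorted(plan_append_join))  # [(1, 1), (2, 2), (3, 3), (250, 250)]
```

The equivalence holds even with a single partition, which is the point raised above: with one foreign partition, the per-partition join can be shipped as a whole to the remote server, whereas the JOIN-over-Append shape forces a local join.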
Re: Partial aggregates pushdown
Julien Rouhaud писал 2022-01-14 15:16: Hi, On Mon, Nov 15, 2021 at 04:01:51PM +0300, Alexander Pyhalov wrote: I've updated patch - removed catversion dump. This version of the patchset doesn't apply anymore: http://cfbot.cputube.org/patch_36_3369.log === Applying patches on top of PostgreSQL commit ID 025b920a3d45fed441a0a58fdcdf05b321b1eead === === applying patch ./0001-Partial-aggregates-push-down-v07.patch patching file src/bin/pg_dump/pg_dump.c Hunk #1 succeeded at 13111 (offset -965 lines). Hunk #2 FAILED at 14167. Hunk #3 succeeded at 13228 (offset -961 lines). Hunk #4 succeeded at 13319 (offset -966 lines). 1 out of 4 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej Could you send a rebased version? In the meantime I will switch the cf entry to Waiting on Author. Hi. Attaching rebased patch. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom d9227853b4ebeaa84ce4fd9623b2128cd6b2e494 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 20 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 12 files changed, 672 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..6b12b7bf76b 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -197,6 +197,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, 
int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -832,8 +833,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3349,7 +3352,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3819,3 +3822,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if 
(agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this case we currently don't use converter */ + if (agg->aggtranstype != INTERNALOID && get_func_rettype(agg->aggfnoid) != agg->aggtranstype) + { + ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + return true; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 7d6f7d9e3df..549bb9bae61 100644 --- a/contrib/pos
Re: postgres_fdw: incomplete subabort cleanup of connections used in async execution
Etsuro Fujita wrote 2021-12-19 13:25: Hi, While working on [1], I noticed $SUBJECT: postgres_fdw resets the per-connection states of connections, which store async requests sent to remote servers in async_capable mode, during post-abort (pgfdw_xact_callback()), but it fails to do so during post-subabort (pgfdw_subxact_callback()). This causes a crash when re-executing a query that was aborted in a subtransaction: Hi. Looks good to me. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Daniel Gustafsson wrote 2021-11-15 13:16: On 3 Nov 2021, at 15:50, Alexander Pyhalov wrote: Daniel Gustafsson wrote 2021-11-03 16:45: On 2 Nov 2021, at 10:12, Alexander Pyhalov wrote: Updated and rebased patch. + state = (Int128AggState *) palloc0(sizeof(Int128AggState)); + state->calcSumX2 = false; + + if (!PG_ARGISNULL(0)) + { +#ifdef HAVE_INT128 + do_int128_accum(state, (int128) PG_GETARG_INT64(0)); +#else + do_numeric_accum(state, int64_to_numeric(PG_GETARG_INT64(0))); +#endif This fails on non-INT128 platforms as state cannot be cast to Int128AggState outside of HAVE_INT128; it's not defined there. This needs to be a PolyNumAggState no? Hi. Thank you for noticing this. It indeed fails with pgac_cv__128bit_int=no. Updated patch. The updated patch also fails to apply now, but on the catversion.h bump. To avoid having to rebase for that I recommend skipping that part in the patch and just mentioning the need in the thread; any committer picking this up for commit will know to bump the catversion, so there is no use in risking unnecessary conflicts. I've updated the patch - removed the catversion bump.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 2af16e66276938b861cf7a8db2fef967f54b800f Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 12 files changed, 673 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index b27689d0864..a515c5662bb 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -197,6 +197,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -832,8 +833,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3349,7 +3352,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. 
*/ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3819,3 +3822,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this case we currently don't use converter */ + if (agg->aggtranstype != INTERNALO
Re: Partial aggregates pushdown
Daniel Gustafsson писал 2021-11-03 16:45: On 2 Nov 2021, at 10:12, Alexander Pyhalov wrote: Updated and rebased patch. + state = (Int128AggState *) palloc0(sizeof(Int128AggState)); + state->calcSumX2 = false; + + if (!PG_ARGISNULL(0)) + { +#ifdef HAVE_INT128 + do_int128_accum(state, (int128) PG_GETARG_INT64(0)); +#else + do_numeric_accum(state, int64_to_numeric(PG_GETARG_INT64(0))); +#endif This fails on non-INT128 platforms as state cannot be cast to Int128AggState outside of HAVE_INT128; it's not defined there. This needs to be a PolyNumAggState no? Hi. Thank you for noticing this. It's indeed fails with pgac_cv__128bit_int=no. Updated patch. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom f72a3d52a2b85ad9ea5f61f8ff5c46cb50ae3ec8 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 674 insertions(+), 84 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..8cee12c1b2a 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ 
-831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this 
case we currently don't use converter */ + if (agg->aggtranstype != INTERNALOID && get_func_rettype(agg->aggfnoid) != agg->aggtranstype) + { + ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + return true; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index fd141a0fa5c..80c507783e6 100644 --- a/contrib/postgres_fdw/expected/post
Re: Partial aggregates pushdown
Hi. Updated and rebased patch. Ilya Gladyshev писал 2021-11-02 00:31: Hi, On 21.10.2021 13:55, Alexander Pyhalov wrote: Hi. Updated patch. Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. I don't quite understand why this is restricted only to aggregates that have 'internal' state, I feel like that should be possible for any aggregate that has a function to convert its final result back to aggregate state to be pushed down. While I couldn't come up with a useful example for this, except maybe for an aggregate whose aggfinalfn is used purely for cosmetic purposes (e.g. format the result into a string), I still feel that it is an unnecessary restriction. I don't feel comfortable with it for the following reasons. - Now partial converters translate aggregate result to serialized internal representation. In case when aggregate type is different from internal state, we'd have to translate it to non-serialized internal representation, so converters should skip serialization step. This seems like introducing two kind of converters. - I don't see any system aggregates which would benefit from this. However, it doesn't seem to be complex, and if it seems to be desirable, it can be done. For now introduced check that transtype matches aggregate type (or is internal) in partial_agg_ok(). A few minor review notes to the patch: +static List *build_conv_list(RelOptInfo *foreignrel); this should probably be up top among other declarations. Moved it upper. @@ -1433,6 +1453,48 @@ postgresGetForeignPlan(PlannerInfo *root, outer_plan); } +/* + * Generate attinmeta if there are some converters: + * they are expecxted to return BYTEA, but real input type is likely different. + */ typo in word "expecxted". Fixed. @@ -139,10 +147,13 @@ typedef struct PgFdwScanState * for a foreign join scan. 
*/ TupleDesctupdesc;/* tuple descriptor of scan */ AttInMetadata *attinmeta;/* attribute datatype conversion metadata */ +AttInMetadata *rcvd_attinmeta;/* metadata for received tuples, NULL if + * there's no converters */ Looks like rcvd_attinmeta is redundant and you could use attinmeta for conversion metadata. Seems so, removed it. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 71de85154f9ac78e99f4ce8bf9aa341fee748609 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 674 insertions(+), 84 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..8cee12c1b2a 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. 
*/ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,51 @@ get_relation_column_alias_ids(
Re: Partial aggregates pushdown
Peter Eisentraut wrote 2021-11-01 12:47: On 21.10.21 12:55, Alexander Pyhalov wrote: Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. Converters are called locally; they transform the aggregate result into a serialized internal representation. As converters don't have access to the internal aggregate state, partial aggregates like avg() are still not pushable. It seems to me that the system should be able to determine from the existing aggregate catalog entry whether an aggregate can be pushed down. For example, it could check aggtranstype != internal and similar. A separate boolean flag should not be necessary. Hi. I think we can't infer this property from existing flags. For example, if I have avg() with bigint[] aggtranstype, it doesn't mean we can push it down. Nor could we decide whether a partial aggregate is safe to push down based on aggfinalfn presence (for example, it is defined for sum(numeric), but we can still push it down). Or if it is, the patch should provide some guidance about how an aggregate function author should set it. Where should it be provided? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Zhihong Yu wrote 2021-10-22 00:43: Hi, w.r.t. 0001-Partial-aggregates-push-down-v03.patch Hi. For partial_agg_ok(), + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + ok = false; Since SearchSysCache1() is not called yet, you can return false directly. Fixed. + if (aggform->aggpartialpushdownsafe != true) The above can be written as: if (!aggform->aggpartialpushdownsafe) Fixed. For build_conv_list(): + Oid converter_oid = InvalidOid; + + if (IsA(tlentry->expr, Aggref)) ... + } + convlist = lappend_oid(convlist, converter_oid); Do you intend to append InvalidOid to convlist (when tlentry->expr is not Aggref)? Yes, for each tlist member (which matches fpinfo->grouped_tlist in case when foreignrel is UPPER_REL) we need to find the corresponding converter. If we don't append InvalidOid, we can't find the convlist member corresponding to a tlist member. Added comments to build_conv_list. Also fixed an error in pg_dump.c (we selected '0' when aggpartialconverterfn was not defined in schema, but checked for '-').
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom a18e2ff8de00592797e7c3ccb8d6cd536a2e4e72 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 46 +++- .../postgres_fdw/expected/postgres_fdw.out| 185 +++- contrib/postgres_fdw/postgres_fdw.c | 204 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 - src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 672 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..fa9f487d66d 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. 
*/ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,40 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + bool ok = true; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + ok = false; + + /* + * But if an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (ok && agg->aggtranstype == INTERNALOID) + ok = (aggform->aggpartialconverterfn != InvalidOid); + + ReleaseSysCache(aggtup); + + return ok; +} diff --git a/contrib/postgres_fdw/expected/postgres_fd
Re: Partial aggregates pushdown
Tomas Vondra wrote 2021-10-19 16:25: On 10/19/21 08:56, Alexander Pyhalov wrote: Hi. Tomas Vondra wrote 2021-10-15 17:56: As for the proposed approach, it's probably good enough for the first version to restrict this to aggregates where the aggregate result is sufficient, i.e. we don't need any new export/import procedures. But it's very unlikely we'd want to restrict it the way the patch does it, i.e. based on aggregate name. That's both fragile (people can create new aggregates with such a name) and against PostgreSQL extensibility (people may implement custom aggregates, but won't be able to benefit from this just because of the name). So for v0 maybe, but I think there needs to be a way to relax this in some way; for example, we could add a new flag to pg_aggregate to mark aggregates supporting this. Updated the patch to mark aggregates as pushdown-safe in pg_aggregate. So far I have no solution for aggregates with internal aggtranstype. Hi. Updated patch. Now aggregates with internal state can be pushed down if they are marked as pushdown-safe (this flag is set to true for min/max/sum) and have associated converters. Converters are called locally; they transform the aggregate result into the serialized internal representation. As converters don't have access to the internal aggregate state, partial aggregates like avg() are still not pushable. For now the overall logic is quite simple. We now also call add_foreign_grouping_paths() for partial aggregation. In foreign_expr_walker() we check whether the aggregate is pushable (which means that it is simple, marked as pushable, and, if it has 'internal' as aggtranstype, has an associated converter). If it is pushable, we proceed as with usual aggregates (but forbid HAVING pushdown). During postgresGetForeignPlan() we produce a list of converters for the aggregates. As converters have an input argument type different from their result type (bytea), we have to generate alternative metadata, which is used by make_tuple_from_result_row().
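The limitation described here — avg() stays non-pushable because a converter only sees the aggregate's final result, never its internal state — can be sketched with a toy two-phase aggregation model. This is my illustration, not PostgreSQL executor code; the function names mirror the transfn/combinefn/finalfn roles only loosely.

```python
# Model of two-phase aggregation: shards compute partial states, the
# coordinator combines them and applies the final function.
def avg_partial(rows):            # transition: internal state is (sum, count)
    return (sum(rows), len(rows))

def avg_combine(a, b):            # combine two internal states
    return (a[0] + b[0], a[1] + b[1])

def avg_final(state):             # final function
    total, count = state
    return total / count

shards = [[1, 2, 3], [4, 5], [6]]

# sum()'s partial result is already a value of the result type, so a
# remote partial aggregate can be shipped and re-aggregated as-is:
total = sum(sum(shard) for shard in shards)

# avg()'s final value per shard is NOT combinable -- averaging the shard
# averages (2.0, 4.5, 6.0) would not give the true average -- so the
# internal (sum, count) state itself has to travel:
state = (0, 0)
for shard in shards:
    state = avg_combine(state, avg_partial(shard))
```

A converter that only receives avg()'s final number has lost the count, which is why avg() needs real state export rather than a result-to-state converter.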
If make_tuple_from_result_row() encounters field with converter, it calls converter and returns result. For now we expect converter to have only one input and output argument. Existing converters just transform input value to internal representation and return its serialized form. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 52cd61fdb5cb5fceeacd832462468d8676f57ca6 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 49 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 195 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 ++- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 666 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..50ef1009b97 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. 
*/ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,43 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + *
Re: Partial aggregates pushdown
Hi. Tomas Vondra wrote 2021-10-15 17:56: As for the proposed approach, it's probably good enough for the first version to restrict this to aggregates where the aggregate result is sufficient, i.e. we don't need any new export/import procedures. But it's very unlikely we'd want to restrict it the way the patch does it, i.e. based on aggregate name. That's both fragile (people can create new aggregates with such a name) and against PostgreSQL extensibility (people may implement custom aggregates, but won't be able to benefit from this just because of the name). So for v0 maybe, but I think there needs to be a way to relax this in some way; for example, we could add a new flag to pg_aggregate to mark aggregates supporting this. Updated the patch to mark aggregates as pushdown-safe in pg_aggregate. So far I have no solution for aggregates with internal aggtranstype. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 823a389caf003a21dd4c8e758f89d08ba89c5856 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 45 +++- .../postgres_fdw/expected/postgres_fdw.out| 215 +- contrib/postgres_fdw/postgres_fdw.c | 29 ++- contrib/postgres_fdw/sql/postgres_fdw.sql | 31 ++- src/backend/catalog/pg_aggregate.c| 4 +- src/backend/commands/aggregatecmds.c | 6 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 101 src/include/catalog/pg_aggregate.h| 6 +- 9 files changed, 362 insertions(+), 77 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..cf6b2d9f066 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual
clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,39 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + /* Can't process aggregates which require serialization/deserialization */ + if (agg->aggtranstype == INTERNALOID) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (aggform->aggpartialpushdownsafe != true) + { + ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + + return true; +} diff --git 
a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 44c4367b8f9..89451e208e0 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9279,13 +9279,13 @@ RESET enable_partitionwise_join; -- === -- test partitionwise aggregates -- === -CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a); +CREATE TABLE pagg_tab (a int, b int, c text, d numeric) PARTITION BY RANGE(a); CREATE TABLE pagg_tab_p1 (LIKE pagg_tab); CR
Re: Partial aggregates pushdown
Tomas Vondra wrote 2021-10-15 17:56: Hi Alexander, Hi. And then we should extend this for aggregates with more complex internal states (e.g. avg), by supporting a function that "exports" the aggregate state - similar to serial/deserial functions, but needs to be portable. I think the trickiest thing here is rewriting the remote query to call this export function, but maybe we could simply instruct the remote node to use a different final function for the top-level node? If we have some special export function, how can we find out that the remote server supports it? Should it be a server property, or should postgres_fdw somehow discover it while connecting to the server? -- Best regards, Alexander Pyhalov, Postgres Professional
Partial aggregates pushdown
Hi. One of the issues when we try to use sharding in PostgreSQL is the absence of partial aggregate pushdown. I see several opportunities to alleviate this issue. If we look at Citus, it implements an aggregate that calculates the internal state of an arbitrary aggregate function and exports it as text. So we could calculate internal states independently on all data sources and then finalize them, which allows computing an arbitrary aggregate. But, as mentioned in the [1] thread, for some functions (like count/max/min/sum) we can just push them down. It seems easy and covers a lot of cases. For now there are still issues - for example, you can't handle functions such as avg(), as we should somehow get their internal state, or sum() variants, which need aggserialfn/aggdeserialfn. A preliminary version is attached. Is someone else working on this issue? Does the suggested approach make sense? [1] https://www.postgresql.org/message-id/flat/9998c3af9fdb5f7d62a6c7ad0fcd9142%40postgrespro.ru -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom f2cf87a0ba1f4e4bf3f5f9e5b10b782c51717baf Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 59 - .../postgres_fdw/expected/postgres_fdw.out| 215 +- contrib/postgres_fdw/postgres_fdw.c | 29 ++- contrib/postgres_fdw/sql/postgres_fdw.sql | 31 ++- 4 files changed, 310 insertions(+), 24 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..0e4ac0ad255 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if
(!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,53 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple proctup; + Form_pg_proc procform; + const char *proname; + bool ok = false; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + /* Can't process aggregates which require serialization/deserialization */ + if (agg->aggtranstype == INTERNALOID) + return false; + + proctup = SearchSysCache1(PROCOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(proctup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + procform = (Form_pg_proc) GETSTRUCT(proctup); + + /* Only system aggregates are allowed */ + if (procform->pronamespace != PG_CATALOG_NAMESPACE) + { + ReleaseSysCache(proctup); + return false; + } + + /* + * Ensure that this partial aggregate can be evaluated directly. We check + * function name to avoid checking a dozen function variants. 
+ */ + proname = NameStr(procform->proname); + + if (strcmp(proname, "min") == 0 || + strcmp(proname, "max") == 0 || + strcmp(proname, "count") == 0 || + strcmp(proname, "sum") == 0) + ok = true; + + ReleaseSysCache(proctup); + + return ok; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 44c4367b8f9..89451e208e0 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9279,13 +9279,13 @@ RESET enable_partitionwise_join; -- === -- test partitionwise aggregates -- ==
Re: Function scan FDW pushdown
Ashutosh Bapat wrote 2021-06-15 16:15: Hi Alexander, Hi. The current version of the patch is based on asymmetric partition-wise join. Currently it is applied on top of v19-0001-Asymmetric-partitionwise-join.patch from https://www.postgresql.org/message-id/792d60f4-37bc-e6ad-68ca-c2af5cbb2...@postgrespro.ru . So far I don't know how to visualize the actual function expression used in a function RTE, as in postgresExplainForeignScan() es->rtable comes from queryDesc->plannedstmt->rtable, and rte->functions is already 0. The actual function expression will be part of the Remote SQL of the ForeignScan node, so no need to visualize it separately. We still need to create a tuple description for functions in get_tupdesc_for_join_scan_tuples(), so I had to remove setting newrte->functions to NIL in add_rte_to_flat_rtable(). With rte->functions in place, there are no issues with EXPLAIN. The patch will have problems when there are multiple foreign tables all on different servers or using different FDWs. In such a case the function scan's RelOptInfo will get the fpinfo based on the first foreign table the function scan is paired with during join planning. But that may not be the best foreign table to join. We should be able to plan all the possible joins. The current infrastructure of one fpinfo per RelOptInfo won't help there. We need something better. I suppose the attached version of the patch is more mature. The patch targets only postgres FDW; how do you see this working with other FDWs? Not now. We introduce the necessary APIs for other FDWs, but implementing TryShippableJoinPaths() doesn't seem straightforward. If we come up with the right approach, we could use it for 1. pushing down queries with an IN () clause; 2. joining a small local table with a large foreign table by sending the local table rows down to the foreign server.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom d997c313daf0031b812d3fca59d338be1a4f2196 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 17 May 2021 19:19:31 +0300 Subject: [PATCH] Push join with function scan to remote server --- contrib/postgres_fdw/deparse.c| 199 ++- .../postgres_fdw/expected/postgres_fdw.out| 1095 + contrib/postgres_fdw/postgres_fdw.c | 497 +++- contrib/postgres_fdw/postgres_fdw.h |6 + contrib/postgres_fdw/sql/postgres_fdw.sql | 336 + src/backend/optimizer/path/joinpath.c | 11 + src/backend/optimizer/plan/setrefs.c |1 - src/backend/optimizer/util/relnode.c |2 + src/include/foreign/fdwapi.h |1 + 9 files changed, 2035 insertions(+), 113 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..7f08575ef60 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -151,6 +151,7 @@ static void deparseConst(Const *node, deparse_expr_cxt *context, int showtype); static void deparseParam(Param *node, deparse_expr_cxt *context); static void deparseSubscriptingRef(SubscriptingRef *node, deparse_expr_cxt *context); static void deparseFuncExpr(FuncExpr *node, deparse_expr_cxt *context); +static void deparseFuncColnames(StringInfo buf, int varno, RangeTblEntry *rte, bool qualify_col); static void deparseOpExpr(OpExpr *node, deparse_expr_cxt *context); static void deparseOperatorName(StringInfo buf, Form_pg_operator opform); static void deparseDistinctExpr(DistinctExpr *node, deparse_expr_cxt *context); @@ -1740,13 +1741,54 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, { RangeTblEntry *rte = planner_rt_fetch(foreignrel->relid, root); - /* - * Core code already has some lock on each rel being planned, so we - * can use NoLock here. 
- */ - Relation rel = table_open(rte->relid, NoLock); + Assert(rte->rtekind == RTE_RELATION || rte->rtekind == RTE_FUNCTION); + if (rte->rtekind == RTE_RELATION) + { + /* + * Core code already has some lock on each rel being planned, so + * we can use NoLock here. + */ + Relation rel = table_open(rte->relid, NoLock); - deparseRelation(buf, rel); + deparseRelation(buf, rel); + + table_close(rel, NoLock); + } + else if (rte->rtekind == RTE_FUNCTION) + { + RangeTblFunction *rtfunc; + deparse_expr_cxt context; + ListCell *lc; + bool first = true; + int n; + + n = list_length(rte->functions); + Assert(n >= 1); + + if (n > 1) +appendStringInfoString(buf, "ROWS FROM ("); + + foreach(lc, rte->functions) + { +if (!first) + appendStringInfoString(buf, ", "); +else + first = false; + +rtfunc = (RangeTblFunction *) lfirst(lc); + +context.root = root; +context.foreignrel = foreignrel; +context.scanrel = foreignrel; +context.buf = buf; +context.params_list = params_list; + +deparseExpr((Expr *) rtfu
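The FROM-clause deparse rule in the hunk above — a single function RTE is emitted bare, while several table functions are wrapped in ROWS FROM (...) — can be modeled compactly. This is only a sketch of the control flow, not the deparse.c code; the strings stand in for already-deparsed function expressions.

```python
def deparse_function_rte(deparsed_funcs):
    # One function: emit it bare.  Several: wrap in ROWS FROM (...),
    # joining with ", " via the same first-iteration flag pattern
    # used in the C hunk above.
    assert len(deparsed_funcs) >= 1
    if len(deparsed_funcs) == 1:
        return deparsed_funcs[0]
    parts = []
    first = True
    for f in deparsed_funcs:
        if not first:
            parts.append(", ")
        first = False
        parts.append(f)
    return "ROWS FROM (" + "".join(parts) + ")"
```

The ROWS FROM form matters because it evaluates the functions in parallel positions, matching how a multi-function RTE behaves locally.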
Re: Defer selection of asynchronous subplans until the executor initialization stage
Etsuro Fujita wrote 2021-08-30 12:52: On Mon, Aug 30, 2021 at 5:36 PM Andrey V. Lepikhov To allow async execution in a bit more cases, I modified the patch a bit further: a ProjectionPath put directly above an async-capable ForeignPath would also be considered async-capable, as ForeignScan can project and no separate Result is needed in that case; so I modified mark_async_capable_plan() as such, and added test cases to the postgres_fdw regression test. Attached is an updated version of the patch. Hi. The patch looks good to me and seems to work as expected. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Push down time-related SQLValue functions to foreign server
Ranier Vilela wrote 2021-08-20 14:19: Another question: For the 0002 patch: + if (node->funcid == F_NOW) + { + SQLValueFunction *svf = makeNode(SQLValueFunction); + + svf->op = SVFOP_CURRENT_TIMESTAMP; + svf->type = TIMESTAMPTZOID; + svf->typmod = -1; + svf->location = -1; + + deparseSQLValueFunction(svf, context); + + return; + } It seems to me that the svf->xpr field (SQLValueFunction *svf) is not initialized anywhere, even by deparseSQLValueFunction. If it's not really used, it should be initialized to NULL, ok? The xpr field just carries the node type, which is initialized by makeNode(). -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Push down time-related SQLValue functions to foreign server
Hi. Ashutosh Bapat wrote 2021-08-19 17:01: I spent some time looking at this patch. Generally it looks like a good idea. These stable functions will be evaluated at execution time and replaced with constants. I am not sure whether the nodes saved in the param_list may not get the same treatment. Have you verified that? I'm not sure I understand you. All parameters are treated in the same way. They are evaluated in process_query_params() - both real Params and the parameters corresponding to our SQLValue functions. If we look at the execution of something like

explain verbose select * from test1 t1 where i in (select i from test1 t2 where t2.t < now() and t1.i=t2.i);
 QUERY PLAN
---
 Foreign Scan on public.test1 t1 (cost=100.00..243310.11 rows=930 width=20)
   Output: t1.i, t1.t, t1.l
   Filter: (SubPlan 1)
   Remote SQL: SELECT i, t, l FROM data.test1
   SubPlan 1
     -> Foreign Scan on public.test1 t2 (cost=100.00..161.29 rows=5 width=4)
          Output: t2.i
          Remote SQL: SELECT i FROM data.test1 WHERE (($1::integer = i)) AND ((t < $2::timestamp with time zone))

we can see two parameters evaluated in process_query_params(): one of T_Param type (with the value of the current t1.i) and one of T_SQLValueFunction type (with the value of current_timestamp). Also the new node types being added to the param list are something other than Param. So it conflicts with the comment below in prepare_query_params()? /* * Prepare remote-parameter expressions for evaluation. (Note: in * practice, we expect that all these expressions will be just Params, so * we could possibly do something more efficient than using the full * expression-eval machinery for this. But probably there would be little * benefit, and it'd require postgres_fdw to know more than is desirable * about Param evaluation.) */ If we are already adding non-params to this list, then the comment is outdated? Fixed the comment in the new version of the patches. -- Best regards, Alexander Pyhalov, Postgres Professional
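The uniform treatment described above — process_query_params() evaluates every remote-parameter expression locally, whether it is a plain Param (t1.i) or a SQLValueFunction (current_timestamp), and binds the value to $n — can be sketched as follows. The node shapes and `evaluate_param` name are invented for illustration; this is not the postgres_fdw code.

```python
from datetime import datetime, timezone

def evaluate_param(expr, outer_values):
    # Both node kinds go through the same local-evaluation step.
    if expr["kind"] == "Param":              # e.g. t1.i from the outer scan
        return outer_values[expr["name"]]
    if expr["kind"] == "SQLValueFunction":   # e.g. CURRENT_TIMESTAMP
        return datetime.now(timezone.utc)
    raise NotImplementedError(expr["kind"])

remote_sql = ("SELECT i FROM data.test1 "
              "WHERE (($1::integer = i)) AND ((t < $2::timestamptz))")
param_exprs = [{"kind": "Param", "name": "i"},
               {"kind": "SQLValueFunction"}]
# One value per $n placeholder, evaluated locally before the remote call.
values = [evaluate_param(p, {"i": 42}) for p in param_exprs]
```

Because the SQLValueFunction value is computed once per (re)scan on the local side, the remote server never needs to agree with the local clock or timezone setting.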
Re: Push down time-related SQLValue functions to foreign server
Hi. Ranier Vilela писал 2021-08-19 14:01: Em qui., 19 de ago. de 2021 às 07:50, Zhihong Yu Hi, For 0001 patch: + if ((s->op != SVFOP_CURRENT_TIMESTAMP) && + (s->op != SVFOP_CURRENT_TIMESTAMP_N) && + (s->op != SVFOP_CURRENT_TIME) && ... The above check appears more than once. If extracted into a helper method, it would help reduce duplicate and make the code more readable. Perhaps in a MACRO? Changed this check to a macro, also fixed condition in is_foreign_param() and added test for it. Also fixed comment in prepare_query_params(). -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 2cfd3e42cad07ed552a1eb23b06040b0f74a7f2f Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 29 Jul 2021 11:45:28 +0300 Subject: [PATCH 1/2] SQLValue functions pushdown current_timestamp, localtimestamp and similar SQLValue functions can be computed locally and sent to remote side as parameters values. --- contrib/postgres_fdw/deparse.c| 95 +- .../postgres_fdw/expected/postgres_fdw.out| 121 ++ contrib/postgres_fdw/postgres_fdw.c | 9 +- contrib/postgres_fdw/postgres_fdw.h | 1 + contrib/postgres_fdw/shippable.c | 68 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 33 + 6 files changed, 320 insertions(+), 7 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..6c99acd0c82 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -109,6 +109,15 @@ typedef struct deparse_expr_cxt appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno)) #define SUBQUERY_REL_ALIAS_PREFIX "s" #define SUBQUERY_COL_ALIAS_PREFIX "c" +#define TIME_RELATED_SQLVALUE_FUNCTION(s) \ + (s->op == SVFOP_CURRENT_TIMESTAMP || \ + s->op == SVFOP_CURRENT_TIMESTAMP_N || \ + s->op == SVFOP_CURRENT_TIME || \ + s->op == SVFOP_CURRENT_TIME_N || \ + s->op == SVFOP_LOCALTIMESTAMP || \ + s->op == SVFOP_LOCALTIMESTAMP_N || \ + s->op == SVFOP_LOCALTIME || \ + s->op == SVFOP_LOCALTIME_N) /* * Functions to determine whether an expression 
can be evaluated safely on @@ -157,6 +166,7 @@ static void deparseDistinctExpr(DistinctExpr *node, deparse_expr_cxt *context); static void deparseScalarArrayOpExpr(ScalarArrayOpExpr *node, deparse_expr_cxt *context); static void deparseRelabelType(RelabelType *node, deparse_expr_cxt *context); +static void deparseSQLValueFunction(SQLValueFunction *node, deparse_expr_cxt *context); static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context); static void deparseNullTest(NullTest *node, deparse_expr_cxt *context); static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); @@ -273,7 +283,7 @@ is_foreign_expr(PlannerInfo *root, * be able to make this choice with more granularity. (We check this last * because it requires a lot of expensive catalog lookups.) */ - if (contain_mutable_functions((Node *) expr)) + if (contain_unsafe_functions((Node *) expr)) return false; /* OK to evaluate on the remote server */ @@ -618,6 +628,23 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_UNSAFE; } break; + case T_SQLValueFunction: + { +SQLValueFunction *s = (SQLValueFunction *) node; + +/* + * For now only time-related SQLValue functions are supported. + * We can push down localtimestamp and localtime as we + * compute them locally. 
+ */ +if (!TIME_RELATED_SQLVALUE_FUNCTION(s)) + return false; + +/* Timestamp or time are not collatable */ +collation = InvalidOid; +state = FDW_COLLATE_NONE; + } + break; case T_BoolExpr: { BoolExpr *b = (BoolExpr *) node; @@ -1031,6 +1058,14 @@ is_foreign_param(PlannerInfo *root, case T_Param: /* Params always have to be sent to the foreign server */ return true; + case T_SQLValueFunction: + { +SQLValueFunction *s = (SQLValueFunction *) expr; + +if (TIME_RELATED_SQLVALUE_FUNCTION(s)) + return true; +break; + } default: break; } @@ -2603,6 +2638,9 @@ deparseExpr(Expr *node, deparse_expr_cxt *context) case T_RelabelType: deparseRelabelType((RelabelType *) node, context); break; + case T_SQLValueFunction: + deparseSQLValueFunction((SQLValueFunction *) node, context); + break; case T_BoolExpr: deparseBoolExpr((BoolExpr *) node, context); break; @@ -3092,6 +3130,61 @@ deparseRelabelType(RelabelType *node, deparse_expr_cxt *context) node->resulttypmod)); } +/* + * Deparse a SQLValueFunction node + */ +static void +deparseSQLValueFunction(SQLValueFunction *node, deparse_expr_cxt *context) +{ + int32 typmod = -1; + + switch (node->op) + { + case SVFOP_LOCALTIME: + case SVFOP_CURRENT_TIME: + case SVFOP_LOCALTIMESTAMP: + cas
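The review comment earlier in the thread asked to factor the repeated SVFOP checks into a helper; the TIME_RELATED_SQLVALUE_FUNCTION macro in the patch above is that helper, and its behavior reduces to a membership test over eight SVFOP values. A sketch (op names as strings, outside C):

```python
# Counterpart of the patch's TIME_RELATED_SQLVALUE_FUNCTION(s) macro:
# true exactly for the eight time-related SQLValueFunction ops.
TIME_RELATED_SVF_OPS = frozenset({
    "SVFOP_CURRENT_TIMESTAMP", "SVFOP_CURRENT_TIMESTAMP_N",
    "SVFOP_CURRENT_TIME",      "SVFOP_CURRENT_TIME_N",
    "SVFOP_LOCALTIMESTAMP",    "SVFOP_LOCALTIMESTAMP_N",
    "SVFOP_LOCALTIME",         "SVFOP_LOCALTIME_N",
})

def time_related_sqlvalue_function(op):
    return op in TIME_RELATED_SVF_OPS
```

Ops outside this set (current_role, current_catalog, and so on) stay unshippable under the patch.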
Re: Push down time-related SQLValue functions to foreign server
Zhihong Yu wrote 2021-08-19 13:22: Hi, For the 0002 patch: + /* now() is stable, but we can ship it as it's replaced by parameter */ + return !(func_volatile(func_id) == PROVOLATILE_IMMUTABLE || func_id == F_NOW); Did you mean to say 'now() is unstable'? No, it's stable, not immutable, so we need an additional check. -- Best regards, Alexander Pyhalov, Postgres Professional
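The line under discussion can be read as a shippability rule: immutable functions may run remotely as-is, and now(), though only stable, is also safe because it is evaluated locally and shipped as a parameter value. A toy version follows; the function names and the mini-catalog dict are invented for illustration, standing in for pg_proc volatility lookups.

```python
IMMUTABLE, STABLE, VOLATILE = "immutable", "stable", "volatile"

# Invented mini-catalog standing in for func_volatile() lookups.
FUNC_VOLATILITY = {"int4pl": IMMUTABLE, "now": STABLE, "random": VOLATILE}

def contains_unsafe_function(func_name):
    # Mirrors: return !(func_volatile(func_id) == PROVOLATILE_IMMUTABLE
    #                   || func_id == F_NOW);
    # now() is special-cased: stable, but replaced by a locally
    # evaluated parameter before the query is shipped.
    return not (FUNC_VOLATILITY[func_name] == IMMUTABLE
                or func_name == "now")
```

So the reviewer's reading was inverted: the check passes now() through precisely because its stability is compensated by local evaluation.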
Push down time-related SQLValue functions to foreign server
Hi. The attached patches allow pushing down current_timestamp/localtimestamp/current_time/localtime and now() to a remote PostgreSQL server as locally computed parameters. The idea is based on oracle_fdw behavior. Examples:

\d test
Foreign table "public.test"
 Column | Type | Collation | Nullable | Default | FDW options
+--+---+--+-+---
 i | integer | | | | (column_name 'i')
 t | timestamp with time zone | | | | (column_name 't')
Server: loopback
FDW options: (schema_name 'data', table_name 'test')

Prior to the patch:

explain verbose select * from test where t=current_timestamp;
 QUERY PLAN
-
 Foreign Scan on public.test (cost=100.00..188.12 rows=11 width=12)
   Output: i, t
   Filter: (test.t = CURRENT_TIMESTAMP)
   Remote SQL: SELECT i, t FROM data.test

explain verbose update test set t=current_timestamp where t
 -> Foreign Scan on public.test (cost=100.00..154.47 rows=414 width=50)
      Output: CURRENT_TIMESTAMP, ctid, test.*
      Filter: (test.t < now())
      Remote SQL: SELECT i, t, ctid FROM data.test FOR UPDATE

After the patch:

explain verbose select * from test where t=current_timestamp;
 QUERY PLAN
-
 Foreign Scan on public.test (cost=100.00..144.35 rows=11 width=12)
   Output: i, t
   Remote SQL: SELECT i, t FROM data.test WHERE ((t = $1::timestamp with time zone))

explain verbose update test set t=current_timestamp where t
 -> Foreign Update on public.test (cost=100.00..137.93 rows=414 width=50)
      Remote SQL: UPDATE data.test SET t = $1::timestamp with time zone WHERE ((t < $1::timestamp with time zone))

-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 0846ba1d3a5f15bbea449b39741f08558fdb2d49 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 29 Jul 2021 11:45:28 +0300 Subject: [PATCH 1/2] SQLValue functions pushdown current_timestamp, localtimestamp and similar SQLValue functions can be computed locally and sent to remote side as parameters values.
--- contrib/postgres_fdw/deparse.c| 100 +++- .../postgres_fdw/expected/postgres_fdw.out| 108 ++ contrib/postgres_fdw/postgres_fdw.h | 1 + contrib/postgres_fdw/shippable.c | 68 +++ contrib/postgres_fdw/sql/postgres_fdw.sql | 29 + 5 files changed, 305 insertions(+), 1 deletion(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..01748835fc8 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -157,6 +157,7 @@ static void deparseDistinctExpr(DistinctExpr *node, deparse_expr_cxt *context); static void deparseScalarArrayOpExpr(ScalarArrayOpExpr *node, deparse_expr_cxt *context); static void deparseRelabelType(RelabelType *node, deparse_expr_cxt *context); +static void deparseSQLValueFunction(SQLValueFunction *node, deparse_expr_cxt *context); static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context); static void deparseNullTest(NullTest *node, deparse_expr_cxt *context); static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); @@ -273,7 +274,7 @@ is_foreign_expr(PlannerInfo *root, * be able to make this choice with more granularity. (We check this last * because it requires a lot of expensive catalog lookups.) */ - if (contain_mutable_functions((Node *) expr)) + if (contain_unsafe_functions((Node *) expr)) return false; /* OK to evaluate on the remote server */ @@ -618,6 +619,30 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_UNSAFE; } break; + case T_SQLValueFunction: + { +SQLValueFunction *s = (SQLValueFunction *) node; + +/* + * For now only time-related SQLValue functions are supported. + * We can push down localtimestamp and localtime as we + * compute them locally. 
+ */ +if ((s->op != SVFOP_CURRENT_TIMESTAMP) && + (s->op != SVFOP_CURRENT_TIMESTAMP_N) && + (s->op != SVFOP_CURRENT_TIME) && + (s->op != SVFOP_CURRENT_TIME_N) && + (s->op != SVFOP_LOCALTIMESTAMP) && + (s->op != SVFOP_LOCALTIMESTAMP_N) && + (s->op != SVFOP_LOCALTIME) && + (s->op != SVFOP_LOCALTIME_N)) + return false; + +/* Timestamp or time are not collatable */ +collation = InvalidOid; +state = FDW_COLLATE_NONE; + } + break; case T_BoolExpr: { BoolExpr *b = (BoolExpr *) node; @@ -1031,6 +1056,21 @@ is_foreign_param(PlannerInfo *root, case T_Param:
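To illustrate the idea in the mail above — evaluate the SQLValue function once on the local side and ship only the resulting value to the remote server as a query parameter — here is a toy Python sketch. The text-based substitution and the `parameterize`/`LOCAL_EXPRS` names are purely illustrative: the real patch walks PostgreSQL parse trees (T_SQLValueFunction nodes) in deparse.c, not SQL strings.

```python
import re
from datetime import datetime, timezone

# Locally computable expressions and how to evaluate them.  Illustrative
# only: a text substitution like this would also match inside identifiers,
# which a parse-tree-based implementation avoids.
LOCAL_EXPRS = {
    "current_timestamp": lambda: datetime.now(timezone.utc),
    "now()": lambda: datetime.now(timezone.utc),
}

def parameterize(sql: str):
    """Replace locally computable expressions with $n placeholders and
    return (remote_sql, params): one locally evaluated value is shipped
    instead of the volatile expression itself."""
    params = []

    def repl(match):
        params.append(LOCAL_EXPRS[match.group(0).lower()]())
        return f"${len(params)}"

    pattern = "|".join(re.escape(k) for k in LOCAL_EXPRS)
    return re.sub(pattern, repl, sql, flags=re.IGNORECASE), params

remote_sql, params = parameterize(
    "SELECT i, t FROM data.test WHERE t = current_timestamp")
print(remote_sql)  # SELECT i, t FROM data.test WHERE t = $1
```

Because the value is computed once locally, every remote reference to `$1` sees the same instant, matching the single-snapshot semantics of current_timestamp within a statement.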
Why are timestamptz_pl_interval and timestamptz_mi_interval not immutable?
Hi.

I'm currently looking at pushing down SQLValue expressions to foreign servers and was surprised that two timestamptz-related functions are not immutable. I see that this was changed in

commit 1ab415596d1de61561d0de8fe9da4aea207adca4
Author: Tom Lane
Date: Mon Oct 4 22:13:14 2004 +

    Correct the volatility labeling of ten timestamp-related functions, per
    discussion from Friday. initdb not forced in this commit but I intend to
    do that later.

I'm not sure why timestamptz_pl_interval and timestamptz_mi_interval are not immutable. Even if we change the timezone during a transaction, adding the same interval to the same timestamp with time zone gives the same result.

postgres=# begin;
BEGIN
postgres=*# select current_timestamp;
       current_timestamp
-------------------------------
 2021-08-16 13:26:59.366452+03
(1 row)

postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03';
          timestamptz
-------------------------------
 2021-08-16 13:26:59.366452+03
(1 row)

postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03' + '2 days'::interval;
           ?column?
-------------------------------
 2021-08-18 13:26:59.366452+03
(1 row)

postgres=*# set timezone to UTC;
SET
postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03' + '2 days'::interval;
           ?column?
-------------------------------
 2021-08-18 10:26:59.366452+00
(1 row)

postgres=*# select timestamptz '2021-08-18 13:26:59.366452+03' = timestamptz '2021-08-18 10:26:59.366452+00';
 ?column?
----------
 t
(1 row)

What do I miss?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
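One case where the result of adding an interval does depend on the TimeZone setting — and which may be what the mid-August example above misses, since no DST transition falls inside it (this is my reading, not stated in the thread) — is adding day or month components across a daylight-saving boundary: PostgreSQL adds those components in local wall-clock time. A rough Python analogue with zoneinfo, as an illustration of the behavior rather than the server code:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

# The same instant expressed in two timezones; Berlin switches to DST on
# 2021-03-28, the day inside the interval we are about to add.
utc = datetime(2021, 3, 27, 11, 0, tzinfo=ZoneInfo("UTC"))
berlin = utc.astimezone(ZoneInfo("Europe/Berlin"))  # 2021-03-27 12:00+01:00
assert utc == berlin  # identical point in time

# "One day later" is computed in wall-clock terms (Python's datetime
# arithmetic keeps the naive fields and recomputes the offset), so the
# resulting instants differ once a DST boundary is crossed.
later_utc = utc + timedelta(days=1)        # 2021-03-28 11:00:00+00:00
later_berlin = berlin + timedelta(days=1)  # 2021-03-28 12:00:00+02:00, i.e. 10:00 UTC
print(later_utc == later_berlin)  # False
```

Under this interpretation, timestamptz_pl_interval can be at most stable, not immutable, because its result for intervals with day/month components depends on the TimeZone GUC.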
Re: Case expression pushdown
Tom Lane wrote 2021-07-29 23:54:

Alexander Pyhalov writes: [ 0001-Allow-pushing-CASE-expression-to-foreign-server-v7.patch ]

I looked this over. It's better than before, but the collation handling is still not at all correct. We have to consider that a CASE's arg expression supplies the collation for a contained CaseTestExpr, otherwise we'll come to the wrong conclusions about whether "CASE foreignvar WHEN ..." is shippable, if the foreignvar is what's determining collation of the comparisons.

This means that the CaseExpr level of recursion has to pass data down to the CaseTestExpr level. In the attached, I did that by adding an additional argument to foreign_expr_walker(). That's a bit invasive, but it's not awful. I thought about instead adding fields to the foreign_loc_cxt struct. But that seemed considerably messier in the end, because we'd then have some fields that are information sourced at one recursion level and some that are info sourced at another level.

I also whacked the regression test cases around a lot. They seemed to spend a lot of time on irrelevant combinations, while failing to check the things that matter, namely whether collation-based pushdown decisions are made correctly.

regards, tom lane

Hi.

Overall looks good. The only thing I'm confused about is the T_CaseTestExpr case - how can it be that the CaseTestExpr collation doesn't match case_arg_cxt->collation? Do we need to inspect only case_arg_cxt->state? Can we assert that collation == case_arg_cxt->collation?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Case expression pushdown
Tom Lane писал 2021-07-26 18:18: Alexander Pyhalov writes: [ 0001-Allow-pushing-CASE-expression-to-foreign-server-v6.patch ] This doesn't compile cleanly: deparse.c: In function 'foreign_expr_walker.isra.4': deparse.c:920:8: warning: 'collation' may be used uninitialized in this function [-Wmaybe-uninitialized] if (collation != outer_cxt->collation) ^ deparse.c:914:3: warning: 'state' may be used uninitialized in this function [-Wmaybe-uninitialized] switch (state) ^~ These uninitialized variables very likely explain the fact that it fails regression tests, both for me and for the cfbot. Even if this weren't an outright bug, we don't tolerate code that produces warnings on common compilers. regards, tom lane Hi. Of course, this is a patch issue. Don't understand how I overlooked this. Rebased on master and fixed it. Tests are passing here (but they also passed for previous patch version). What exact tests are failing? -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 9c9fa2e37fc62ddcd8dc6176306d74b7e219fd26 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 22 Jul 2021 11:42:16 +0300 Subject: [PATCH] Allow pushing CASE expression to foreign server --- contrib/postgres_fdw/deparse.c| 150 ++ .../postgres_fdw/expected/postgres_fdw.out| 184 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 63 ++ 3 files changed, 397 insertions(+) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 31919fda8c6..df1aaf8e713 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -186,6 +186,8 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context); static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno, deparse_expr_cxt *context); +static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); + /* * Helper functions */ @@ -627,6 +629,105 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_NONE; } break; + case T_CaseExpr: + { +ListCell *lc; +foreign_loc_cxt 
tmp_cxt; +CaseExpr *ce = (CaseExpr *) node; + +/* + * case arg expression collation doesn't affect result + * collation + */ +tmp_cxt.collation = InvalidOid; +tmp_cxt.state = FDW_COLLATE_NONE; +if (ce->arg && !foreign_expr_walker((Node *) ce->arg, glob_cxt, _cxt)) + return false; + +/* Recurse to case clause subexpressions. */ +foreach(lc, ce->args) +{ + CaseWhen *cw = lfirst_node(CaseWhen, lc); + Node *whenExpr = (Node *) cw->expr; + + /* + * The parser should have produced WHEN clauses of the + * form "CaseTestExpr = RHS", possibly with an implicit + * coercion inserted above the CaseTestExpr. However in an + * expression that's been through the optimizer, the WHEN + * clause could be almost anything (since the equality + * operator could have been expanded into an inline + * function). In this case forbid pushdown. + */ + + if (ce->arg) + { + List *whenExprArgs; + + if (!IsA(whenExpr, OpExpr)) + return false; + + whenExprArgs = ((OpExpr *) whenExpr)->args; + + if ((list_length(whenExprArgs) != 2) || + !IsA(strip_implicit_coercions(linitial(whenExprArgs)), CaseTestExpr)) + return false; + } + + /* + * case when expression collation doesn't affect result + * collation + */ + tmp_cxt.collation = InvalidOid; + tmp_cxt.state = FDW_COLLATE_NONE; + /* Recurse to case clause expression. */ + if (!foreign_expr_walker((Node *) cw->expr, + glob_cxt, _cxt)) + return false; + + /* Recurse to result expression. */ + if (!foreign_expr_walker((Node *) cw->result, + glob_cxt, _cxt)) + return false; +} + +if (!foreign_expr_walker((Node *) ce->defresult, glob_cxt, _cxt)) + return false; + +/* + * Collation rule is same as for function nodes. 
+ */ +collation = ce->casecollid; +if (collation == InvalidOid) + state = FDW_COLLATE_NONE; +else if (inner_cxt.state == FDW_COLLATE_SAFE && + collation == inner_cxt.collation) + state = FDW_COLLATE_SAFE; +else if (collation == DEFAULT_COLLATION_OID) + state = FDW_COLLATE_NONE; +else + state = FDW_COLLATE_UNSAFE; + } + break; + case T_CaseTestExpr: + { +CaseTestExpr *c = (CaseTestExpr *) node; + +/* + * Collation rule is same as for function nodes. + */ +collation = c->collation; +if (collation == InvalidOid) + state = FDW_COLLATE_NONE; +else if (inner_cxt.state == FDW_COLLATE_SAFE && + collation == inner_cxt.collation) + state = FDW_COLLATE_SAFE; +else if (collation == DEFAULT_COLLATION_OID) + state = FDW_COLLATE_NONE; +else
Re: Case expression pushdown
Tom Lane wrote 2021-07-21 19:49:

Gilles Darold writes: I'm attaching the v5 patch again as it doesn't appear in the Latest attachment list in the commitfest.

So this has a few issues:

Hi.

1. In foreign_expr_walker, you're failing to recurse to either the "arg" or "defresult" subtrees of a CaseExpr, so that it would fail to notice unshippable constructs within those.

Fixed this.

2. You're also failing to guard against the hazard that a WHEN expression within a CASE-with-arg has been expanded into something that doesn't look like "CaseTestExpr = something". As written, this patch would likely dump core in that situation, and if it didn't it would send nonsense to the remote server. Take a look at the check for that situation in ruleutils.c (starting at line 8764 as of HEAD) and adapt it to this. Probably what you want is to just deem the CASE un-pushable if it's been modified away from that structure. This is enough of a corner case that optimizing it isn't worth a great deal of trouble ... but crashing is not ok.

I think I fixed this by copying the check from ruleutils.c.

3. A potentially uncomfortable issue for the CASE-with-arg syntax is that the specific equality operator being used appears nowhere in the decompiled expression, thus raising the question of whether the remote server will interpret it the same way we did. Given that we restrict the values-to-be-compared to be of shippable types, maybe this is safe in practice, but I have a bad feeling about it. I wonder if we'd be better off just refusing to ship CASE-with-arg at all, which would a-fortiori avoid point 2.

I'm not sure how 'case a when b ...' is different from 'case when a=b ...' in this case. If the type of a or b is not shippable, we will not push down this expression in any way. And if they are of builtin types, why do these expressions differ?

4. I'm not sure that I believe any part of the collation handling. There is the question of what collations will be used for the individual WHEN comparisons, which can probably be left for the recursive checks of the CaseWhen.expr subtrees to handle; and then there is the separate issue of whether the CASE's result collation (which arises from the CaseWhen.result exprs plus the CaseExpr.defresult expr) can be deemed to be safely derived from remote Vars. I haven't totally thought through how that should work, but I'm pretty certain that handling the CaseWhen's within separate recursive invocations of foreign_expr_walker cannot possibly get it right. However, you'll likely have to flatten those anyway (i.e., handle them within the loop in the CaseExpr case) while fixing point 2.

I've tried to account for the fact that we are interested only in caseWhen->result collations, but am still not sure that I'm right here.

5. This is a cosmetic point, but: the locations of the various additions in deparse.c seem to have been chosen with the aid of a dartboard. We do have a convention for this sort of thing, which is to lay out code concerned with different node types in the same order that the node types are declared in *nodes.h. I'm not sufficiently anal to want to fix the existing violations of that rule that I see in deparse.c; but the fact that somebody got this wrong before isn't license to make things worse.

regards, tom lane

Fixed this. Thanks for review.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom f323ce6e6e12004ee448bbd6721c396826bd9eeb Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 22 Jul 2021 11:42:16 +0300 Subject: [PATCH] Allow pushing CASE expression to foreign server --- contrib/postgres_fdw/deparse.c| 136 + .../postgres_fdw/expected/postgres_fdw.out| 184 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 63 ++ 3 files changed, 383 insertions(+) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 31919fda8c6..f0cbd958890 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -186,6 +186,8 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context); static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno, deparse_expr_cxt *context); +static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); + /* * Helper functions */ @@ -627,6 +629,91 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_NONE; } break; + case T_CaseExpr: + { +ListCell *lc; +foreign_loc_cxt tmp_cxt; +CaseExpr *ce = (CaseExpr *) node; + +/* + * case arg expression collation doesn't affect result + * collation + */ +tmp_cxt.collation = InvalidOid; +tmp_cxt.state = FDW_COLLATE_NONE; +if (ce->arg && !foreign_expr_walker((Node *) ce->arg, glob_cxt, _cxt)) + return false; + +/* Recurse to case clause subexpressions. */ +foreach(lc, ce-&
Re: Case expression pushdown
Hi.

Gilles Darold wrote 2021-07-07 15:02:

On 22/06/2021 at 15:39, Alexander Pyhalov wrote:

Seino Yuki wrote 2021-06-22 16:03:

On 2021-06-16 01:29, Alexander Pyhalov wrote:

Hi. Ashutosh Bapat wrote 2021-06-15 16:24: Looks quite useful to me. Can you please add this to the next commitfest?

Added to commitfest. Here is an updated patch version.

Thanks for posting the patch. I agree with this content.

+ Foreign Scan on public.ft2 (cost=156.58..165.45 rows=394 width=14)

It's not a big issue, but is there any intention behind the pattern of outputting costs in regression tests?

Hi. No, I don't think it makes much sense. Updated tests (also added a case with an empty else).

The patch doesn't apply to master anymore, so I attach an update of your patch. This is your patch rebased and untouched, minus a comment in the test, and renamed to v4. I could have missed something, but I don't think the additional struct elements case_args in the structs foreign_loc_cxt and deparse_expr_cxt are necessary. They look to be useless.

I thought we should compare the arg collation and the expression collation, and didn't suggest that we could take CaseTestExpr's collation directly, without deriving it from CaseExpr's arg. Your version of course looks saner.

The patch would also be clearer if the CaseWhen node were handled separately in foreign_expr_walker() instead of being handled in the T_CaseExpr case. This way the T_CaseExpr case just needs to call foreign_expr_walker() recursively. I also think that the code in T_CaseTestExpr should just check the collation; there is nothing more to do here, as you have commented in the function deparseCaseTestExpr(). This function can be removed, as it does nothing if the case_args elements are removed.
There is a problem in the regression test with nested CASE clauses:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT c1,c2,c3 FROM ft2
WHERE CASE CASE WHEN c2 > 0 THEN c2 END WHEN 100 THEN 601 WHEN c2 THEN c2 ELSE 0 END > 600
ORDER BY c1;

The original query uses "WHERE CASE CASE WHEN", but the remote query in the plan is not the same:

Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1" WHERE (((CASE WHEN ((CASE WHEN (c2 > 0) THEN c2 ELSE NULL::integer END) = 100) THEN 601 WHEN ((CASE WHEN (c2 > 0) THEN c2 ELSE NULL::integer END) = c2) THEN c2 ELSE 0 END) > 600)) ORDER BY "C 1" ASC NULLS LAST

Here this is "WHERE (((CASE WHEN ((CASE WHEN"; I expected it to stay unchanged as "WHERE (((CASE (CASE WHEN".

I'm not sure this is an issue (as we change CASE a WHEN b ... to CASE WHEN (a=b)), and the expressions should be free of side effects, but again your version looks better. Thanks for improving the patch; it looks saner now.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
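The rewrite discussed above — deparsing CASE-with-arg as a searched CASE, i.e. CASE a WHEN b THEN r becomes CASE WHEN (a = b) THEN r — can be sketched in a few lines. This is a toy model: the class and field names below are illustrative and do not match PostgreSQL's real CaseExpr/CaseWhen node layout.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class CaseWhen:
    test: Any    # comparison value (CASE-with-arg) or boolean condition
    result: Any

@dataclass
class CaseExpr:
    arg: Optional[Any]       # None for the searched-CASE form
    whens: List[CaseWhen]
    defresult: Any

def deparse(ce: CaseExpr) -> str:
    """Deparse CASE-with-arg as a searched CASE:
    CASE a WHEN b THEN r -> CASE WHEN (a = b) THEN r."""
    parts = ["CASE"]
    for w in ce.whens:
        # With an arg, each WHEN is an implicit equality comparison;
        # without one, the WHEN expression is already a boolean condition.
        cond = f"{ce.arg} = {w.test}" if ce.arg is not None else w.test
        parts.append(f"WHEN ({cond}) THEN {w.result}")
    parts.append(f"ELSE {ce.defresult} END")
    return " ".join(parts)

print(deparse(CaseExpr("c2", [CaseWhen(100, 601)], 0)))
# CASE WHEN (c2 = 100) THEN 601 ELSE 0 END
```

As the thread notes, the two spellings are semantically equivalent as long as the arg expression has no side effects, although the searched form repeats the arg expression once per WHEN clause.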
PostgreSQL 14 backend crash on incorrect trigger
(plan=0x563f05fc8100, sourceText=0x563f05ad74a0 "DELETE FROM test WHERE i=1;", params=0x0, queryEnv=0x0, dest=0x563f05fc81f0, qc=0x7ffcab5c6cf0) at pquery.c:190 #29 0x563f037478fd in PortalRunMulti (portal=0x563f05b3aa50, isTopLevel=true, setHoldSnapshot=false, dest=0x563f05fc81f0, altdest=0x563f05fc81f0, qc=0x7ffcab5c6cf0) at pquery.c:1266 #30 0x563f03746e24 in PortalRun (portal=0x563f05b3aa50, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x563f05fc81f0, altdest=0x563f05fc81f0, qc=0x7ffcab5c6cf0) at pquery.c:786 #31 0x563f03740084 in exec_simple_query (query_string=0x563f05ad74a0 "DELETE FROM test WHERE i=1;") at postgres.c:1214 #32 0x563f03744c41 in PostgresMain (argc=1, argv=0x7ffcab5c6f10, dbname=0x563f05b02938 "contrib_regression", username=0x563f05b02918 "leoric") at postgres.c:4486 #33 0x563f03670f3a in BackendRun (port=0x563f05af8f00) at postmaster.c:4507 #34 0x563f036707f3 in BackendStartup (port=0x563f05af8f00) at postmaster.c:4229 #35 0x563f0366c97c in ServerLoop () at postmaster.c:1745 #36 0x563f0366c115 in PostmasterMain (argc=8, argv=0x563f05ad1820) at postmaster.c:1417 #37 0x563f0356193d in main (argc=8, argv=0x563f05ad1820) at main.c:209 (gdb) print *subplan $2 = {type = T_Result, startup_cost = 0, total_cost = 0, plan_rows = 0, plan_width = 0, parallel_aware = false, parallel_safe = false, async_capable = false, plan_node_id = 0, targetlist = 0x563f06020d40, qual = 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, extParam = 0x0, allParam = 0x0} -- Best regards, Alexander Pyhalov, Postgres Professionaldiff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 911f171d812..7d5051406a2 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -3355,3 +3355,31 @@ CREATE FOREIGN TABLE inv_fsz (c1 int ) -- Invalid batch_size option CREATE FOREIGN TABLE inv_bsz (c1 int ) SERVER loopback OPTIONS (batch_size '100$%$#$#'); + +CREATE TABLE 
test (i int, j int); +CREATE TABLE test_base (i int, j int); +INSERT INTO test VALUES (1,2),(3,4); +INSERT INTO test_base VALUES (1,2),(3,4); +CREATE FOREIGN TABLE test_remote (i int, j int) + SERVER loopback OPTIONS (table_name 'test_base'); +CREATE FUNCTION update_test() +RETURNS trigger + LANGUAGE plpgsql +AS $function$ +DECLARE + rec record; + tname text; +BEGIN + IF (TG_OP = 'DELETE') THEN + -- reference to new instead of old +DELETE FROM test_remote WHERE row(i,j)=row(new.i,new.j); + ELSIF (TG_OP = 'INSERT') THEN +INSERT INTO test_remote VALUES(new.i, new.j); + ELSIF (TG_OP = 'UPDATE') THEN +UPDATE test_remote SET i=new.i, j=new.i WHERE i=old.i AND j=old.j; + END IF; + RETURN NULL; + END; +$function$; +CREATE TRIGGER tg_upd AFTER INSERT OR DELETE OR UPDATE ON public.test FOR EACH ROW EXECUTE FUNCTION update_test(); +DELETE FROM test WHERE i=1;
Re: Asymmetric partition-wise JOIN
Andrey Lepikhov wrote 2021-07-06 12:28:

On 5/7/21 23:15, Zhihong Yu wrote: On Mon, Jul 5, 2021 at 2:57 AM Andrey Lepikhov wrote:

+ * Can't imagine situation when join relation already exists. But in
+ * the 'partition_join' regression test it happens.
+ * It may be an indicator of possible problems.

Should a log be added in the above case?

I made additional analysis of this branch of code. This situation can happen in the case of one child, or if we join two plain tables with a partitioned one. Both situations are legal, and I think we don't need to add any log message here. Other mistakes were fixed.

Hi. Small typo in a comment in src/backend/optimizer/plan/setrefs.c:

281
282  /*
283   * Adjust RT indexes of AppendRelInfos and add to final appendrels list.
284   * The AppendRelInfos are copied, because as a part of a subplan its could
285   * be visited many times in the case of asymmetric join.
286   */
287  foreach(lc, root->append_rel_list)
288  {

its -> it (or they)?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Partitioned index can be not dumped
Álvaro Herrera wrote 2021-06-30 21:54:

On 2021-Jun-30, Alexander Pyhalov wrote: I've seen the following effect on the PostgreSQL 14 stable branch. An index created on a partitioned table disappears from pg_dump or psql \d output. This seems to begin after ANALYZE. The partitioned relation's relhasindex pg_class field suddenly becomes false.

Yeah, that seems correct. I didn't verify your test case, but after looking at the code I thought there was a bit too much churn and the new conditions looked quite messy and unexplained. It seems simpler to be explicit at the start about what we're doing, and keep nindexes=0 for partitioned tables; with that, the code works unchanged because the "for" loops do nothing without having to check for anything. My proposal is attached. I did run the tests and they do pass, but I didn't look very closely at what the tests are actually doing.

I noticed that part of that comment seems to be a leftover from ... I don't know when: "We do not analyze index columns if there was an explicit column list in the ANALYZE command, however." I suppose this is about some code that was removed, but I didn't dig into it.

Looks good. It seems this comment refers to line 455:

445     if (nindexes > 0)
446     {
447         indexdata = (AnlIndexData *) palloc0(nindexes * sizeof(AnlIndexData));
448         for (ind = 0; ind < nindexes; ind++)
449         {
450             AnlIndexData *thisdata = &indexdata[ind];
451             IndexInfo  *indexInfo;
452
453             thisdata->indexInfo = indexInfo = BuildIndexInfo(Irel[ind]);
454             thisdata->tupleFract = 1.0; /* fix later if partial */
455             if (indexInfo->ii_Expressions != NIL && va_cols == NIL)
456             {
457                 ListCell   *indexpr_item = list_head(indexInfo->ii_Expressions);
458
459                 thisdata->vacattrstats = (VacAttrStats **)
460                     palloc(indexInfo->ii_NumIndexAttrs * sizeof(VacAttrStats *));

Also, I had added an unnecessary new line in the test. Restored the comment and removed the new line.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 0677fca372349ab2a1f5bc2e69a91c72ae7dfaa3 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Wed, 30 Jun 2021 22:24:56 +0300 Subject: [PATCH] Set relhasindex for partitioned tables correctly. The issue appeared after 0e69f705cc1a3df273b38c9883fb5765991e04fe: in this commit we unconditionally set nindexes to 0 for partitioned relations. --- src/backend/commands/analyze.c | 24 +--- src/test/regress/expected/vacuum.out | 17 + src/test/regress/sql/vacuum.sql | 11 +++ 3 files changed, 45 insertions(+), 7 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 426c1e67109..c21cb7da3cf 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -422,18 +422,28 @@ do_analyze_rel(Relation onerel, VacuumParams *params, * columns in the indexes. We do not analyze index columns if there was * an explicit column list in the ANALYZE command, however. If we are * doing a recursive scan, we don't want to touch the parent's indexes at - * all. + * all. If we're processing a partitioned table, we need to know if there + * are any indexes, but we don't want to process them. 
*/ - if (!inh) + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { vac_open_indexes(onerel, AccessShareLock, , ); + hasindex = nindexes > 0; + if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { + vac_close_indexes(nindexes, Irel, NoLock); + nindexes = 0; + Irel = NULL; + } + } else { Irel = NULL; nindexes = 0; + hasindex = false; } - hasindex = (nindexes > 0); indexdata = NULL; - if (hasindex) + if (nindexes > 0) { indexdata = (AnlIndexData *) palloc0(nindexes * sizeof(AnlIndexData)); for (ind = 0; ind < nindexes; ind++) @@ -572,7 +582,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, MemoryContextResetAndDeleteChildren(col_context); } - if (hasindex) + if (nindexes > 0) compute_index_stats(onerel, totalrows, indexdata, nindexes, rows, numrows, @@ -660,10 +670,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params, /* * Partitioned tables don't have storage, so we don't set any fields * in their pg_class entries except for reltuples, which is necessary - * for auto-analyze to work properly. + * for auto-analyze to work properly, and relhasindex. */ vac_update_relstats(onerel, -1, totalrows, - 0, false, InvalidTransactionId, + 0, hasindex, InvalidTransactionId, InvalidMultiXactId, in_outer_xact); } diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out index e5771462d57..e3d462b66fa 100644 --- a/src/test/regress/expected/vacuum.out
Re: Partitioned index can be not dumped
Alexander Pyhalov писал 2021-06-30 17:26: Hi. Sorry, test had an issue. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 2aabf5e8e86d222e6a73b25ccc652fe645e12fc4 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Wed, 30 Jun 2021 17:22:37 +0300 Subject: [PATCH] Set relhasindex for partitioned tables correctly. The issue appeared after 0e69f705cc1a3df273b38c9883fb5765991e04fe: in this commit we unconditionally set nindexes to 0 for partitioned relations. --- src/backend/commands/analyze.c | 53 src/test/regress/expected/vacuum.out | 17 + src/test/regress/sql/vacuum.sql | 12 +++ 3 files changed, 59 insertions(+), 23 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 426c1e67109..9c8913a1619 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -424,8 +424,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params, * doing a recursive scan, we don't want to touch the parent's indexes at * all. */ - if (!inh) + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { vac_open_indexes(onerel, AccessShareLock, , ); + } else { Irel = NULL; @@ -433,7 +435,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, } hasindex = (nindexes > 0); indexdata = NULL; - if (hasindex) + if (hasindex && !inh) { indexdata = (AnlIndexData *) palloc0(nindexes * sizeof(AnlIndexData)); for (ind = 0; ind < nindexes; ind++) @@ -487,14 +489,17 @@ do_analyze_rel(Relation onerel, VacuumParams *params, if (targrows < vacattrstats[i]->minrows) targrows = vacattrstats[i]->minrows; } - for (ind = 0; ind < nindexes; ind++) + if (!inh) { - AnlIndexData *thisdata = [ind]; - - for (i = 0; i < thisdata->attr_cnt; i++) + for (ind = 0; ind < nindexes; ind++) { - if (targrows < thisdata->vacattrstats[i]->minrows) -targrows = thisdata->vacattrstats[i]->minrows; + AnlIndexData *thisdata = [ind]; + + for (i = 0; i < thisdata->attr_cnt; i++) + { +if (targrows < thisdata->vacattrstats[i]->minrows) + targrows 
= thisdata->vacattrstats[i]->minrows; + } } } @@ -572,7 +577,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, MemoryContextResetAndDeleteChildren(col_context); } - if (hasindex) + if (hasindex & !inh) compute_index_stats(onerel, totalrows, indexdata, nindexes, rows, numrows, @@ -589,23 +594,25 @@ do_analyze_rel(Relation onerel, VacuumParams *params, update_attstats(RelationGetRelid(onerel), inh, attr_cnt, vacattrstats); - for (ind = 0; ind < nindexes; ind++) + if (!inh) { - AnlIndexData *thisdata = [ind]; + for (ind = 0; ind < nindexes; ind++) + { +AnlIndexData *thisdata = [ind]; - update_attstats(RelationGetRelid(Irel[ind]), false, - thisdata->attr_cnt, thisdata->vacattrstats); - } +update_attstats(RelationGetRelid(Irel[ind]), false, +thisdata->attr_cnt, thisdata->vacattrstats); + } - /* - * Build extended statistics (if there are any). - * - * For now we only build extended statistics on individual relations, - * not for relations representing inheritance trees. - */ - if (!inh) + /* + * Build extended statistics (if there are any). + * + * For now we only build extended statistics on individual + * relations, not for relations representing inheritance trees. + */ BuildRelationExtStatistics(onerel, totalrows, numrows, rows, attr_cnt, vacattrstats); + } } pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, @@ -663,7 +670,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, * for auto-analyze to work properly. */ vac_update_relstats(onerel, -1, totalrows, - 0, false, InvalidTransactionId, + 0, hasindex, InvalidTransactionId, InvalidMultiXactId, in_outer_xact); } @@ -704,7 +711,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, * amvacuumcleanup() when called in ANALYZE-only mode. The only exception * among core index AMs is GIN/ginvacuumcleanup(). 
*/ - if (!(params->options & VACOPT_VACUUM)) + if (!(params->options & VACOPT_VACUUM) && !inh) { for (ind = 0; ind < nindexes; ind++) { diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out index e5771462d57..e3d462b66fa 100644 --- a/src/test/regress/expected/vacuum.out +++ b/src/test/regress/expected/vacuum.out @@ -199,6 +199,23 @@ VACUUM ANALYZE vacparted(a,b,a); ERROR: column "a" of relation "vacparted" appears more than once ANALYZE vacparted(a,b,b); ERROR: column "b" of relation "vacparted" appears more than once +-- partitioned table with index +CREATE TABLE vacparted_i (a int primary key, b varchar(100)) PARTITION BY HAS
Partitioned index can be not dumped
Hi.

I've seen the following effect on the PostgreSQL 14 stable branch. An index created on a partitioned table disappears from pg_dump or psql \d output. This seems to begin after ANALYZE: the partitioned relation's relhasindex pg_class field suddenly becomes false.

The issue happens after

commit 0e69f705cc1a3df273b38c9883fb5765991e04fe (HEAD, refs/bisect/bad)
Author: Alvaro Herrera
Date: Fri Apr 9 11:29:08 2021 -0400

    Set pg_class.reltuples for partitioned tables

    When commit 0827e8af70f4 added auto-analyze support for partitioned
    tables, it included code to obtain reltuples for the partitioned table
    as a number of catalog accesses to read pg_class.reltuples for each
    partition. That's not only very inefficient, but also problematic
    because autovacuum doesn't hold any locks on any of those tables -- and
    doesn't want to. Replace that code with a read of pg_class.reltuples
    for the partitioned table, and make sure ANALYZE and TRUNCATE properly
    maintain that value.

    I found no code that would be affected by the change of relpages from
    zero to non-zero for partitioned tables, and no other code that should
    be maintaining it, but if there is, hopefully it'll be an easy fix.
    Per buildfarm.

    Author: Álvaro Herrera
    Reviewed-by: Zhihong Yu

It seems that in this commit we unconditionally overwrite this data with 0. I've tried to fix it by getting this information when inh is true and ignoring nindexes when inh is not true.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 0e66025a32c6e848b2b77355631b06b4b8d4dd08 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Wed, 30 Jun 2021 17:22:37 +0300
Subject: [PATCH] Set relhasindex for partitioned tables correctly.

The issue appeared after 0e69f705cc1a3df273b38c9883fb5765991e04fe: in this
commit we unconditionally set nindexes to 0 for partitioned relations.
---
 src/backend/commands/analyze.c       | 53 ++++++++++++++-----------
 src/test/regress/expected/vacuum.out | 18 ++
 src/test/regress/sql/vacuum.sql      | 12 +++
 3 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 426c1e67109..9c8913a1619 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -424,8 +424,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	 * doing a recursive scan, we don't want to touch the parent's indexes at
 	 * all.
 	 */
-	if (!inh)
+	if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
 		vac_open_indexes(onerel, AccessShareLock, &nindexes, &Irel);
+	}
 	else
 	{
 		Irel = NULL;
@@ -433,7 +435,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	}
 	hasindex = (nindexes > 0);
 	indexdata = NULL;
-	if (hasindex)
+	if (hasindex && !inh)
 	{
 		indexdata = (AnlIndexData *) palloc0(nindexes * sizeof(AnlIndexData));
 		for (ind = 0; ind < nindexes; ind++)
@@ -487,14 +489,17 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		if (targrows < vacattrstats[i]->minrows)
 			targrows = vacattrstats[i]->minrows;
 	}
-	for (ind = 0; ind < nindexes; ind++)
+	if (!inh)
 	{
-		AnlIndexData *thisdata = &indexdata[ind];
-
-		for (i = 0; i < thisdata->attr_cnt; i++)
+		for (ind = 0; ind < nindexes; ind++)
 		{
-			if (targrows < thisdata->vacattrstats[i]->minrows)
-				targrows = thisdata->vacattrstats[i]->minrows;
+			AnlIndexData *thisdata = &indexdata[ind];
+
+			for (i = 0; i < thisdata->attr_cnt; i++)
+			{
+				if (targrows < thisdata->vacattrstats[i]->minrows)
+					targrows = thisdata->vacattrstats[i]->minrows;
+			}
 		}
 	}
@@ -572,7 +577,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		MemoryContextResetAndDeleteChildren(col_context);
 	}
-	if (hasindex)
+	if (hasindex && !inh)
 		compute_index_stats(onerel, totalrows,
 							indexdata, nindexes,
 							rows, numrows,
@@ -589,23 +594,25 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		update_attstats(RelationGetRelid(onerel), inh,
 						attr_cnt, vacattrstats);
-		for (ind = 0; ind < nindexes; ind++)
+		if (!inh)
 		{
-			AnlIndexData *thisdata = &indexdata[ind];
+			for (ind = 0; ind < nindexes; ind++)
+			{
+				AnlIndexData *thisdata = &indexdata[ind];
 
-			update_attstats(RelationGetRelid(Irel[ind]), false,
-							thisdata->attr_cnt, thisdata->vacattrstats);
-		}
+				update_attstats(RelationGetRelid(Irel[ind]), false,
+								thisdata->attr_cnt, thisdata->vacattrstats);
+			}
 
-	/*
-	 * Build extended statistics (if there are any).
-	 *
-	 * For now we only build extended statistics on individual relations,
-	 * not for relations representing inheritance trees.
-	 */
-	if (!inh)
+			/*
+			 * Build extended statistics (if there are any).
+			 *
+			 * For now we only build extended statistics on individual
+			 * relations, not for relations representing inheritance trees.
+			 */
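The gating this patch introduces can be summarized outside of PostgreSQL with a minimal Python model (function and relkind names here are illustrative, not the actual analyze.c API): indexes are opened for the plain-relation pass and for partitioned tables, but per-index statistics work is only done when not scanning an inheritance tree.

```python
# Illustrative model of the do_analyze_rel() gating in the patch above.
# "inh" means ANALYZE is doing a recursive inheritance/partitioned-tree
# scan; index statistics are only computed on the plain-relation pass.

def plan_analyze(relkind, inh, has_indexes):
    # vac_open_indexes() is now called for partitioned tables too.
    open_indexes = (not inh) or relkind == "partitioned_table"
    # compute_index_stats()/update_attstats() on indexes are skipped
    # for the recursive inheritance pass (hasindex && !inh).
    compute_index_stats = open_indexes and has_indexes and not inh
    return {
        "open_indexes": open_indexes,
        "compute_index_stats": compute_index_stats,
    }

# Plain table with indexes: index stats are computed.
print(plan_analyze("plain_table", inh=False, has_indexes=True))
# Partitioned table: indexes are opened, but no index stats are built.
print(plan_analyze("partitioned_table", inh=True, has_indexes=True))
```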
Re: Case expression pushdown
Seino Yuki wrote 2021-06-22 16:03:
On 2021-06-16 01:29, Alexander Pyhalov wrote:
Hi.
Ashutosh Bapat wrote 2021-06-15 16:24:
Looks quite useful to me. Can you please add this to the next commitfest?
Added to commitfest. Here is an updated patch version.

Thanks for posting the patch. I agree with its content.

+ Foreign Scan on public.ft2 (cost=156.58..165.45 rows=394 width=14)

It's not a big issue, but is there any intention behind the pattern of outputting costs in regression tests?

Hi.
No, I don't think it makes much sense. Updated the tests (also added a case with an empty ELSE).
--
Best regards,
Alexander Pyhalov,
Postgres Professional

From f6b57a3b84d1be0325321d4a0971f7b05b13a80a Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Tue, 30 Mar 2021 13:24:14 +0300
Subject: [PATCH] Allow pushing CASE expression to foreign server

---
 contrib/postgres_fdw/deparse.c             | 118 ++++++++++++
 .../postgres_fdw/expected/postgres_fdw.out |  64 ++++++
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  28 +++
 3 files changed, 210 insertions(+)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c6..3621fed4b54 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -87,6 +87,7 @@ typedef struct foreign_loc_cxt
 {
 	Oid			collation;		/* OID of current collation, if any */
 	FDWCollateState state;		/* state of current collation choice */
+	Expr	   *case_arg;		/* the last case arg to inspect */
 } foreign_loc_cxt;

 /*
@@ -101,6 +102,7 @@ typedef struct deparse_expr_cxt
 	 * a base relation.
 	 */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	List	   *case_args;		/* list of args to deparse CaseTestExpr */
 } deparse_expr_cxt;

 #define REL_ALIAS_PREFIX	"r"
@@ -186,6 +188,9 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);

+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+static void deparseCaseTestExpr(CaseTestExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -254,6 +259,7 @@ is_foreign_expr(PlannerInfo *root,
 	glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
+	loc_cxt.case_arg = NULL;
 	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
 		return false;
@@ -312,6 +318,7 @@ foreign_expr_walker(Node *node,
 	/* Set up inner_cxt for possible recursion to child nodes */
 	inner_cxt.collation = InvalidOid;
 	inner_cxt.state = FDW_COLLATE_NONE;
+	inner_cxt.case_arg = outer_cxt->case_arg;
 	switch (nodeTag(node))
 	{
@@ -509,6 +516,62 @@ foreign_expr_walker(Node *node,
 					state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *ce = (CaseExpr *) node;
+				ListCell   *arg;
+
+				if (ce->arg)
+					inner_cxt.case_arg = ce->arg;
+
+				foreach(arg, ce->args)
+				{
+					CaseWhen   *w = lfirst_node(CaseWhen, arg);
+
+					if (!foreign_expr_walker((Node *) w->expr,
+											 glob_cxt, &inner_cxt))
+						return false;
+
+					if (!foreign_expr_walker((Node *) w->result,
+											 glob_cxt, &inner_cxt))
+						return false;
+				}
+
+				if (!foreign_expr_walker((Node *) ce->defresult,
+										 glob_cxt, &inner_cxt))
+					return false;
+
+				collation = ce->casecollid;
+				if (collation == InvalidOid)
+					state = FDW_COLLATE_NONE;
+				else if (inner_cxt.state == FDW_COLLATE_SAFE &&
+						 collation == inner_cxt.collation)
+					state = FDW_COLLATE_SAFE;
+				else if (collation == DEFAULT_COLLATION_OID)
+					state = FDW_COLLATE_NONE;
+				else
+					state = FDW_COLLATE_UNSAFE;
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				Expr	   *arg;
+
+				Assert(outer_cxt->case_arg != NULL);
+				arg = outer_cxt->case_arg;
+
+				if (!foreign_expr_walker((Node *) arg,
+										 glob_cxt, &inner_cxt))
+					return false;
+
+				/*
+				 * Collation and state just bubble up from the previously
+				 * saved case argument
+				 */
+				collation = inner_cxt.collation;
+				state = inner_cxt.state;
+			}
+			break;
 		case T_OpExpr:
 		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
 			{
@@ -1019,6 +1082,7 @@ deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
 	context.foreignrel = rel;
 	context.scanrel = IS_UPPER_REL(rel) ? fpinfo->outerrel : rel;
 	context.params_list = params_list;
+	context.case_args = NIL;

 	/* Construct SELECT clause */
 	deparseSelectSql(tlist, is_subquery, retrieved_attrs, &context);
@@ -1598,6 +1662,7 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
 		context.scanrel = foreignrel;
 		context.root = root;
 		context.params_list = params_list;
+		context.case_args = NIL;

 		appendStringInfoChar(buf, '(');
 		appen
Re: Asymmetric partition-wise JOIN
n, symmetric or asymmetric
1701  * partition-wise join. It is not correct right now, however, a hook
1702  * on add_path() to give additional decision for path removal allows
1703  * to retain this kind of AppendPath, regardless of its cost.
1704  */
1705  if (IsA(append_path, AppendPath))

What hook do you refer to?

src/backend/optimizer/plan/setrefs.c:

 282     /*
 283      * Adjust RT indexes of AppendRelInfos and add to final appendrels list.
 284      * We assume the AppendRelInfos were built during planning and don't need
 285      * to be copied.
 286      */
 287     foreach(lc, root->append_rel_list)
 288     {
 289         AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
 290         AppendRelInfo *newappinfo;
 291
 292         /* flat copy is enough since all valuable fields are scalars */
 293         newappinfo = (AppendRelInfo *) palloc(sizeof(AppendRelInfo));
 294         memcpy(newappinfo, appinfo, sizeof(AppendRelInfo));

You've changed the function to copy appinfo, so the comment above is now incorrect.

src/backend/optimizer/util/appendinfo.c:

 588     /* Construct relids set for the immediate parent of the given child. */
 589     normal_relids = bms_copy(child_relids);
 590     for (cnt = 0; cnt < nappinfos; cnt++)
 591     {
 592         AppendRelInfo *appinfo = appinfos[cnt];
 593
 594         parent_relids = bms_add_member(parent_relids, appinfo->parent_relid);
 595         normal_relids = bms_del_member(normal_relids, appinfo->child_relid);
 596     }
 597     parent_relids = bms_union(parent_relids, normal_relids);

Do I understand correctly that now parent_relids also contains relids of relations from the 'global' inner relation, which we join to the children?
--
Best regards,
Alexander Pyhalov,
Postgres Professional
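To make the question concrete, here is a small standalone Python model (plain sets standing in for Bitmapsets; names are illustrative, not the PostgreSQL API) of what the quoted appendinfo.c fragment computes: child relids with an AppendRelInfo are translated to their parents, while relids without one — such as a 'global' inner relation joined to every child — pass through unchanged into parent_relids.

```python
# Model of the quoted loop: translate child relids to parent relids,
# letting relids that have no AppendRelInfo pass through untouched.

def adjust_child_relids(child_relids, appinfos):
    """appinfos is a list of (parent_relid, child_relid) pairs."""
    parent_relids = set()
    normal_relids = set(child_relids)          # bms_copy()
    for parent_relid, child_relid in appinfos:
        parent_relids.add(parent_relid)        # bms_add_member()
        normal_relids.discard(child_relid)     # bms_del_member()
    return parent_relids | normal_relids       # bms_union()

# Child join {child=10, inner=3} with one appinfo (parent=1, child=10):
# relid 3 has no appinfo, so it ends up in the result unchanged.
print(adjust_child_relids({10, 3}, [(1, 10)]))  # {1, 3}
```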
Re: postgres_fdw batching vs. (re)creating the tuple slots
Tomas Vondra wrote 2021-06-12 00:01:
On 6/9/21 1:08 PM, Tomas Vondra wrote:
On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra wrote:
Hi, Here's a v2 fixing a silly bug with reusing the same variable in two nested loops (worked for simple postgres_fdw cases, but "make check" failed).

I applied these patches and ran make check in the postgres_fdw contrib module, and I saw a server crash. Is it the same failure you were seeing above?

Nope, that was causing an infinite loop. This is just a silly mistake on my side - I forgot to replace the i/j variable inside the loop. Here's v3.
regards

FWIW I've pushed this, after improving the comments a little bit.
regards

Hi.
It seems this commit

commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
Author: Tomas Vondra
Date:   Fri Jun 11 20:19:48 2021 +0200

    Optimize creation of slots for FDW bulk inserts

has broken batch insert for partitions with unique indexes. Earlier this case worked as expected, inserting 1000 tuples. Now it exits with

ERROR:  duplicate key value violates unique constraint "p0_pkey"
DETAIL:  Key (x)=(1) already exists.
CONTEXT: remote SQL command: INSERT INTO public.batch_table_p0(x, field1, field2) VALUES ($1, $2, $3), ($4, $5, $6), ($7, $8, $9), ($10, $11, $12), ($13, $14, $15), ($16, $17, $18), ($19, $20, $21), ($22, $23, $24), ($25, $26, $27), ($28, $29, $30), ($31, $32, $33), ($34, $35, $36), ($37, $38, $39), ($40, $41, $42), ($43, $44, $45), ($46, $47, $48), ($49, $50, $51), ($52, $53, $54), ($55, $56, $57), ($58, $59, $60), ($61, $62, $63), ($64, $65, $66), ($67, $68, $69), ($70, $71, $72), ($73, $74, $75), ($76, $77, $78), ($79, $80, $81), ($82, $83, $84), ($85, $86, $87), ($88, $89, $90), ($91, $92, $93), ($94, $95, $96), ($97, $98, $99), ($100, $101, $102), ($103, $104, $105), ($106, $107, $108), ($109, $110, $111), ($112, $113, $114), ($115, $116, $117), ($118, $119, $120), ($121, $122, $123), ($124, $125, $126), ($127, $128, $129), ($130, $131, $132), ($133, $134, $135), ($136, $137, $138), ($139, $140, $141), ($142, $143, $144), ($145, $146, $147), ($148, $149, $150), ($151, $152, $153), ($154, $155, $156), ($157, $158, $159), ($160, $161, $162), ($163, $164, $165), ($166, $167, $168), ($169, $170, $171), ($172, $173, $174), ($175, $176, $177), ($178, $179, $180), ($181, $182, $183), ($184, $185, $186), ($187, $188, $189), ($190, $191, $192), ($193, $194, $195), ($196, $197, $198), ($199, $200, $201), ($202, $203, $204), ($205, $206, $207), ($208, $209, $210), ($211, $212, $213), ($214, $215, $216), ($217, $218, $219), ($220, $221, $222), ($223, $224, $225), ($226, $227, $228), ($229, $230, $231), ($232, $233, $234), ($235, $236, $237), ($238, $239, $240), ($241, $242, $243), ($244, $245, $246), ($247, $248, $249), ($250, $251, $252), ($253, $254, $255), ($256, $257, $258), ($259, $260, $261), ($262, $263, $264), ($265, $266, $267), ($268, $269, $270), ($271, $272, $273), ($274, $275, $276), ($277, $278, $279), ($280, $281, $282), ($283, $284, $285), ($286, $287, $288), ($289, $290, $291), ($292, $293, $294), ($295, $296, $297), ($298, $299, $300) -- Best regards, 
Alexander Pyhalov,
Postgres Professional

diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 8cb2148f1f6..4c280f1e777 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3083,7 +3083,46 @@ UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a
 SELECT tableoid::regclass, * FROM batch_cp_upd_test;

 -- Clean up
-DROP TABLE batch_table, batch_cp_upd_test CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+
+-- Use partitioning
+ALTER SERVER loopback OPTIONS (ADD batch_size '100');
+
+CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+ALTER TABLE batch_table_p0 ADD CONSTRAINT p0_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+ALTER TABLE batch_table_p1 ADD CONSTRAINT p1_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 2);
+ALTER TABLE batch_table_p2 ADD CONSTRAINT p2_pkey PRIMARY KEY (x);
+
+CREATE TABLE batch_table_p3 (LIKE batch_table);
+ALTER TABLE batch_table_p3 ADD CONSTRAINT p3_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p3f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 3)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p3');
+
+INSERT INTO batch_t
Re: Case expression pushdown
Hi.
Ashutosh Bapat wrote 2021-06-15 16:24:
Looks quite useful to me. Can you please add this to the next commitfest?

Added to commitfest. Here is an updated patch version.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 80d60eb9b1630ee55d1825964e0e976ae6c289a1 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Tue, 30 Mar 2021 13:24:14 +0300
Subject: [PATCH] Allow pushing CASE expression to foreign server

---
 contrib/postgres_fdw/deparse.c             | 118 ++++++++++++
 .../postgres_fdw/expected/postgres_fdw.out |  47 +++
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  22 ++
 3 files changed, 187 insertions(+)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c6..3621fed4b54 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -87,6 +87,7 @@ typedef struct foreign_loc_cxt
 {
 	Oid			collation;		/* OID of current collation, if any */
 	FDWCollateState state;		/* state of current collation choice */
+	Expr	   *case_arg;		/* the last case arg to inspect */
 } foreign_loc_cxt;

 /*
@@ -101,6 +102,7 @@ typedef struct deparse_expr_cxt
 	 * a base relation.
 	 */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	List	   *case_args;		/* list of args to deparse CaseTestExpr */
 } deparse_expr_cxt;

 #define REL_ALIAS_PREFIX	"r"
@@ -186,6 +188,9 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);

+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+static void deparseCaseTestExpr(CaseTestExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -254,6 +259,7 @@ is_foreign_expr(PlannerInfo *root,
 	glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
+	loc_cxt.case_arg = NULL;
 	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
 		return false;
@@ -312,6 +318,7 @@ foreign_expr_walker(Node *node,
 	/* Set up inner_cxt for possible recursion to child nodes */
 	inner_cxt.collation = InvalidOid;
 	inner_cxt.state = FDW_COLLATE_NONE;
+	inner_cxt.case_arg = outer_cxt->case_arg;
 	switch (nodeTag(node))
 	{
@@ -509,6 +516,62 @@ foreign_expr_walker(Node *node,
 					state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *ce = (CaseExpr *) node;
+				ListCell   *arg;
+
+				if (ce->arg)
+					inner_cxt.case_arg = ce->arg;
+
+				foreach(arg, ce->args)
+				{
+					CaseWhen   *w = lfirst_node(CaseWhen, arg);
+
+					if (!foreign_expr_walker((Node *) w->expr,
+											 glob_cxt, &inner_cxt))
+						return false;
+
+					if (!foreign_expr_walker((Node *) w->result,
+											 glob_cxt, &inner_cxt))
+						return false;
+				}
+
+				if (!foreign_expr_walker((Node *) ce->defresult,
+										 glob_cxt, &inner_cxt))
+					return false;
+
+				collation = ce->casecollid;
+				if (collation == InvalidOid)
+					state = FDW_COLLATE_NONE;
+				else if (inner_cxt.state == FDW_COLLATE_SAFE &&
+						 collation == inner_cxt.collation)
+					state = FDW_COLLATE_SAFE;
+				else if (collation == DEFAULT_COLLATION_OID)
+					state = FDW_COLLATE_NONE;
+				else
+					state = FDW_COLLATE_UNSAFE;
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				Expr	   *arg;
+
+				Assert(outer_cxt->case_arg != NULL);
+				arg = outer_cxt->case_arg;
+
+				if (!foreign_expr_walker((Node *) arg,
+										 glob_cxt, &inner_cxt))
+					return false;
+
+				/*
+				 * Collation and state just bubble up from the previously
+				 * saved case argument
+				 */
+				collation = inner_cxt.collation;
+				state = inner_cxt.state;
+			}
+			break;
 		case T_OpExpr:
 		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
 			{
@@ -1019,6 +1082,7 @@ deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
 	context.foreignrel = rel;
 	context.scanrel = IS_UPPER_REL(rel) ? fpinfo->outerrel : rel;
 	context.params_list = params_list;
+	context.case_args = NIL;

 	/* Construct SELECT clause */
 	deparseSelectSql(tlist, is_subquery, retrieved_attrs, &context);
@@ -1598,6 +1662,7 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
 		context.scanrel = foreignrel;
 		context.root = root;
 		context.params_list = params_list;
+		context.case_args = NIL;

 		appendStringInfoChar(buf, '(');
 		appendConditions(fpinfo->joinclauses, &context);
@@ -1901,6 +1966,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
 	context.scanrel = foreignrel;
 	context.buf = buf;
 	context.params_list = params_list;
+	context.case_args = NIL;

 	appendStringInfoString(buf, "UPDATE ");
 	deparseRelation(buf, rel);
@@ -2008,6 +2074,7 @@ deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
 	context.scanrel = f
Case expression pushdown
Hi.
This patch allows pushing CASE expressions to foreign servers, so that more types of updates can be executed directly.

For example, without the patch:

EXPLAIN (VERBOSE, COSTS OFF) UPDATE ft2 d SET c2 = CASE WHEN c2 > 0 THEN c2 ELSE 0 END WHERE c1 > 1000;
                                                   QUERY PLAN
---
 Update on public.ft2 d
   Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2 WHERE ctid = $1
   ->  Foreign Scan on public.ft2 d
         Output: CASE WHEN (c2 > 0) THEN c2 ELSE 0 END, ctid, d.*
         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1000)) FOR UPDATE

With the patch:

EXPLAIN (VERBOSE, COSTS OFF) UPDATE ft2 d SET c2 = CASE WHEN c2 > 0 THEN c2 ELSE 0 END WHERE c1 > 1000;
                                                   QUERY PLAN

 Update on public.ft2 d
   ->  Foreign Update on public.ft2 d
         Remote SQL: UPDATE "S 1"."T 1" SET c2 = (CASE WHEN (c2 > 0) THEN c2 ELSE 0 END) WHERE (("C 1" > 1000))
--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 19202bfa5ba8a7eadf1a3b0ce869106e967e0dd2 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Tue, 30 Mar 2021 13:24:14 +0300
Subject: [PATCH] Allow pushing CASE expression to foreign server

---
 contrib/postgres_fdw/deparse.c             | 124 ++++++++++++
 .../postgres_fdw/expected/postgres_fdw.out |  28 ++++
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  16 +++
 3 files changed, 168 insertions(+)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c6..4e8162c045c 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -87,6 +87,7 @@ typedef struct foreign_loc_cxt
 {
 	Oid			collation;		/* OID of current collation, if any */
 	FDWCollateState state;		/* state of current collation choice */
+	List	   *case_args;		/* list of case args to inspect */
 } foreign_loc_cxt;

 /*
@@ -101,6 +102,7 @@ typedef struct deparse_expr_cxt
 	 * a base relation.
 	 */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	List	   *case_args;		/* list of args to deparse CaseTestExpr */
 } deparse_expr_cxt;

 #define REL_ALIAS_PREFIX	"r"
@@ -186,6 +188,9 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);

+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+static void deparseCaseTestExpr(CaseTestExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -254,6 +259,7 @@ is_foreign_expr(PlannerInfo *root,
 	glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
+	loc_cxt.case_args = NIL;
 	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
 		return false;
@@ -312,6 +318,7 @@ foreign_expr_walker(Node *node,
 	/* Set up inner_cxt for possible recursion to child nodes */
 	inner_cxt.collation = InvalidOid;
 	inner_cxt.state = FDW_COLLATE_NONE;
+	inner_cxt.case_args = outer_cxt->case_args;
 	switch (nodeTag(node))
 	{
@@ -509,6 +516,69 @@ foreign_expr_walker(Node *node,
 					state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *ce = (CaseExpr *) node;
+				ListCell   *arg;
+
+				if (ce->arg)
+				{
+					inner_cxt.case_args = lappend(inner_cxt.case_args, ce->arg);
+				}
+
+				foreach(arg, ce->args)
+				{
+					CaseWhen   *w = lfirst_node(CaseWhen, arg);
+
+					if (!foreign_expr_walker((Node *) w->expr,
+											 glob_cxt, &inner_cxt))
+						return false;
+
+					if (!foreign_expr_walker((Node *) w->result,
+											 glob_cxt, &inner_cxt))
+						return false;
+				}
+
+				if (!foreign_expr_walker((Node *) ce->defresult,
+										 glob_cxt, &inner_cxt))
+					return false;
+
+				if (ce->arg)
+				{
+					inner_cxt.case_args = list_delete_last(inner_cxt.case_args);
+					outer_cxt->case_args = inner_cxt.case_args;
+				}
+
+				collation = ce->casecollid;
+				if (collation == InvalidOid)
+					state = FDW_COLLATE_NONE;
+				else if (inner_cxt.state == FDW_COLLATE_SAFE &&
+						 collation == inner_cxt.collation)
+					state = FDW_COLLATE_SAFE;
+				else if (collation == DEFAULT_COLLATION_OID)
+					state = FDW_COLLATE_NONE;
+				else
+					state = FDW_COLLATE_UNSAFE;
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				Expr	   *arg;
+
+				Assert(outer_cxt->case_args != NIL);
+				arg = llast(outer_cxt->case_args);
+
+				if (!foreign_expr_walker((Node *) arg,
+										 glob_cxt,
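The bookkeeping the walker needs for nested CASE expressions can be shown with a standalone sketch (Python, with illustrative names and a toy expression encoding): each `CASE arg WHEN ...` pushes its implicit comparison argument, a `CaseTestExpr` refers to the innermost one, and the argument is popped when the CASE ends — the pattern the lappend()/llast()/list_delete_last() calls in the patch implement.

```python
# Standalone model of tracking CaseTestExpr arguments with a stack,
# mirroring the lappend()/llast()/list_delete_last() usage in the patch.

def resolve_case_test_exprs(expr, case_args=None):
    """Replace 'CASE_TEST' placeholders with the innermost CASE argument.
    An expression is a tuple ('CASE', arg, body), the string 'CASE_TEST',
    or any other opaque leaf value."""
    if case_args is None:
        case_args = []
    if expr == "CASE_TEST":
        assert case_args, "CaseTestExpr outside of CASE WHEN"
        return case_args[-1]                  # llast()
    if isinstance(expr, tuple) and expr[0] == "CASE":
        _, arg, body = expr
        case_args.append(arg)                 # lappend()
        result = resolve_case_test_exprs(body, case_args)
        case_args.pop()                       # list_delete_last()
        return result
    return expr

# Nested CASEs: the inner CASE_TEST must see the inner argument.
print(resolve_case_test_exprs(("CASE", "x", ("CASE", "y", "CASE_TEST"))))  # y
```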
Re: join pushdown and issue with foreign update
Tom Lane wrote 2021-06-02 00:32:
I wrote: I think a preferable fix involves making sure that the correct record-type typmod is propagated to record_in in this context. Alternatively, maybe we could insert the foreign table's rowtype during execution of the input operation, without touching the plan as such.

Here's a draft-quality patch based on that idea. It resolves the offered test case, but I haven't beat on it beyond that.

regards, tom lane

Hi.
The patch seems to work fine for the mentioned case. For now I'm working on function pushdown. When a record-returning function (like unnest()) is pushed down, at this stage we've already lost any type information, so we get the issue again. So far I'm not sure how to fix it; perhaps just avoid pushing down a foreign join if we have a record corresponding to a function RTE var in joinrel->reltarget?
--
Best regards,
Alexander Pyhalov,
Postgres Professional