Re: CREATE INDEX CONCURRENTLY on partitioned index
Ilya Gladyshev wrote on 2024-05-28 02:52:
>> Also I'd like to note that the new patch version contains some strange wording in the documentation: "This can be very convenient as not only will all existing partitions be indexed, but any future partitions will be as well. CREATE INDEX ... CONCURRENTLY can incur long lock times on huge partitioned tables, to avoid that you can use CREATE INDEX ON ONLY the partitioned table, which creates the new index marked as invalid, preventing automatic application to existing partitions." The whole point of CIC is to avoid long lock times, so it seems this paragraph should be rewritten as follows: "To avoid long lock times, you can use CREATE INDEX CONCURRENTLY or CREATE INDEX ON ONLY the partitioned table..."
>
> True, the current wording doesn't look right. Right now CREATE INDEX ON ONLY is described as a workaround for the missing CIC. I think it rather makes sense to say that it gives more fine-grained control over partition locking than both CIC and ordinary CREATE INDEX. See the updated patch.

Hi. I'm not sure it's worth removing the mention of CIC in

   creates the new index marked as invalid, preventing automatic
   application to existing partitions. Instead, indexes can then be created
   individually
-  on each partition using CONCURRENTLY and
+  on each partition and
   attached to the partitioned index on the parent using ALTER INDEX ...
   ATTACH PARTITION. Once indexes for all the partitions are attached to
   the parent index, the parent index will

but at least now it looks better.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
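For readers following along, the fine-grained workflow that this documentation paragraph describes can be sketched as a statement sequence. This is an illustration only: the table and index names are hypothetical, but the statement order follows the quoted documentation text.

```sql
-- Create an invalid index on the parent only; partitions are not touched.
CREATE INDEX measurement_time_idx ON ONLY measurement (logdate);

-- Build the index on each partition with minimal locking.
CREATE INDEX CONCURRENTLY measurement_y2024_time_idx
    ON measurement_y2024 (logdate);

-- Attach it to the parent index; once an index for every partition is
-- attached, the parent index becomes valid automatically.
ALTER INDEX measurement_time_idx
    ATTACH PARTITION measurement_y2024_time_idx;
```

This is what gives finer control than a single CREATE INDEX CONCURRENTLY on the parent: each partition is handled in its own session, on its own schedule.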
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote on 2024-05-28 00:30:
> Hi Mr. Pyhalov. Sorry for the late reply. Thank you for your modification and detailed review. I attach a fixed patch; it has not been rebased yet.
>
>> Monday, 25 March 2024 16:01, Alexander Pyhalov:
>> Comment in nodeAgg.c seems to be strange:
>>
>>  1079 /*
>>  1080  * If the agg's finalfn is provided and PARTIAL_AGGREGATE keyword is
>>  1081  * not specified, apply the agg's finalfn.
>>  1082  * If PARTIAL_AGGREGATE keyword is specified and the transValue type
>>  1083  * is internal, apply the agg's serialfn. In this case, if the agg's
>>  1084  * serialfn must not be invalid. Otherwise return transValue.
>>  1085  */
>>
>> Likely, you mean: "... In this case the agg's serialfn must not be invalid ..."
>
> Fixed.
>
>> Lower in the same file, please correct the error message:
>>
>>  1136 if (!OidIsValid(peragg->serialfn_oid))
>>  1137     elog(ERROR, "serialfunc is note provided for partial aggregate");
>>
>> It should be "serialfunc is not provided for partial aggregate".
>
> Fixed.
>
>> Also something is wrong with the following test:
>>
>>  SELECT /* aggregate <> partial aggregate */ array_agg(c_int4array), array_agg(b), avg(b::int2), avg(b::int4), avg(b::int8), avg(c_interval), avg(b::float4), avg(b::float8), corr(b::float8, (b * b)::float8), covar_pop(b::float8, (b * b)::float8), covar_samp(b::float8, (b * b)::float8), regr_avgx((2 * b)::float8, b::float8), ...
>>
>> Its results have changed since the last patch. Do they depend on daylight saving time?
>
> You are right. In my environment, TimeZone is set to 'PST8PDT', with which timetz values depend on daylight saving time. I changed TimeZone to 'UTC' in this test.
>
>> You can see that the filter is applied before the append. The result is correct only by chance, as the sum in every partition is actually < 700. If you lower this bound, say, to 200, you'll start getting wrong results, as data is filtered prior to aggregation. It seems, however, that in the partial case you should just avoid pulling conditions from the HAVING qual at all; all filters will be applied on the upper level. Something like
>
> Thank you for your modification.
>
>> Found one more problem. You can fire a partial aggregate over a partitioned table, but convert_combining_aggrefs() will make a non-partial copy, which leads to a 'variable not found in subplan target list' error.
>
> Thanks for the correction as well. As you pointed out, the original patch certainly had the potential to cause problems. However, I could not actually reproduce the problem in cases such as the following.
> Settings: t(c1, c2) is a partitioned table whose partition key is c1. t1, t2 are partitions of t and are themselves partitioned tables. t11, t12: partitions of t1 and foreign tables of postgres_fdw. t21, t22: partitions of t2 and foreign tables of postgres_fdw.
> Query: select c2 / 2, sum(c1) from t group by c2 / 2 order by 1
> If you have a reproducible example, I would like to add it to the regression test. Do you have a reproducible example?
>
>> Also denied partial aggregates pushdown on server version mismatch. Should check_partial_aggregate_support be 'true' by default?
>
> Could we discuss this point after we determine how to transfer state values? Once we determine that, we can easily decide whether check_partial_aggregate_support should be 'true' by default.
>
>> I'm not sure what to do with the current grammar - it precludes partial distinct aggregates. I understand that it's currently impossible to have partial aggregation for distinct aggregates - but is it worth having such a restriction at the grammar level?
>
> If partial aggregation for distinct aggregates becomes possible in the future, I see no problem with the policy of accepting new SQL keywords, such as "PARTIAL_AGGREGATE DISTINCT".

BTW, I have an issue with the test results in the last version of the patch. Attaching regression diffs. I get a partial sum over c_interval instead of sum(c_interval).
--
Best regards,
Alexander Pyhalov,
Postgres Professional

diff -U3 /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/expected/postgres_fdw.out /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/results/postgres_fdw.out
--- /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/expected/postgres_fdw.out	2024-05-28 08:48:27.236520098 +0300
+++ /home/leoric/srcs/shardman-backup/shardman/contrib/postgres_fdw/results/postgres_fdw.out	2024-05-28 08:51:24.395704846 +0300
@@ -10438,15 +10438,15 @@
          ->  Foreign Scan
                Output: (PARTIAL array_agg(pagg_tab.c_int4array)), (PARTIAL array_agg(pagg_tab.b)), (PARTIAL avg((pagg_tab.b)::smallint)), (PARTIAL avg(pagg_tab.b)), (PARTIAL avg((pagg_tab.b)::bigint)), (PARTIAL avg(pagg_tab.c_interval)), (PARTIAL avg((pagg_tab.b)::real)), (PARTIAL avg((pagg_tab.b)::double precision)), (PARTIAL avg((pagg_tab.b)::numeric)), (PARTIAL corr((pagg_tab.b)::double precision, ((pagg_tab.b * pagg_tab.b))::double precision)), (PARTIAL covar_pop((pagg_tab.b)::double p
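The HAVING hazard raised in this review (a qual applied per partition before the partial results are combined) can be shown with a small arithmetic sketch. The per-partition sums below are hypothetical numbers chosen around the bound of 700 mentioned in the thread; the functions model the two plan shapes, not any actual executor code.

```c
#include <stddef.h>

/* Hypothetical per-partition partial sums for a single group. */
static const int partial_sums[] = {300, 250, 300};
#define NPARTS (sizeof(partial_sums) / sizeof(partial_sums[0]))

/* Correct plan: combine the partial results first, apply HAVING last. */
int
correct_group_total(void)
{
	int			total = 0;

	for (size_t i = 0; i < NPARTS; i++)
		total += partial_sums[i];
	return total;				/* 850 > 700, so the group survives HAVING */
}

/* Broken plan: push "HAVING sum > 700" down onto each partial aggregate. */
int
broken_group_total(void)
{
	int			total = 0;

	for (size_t i = 0; i < NPARTS; i++)
		if (partial_sums[i] > 700)	/* no single partition exceeds 700 */
			total += partial_sums[i];
	return total;				/* 0: the group is wrongly filtered out */
}
```

With every per-partition sum below 700, the pushed-down filter discards all partial results even though the combined total (850) passes the HAVING condition - which is exactly why the qual must only be applied on the upper level.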
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote on 2024-05-28 00:30:
> Hi Mr. Pyhalov.

Hi.

>> Found one more problem. You can fire a partial aggregate over a partitioned table, but convert_combining_aggrefs() will make a non-partial copy, which leads to a 'variable not found in subplan target list' error.
>
> Thanks for the correction as well. As you pointed out, the original patch certainly had the potential to cause problems. However, I could not actually reproduce the problem in cases such as the following.
> Settings: t(c1, c2) is a partitioned table whose partition key is c1. t1, t2 are partitions of t and are themselves partitioned tables. t11, t12: partitions of t1 and foreign tables of postgres_fdw. t21, t22: partitions of t2 and foreign tables of postgres_fdw.
> Query: select c2 / 2, sum(c1) from t group by c2 / 2 order by 1
> If you have a reproducible example, I would like to add it to the regression test. Do you have a reproducible example?

The fix was to set child_agg->agg_partial to orig_agg->agg_partial in convert_combining_aggrefs(); it's already in the patch, as well as the example. Without this fix,

-- Check partial aggregate over partitioned table
EXPLAIN (VERBOSE, COSTS OFF)
SELECT avg(PARTIAL_AGGREGATE a), avg(a) FROM pagg_tab;

fails with

ERROR:  variable not found in subplan target list

--
Best regards,
Alexander Pyhalov,
Postgres Professional
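The two-level partitioned setup described in the quoted message could be sketched like this. The DDL is hypothetical: the thread gives only the table names and the partitioning key, so the range bounds, server name, and options here are invented for illustration.

```sql
CREATE TABLE t (c1 int, c2 int) PARTITION BY RANGE (c1);

-- Intermediate partitioned tables.
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (0) TO (100)
    PARTITION BY RANGE (c1);
CREATE TABLE t2 PARTITION OF t FOR VALUES FROM (100) TO (200)
    PARTITION BY RANGE (c1);

-- Leaf partitions are postgres_fdw foreign tables.
CREATE FOREIGN TABLE t11 PARTITION OF t1 FOR VALUES FROM (0) TO (50)
    SERVER loopback OPTIONS (table_name 't11_remote');
CREATE FOREIGN TABLE t12 PARTITION OF t1 FOR VALUES FROM (50) TO (100)
    SERVER loopback OPTIONS (table_name 't12_remote');
CREATE FOREIGN TABLE t21 PARTITION OF t2 FOR VALUES FROM (100) TO (150)
    SERVER loopback OPTIONS (table_name 't21_remote');
CREATE FOREIGN TABLE t22 PARTITION OF t2 FOR VALUES FROM (150) TO (200)
    SERVER loopback OPTIONS (table_name 't22_remote');

-- The query from the thread:
SELECT c2 / 2, sum(c1) FROM t GROUP BY c2 / 2 ORDER BY 1;
```

Whether this plan hits convert_combining_aggrefs() depends on whether the planner chooses partial aggregation below the Append, which is why the explicit PARTIAL_AGGREGATE reproducer above is the more reliable regression test.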
Re: CREATE INDEX CONCURRENTLY on partitioned index
Ilya Gladyshev wrote on 2024-05-24 00:14:
> Hi,

Hi.

> I think it's well worth the effort to revive the patch, so I rebased it on master, updated it and will return it to the commitfest. Alexander, Justin, feel free to add yourselves as authors.
>
> On 29.01.2024 12:43, Alexander Pyhalov wrote:
>> Hi. I've rebased the patch on master, and it seems to me there's one more issue - when we call DefineIndexConcurrentInternal() in the partitioned case, it waits for transactions locking tableId, not tabrelid - the heaprelid LockRelId is constructed for the parent index relation, not for the child index relation. Attaching a fixed version.
>>
>> Also I'm not sure what to do with locking of child relations. If we don't do anything, you can drop one of the partitioned table's children while CIC is in progress, and get the error
>>
>> ERROR: cache lookup failed for index 16399
>
> I agree that we need to do something about it; in particular, I think we should lock all the partitions inside the transaction that builds the catalog entries. Fixed this in the new version.
>
>> If you try to lock all child tables in the CIC session, you'll get deadlocks.
>
> Do you mean the deadlock between the transaction that drops a partition and the transaction doing CIC? I think this is unavoidable and can be reproduced even without partitioning.

Yes, it seems we trade this error for a possible deadlock between a transaction dropping a partition and CIC.

> Also, I'm not sure why the list of children relations was obtained with ShareLock, which CIC is supposed to avoid so as not to block writes; changed that to ShareUpdateExclusive.

I expect that it wasn't an issue due to the fact that the lock is held for a brief period, until DefineIndexConcurrentInternal() commits for the first time. But it seems it's more correct to use a ShareUpdateExclusive lock here.

Also I'd like to note that the new patch version contains some strange wording in the documentation: "This can be very convenient as not only will all existing partitions be indexed, but any future partitions will be as well. CREATE INDEX ... CONCURRENTLY can incur long lock times on huge partitioned tables, to avoid that you can use CREATE INDEX ON ONLY the partitioned table, which creates the new index marked as invalid, preventing automatic application to existing partitions." The whole point of CIC is to avoid long lock times, so it seems this paragraph should be rewritten as follows: "To avoid long lock times, you can use CREATE INDEX CONCURRENTLY or CREATE INDEX ON ONLY the partitioned table..."

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Partial aggregates pushdown
 							   grouped_rel->relids, NULL, NULL);
-		if (is_foreign_expr(root, grouped_rel, expr) && !partial)
+		if (is_foreign_expr(root, grouped_rel, expr))
 			fpinfo->remote_conds = lappend(fpinfo->remote_conds, rinfo);
 		else
 			fpinfo->local_conds = lappend(fpinfo->local_conds, rinfo);

>> From: Alexander Pyhalov Sent: Wednesday, February 28, 2024 10:43 PM
>> contrib/postgres_fdw/deparse.c: the comment before appendFunctionName() has gone; this seems to be wrong.
>
> Fixed.
>
>> From: Alexander Pyhalov Sent: Wednesday, February 28, 2024 10:43 PM
>> In finalize_aggregate():
>>
>>  1079 /*
>>  1080  * Apply the agg's finalfn if one is provided, else return transValue.
>>  1081  */
>>
>> The comment should be updated to note the behavior for agg_partial aggregates.
>
> Fixed.

Comment in nodeAgg.c seems to be strange:

 1079 /*
 1080  * If the agg's finalfn is provided and PARTIAL_AGGREGATE keyword is
 1081  * not specified, apply the agg's finalfn.
 1082  * If PARTIAL_AGGREGATE keyword is specified and the transValue type
 1083  * is internal, apply the agg's serialfn. In this case, if the agg's
 1084  * serialfn must not be invalid. Otherwise return transValue.
 1085  */

Likely, you mean: "... In this case the agg's serialfn must not be invalid ..."

Lower in the same file, please correct the error message:

 1136 if (!OidIsValid(peragg->serialfn_oid))
 1137 	elog(ERROR, "serialfunc is note provided for partial aggregate");

It should be "serialfunc is not provided for partial aggregate".

Also something is wrong with the following test:

 SELECT /* aggregate <> partial aggregate */ array_agg(c_int4array), array_agg(b), avg(b::int2), avg(b::int4), avg(b::int8), avg(c_interval), avg(b::float4), avg(b::float8), corr(b::float8, (b * b)::float8), covar_pop(b::float8, (b * b)::float8), covar_samp(b::float8, (b * b)::float8), regr_avgx((2 * b)::float8, b::float8), ...

Its results have changed since the last patch. Do they depend on daylight saving time?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: ExecAppendAsyncEventWait() in REL_14_STABLE can corrupt PG_exception_stack
Etsuro Fujita wrote on 2024-03-21 13:59:
> On Sun, Feb 25, 2024 at 6:34 PM Etsuro Fujita wrote:
>> On Fri, Feb 23, 2024 at 01:21:14PM +0300, Alexander Pyhalov wrote:
>>> Recent commit 555276f8594087ba15e0d58e38cd2186b9f39f6d introduced final cleanup of node->as_eventset in ExecAppendAsyncEventWait(). Unfortunately, now this function can return in the middle of a TRY/FINALLY block, without restoring PG_exception_stack.
>>>
>>> We found this while working on our FDW. Unfortunately, I couldn't reproduce the issue with postgres_fdw, but it seems it is also affected.
>>
>> I think this would happen when FDWs configure no events; IIRC, while the core allows them to do so, postgres_fdw does not, so this would never happen with it.
>
> I was wrong; as you pointed out, this would affect postgres_fdw as well. See commit 1ec7fca85, which is my commit, but I forgot it completely. :-(
>
> As I said before, the patch looks good to me. I tweaked the comments in ExecAppendAsyncEventWait() a bit. Attached is an updated patch. In the patch I also fixed a confusing comment in a related function in postgres_fdw.c about handling of the in-process request that might be useless to process.
>
> Sorry, it took more time than expected to get back to this thread.

Hi. The updated patch still looks good to me.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Partial aggregates pushdown
Hi.

fujii.y...@df.mitsubishielectric.co.jp wrote on 2024-02-22 10:20:
> Hi, Mr. Haas, hackers. I apologize for the significant delay since my last post. I have conducted investigations and considerations regarding the remaining tasks as follows. Would it be possible for you to review them? In particular, could you please confirm whether the approach mentioned in 1. is acceptable? If there are no issues with the direction outlined in 1., I plan to make a simple prototype based on this approach.
>
> 1. Transmitting state values safely between machines
>
>> From: Robert Haas Sent: Wednesday, December 6, 2023 10:25 PM
>> the problems around transmitting serialized bytea blobs between machines that can't be assumed to fully trust each other will need to be addressed in some way, which seems like it will require a good deal of design work, forming some kind of consensus, and then implementation work to follow.
>
> I have considered methods for safely transmitting state values between different machines. I have taken into account the version policy of PostgreSQL (5 years of support) and the major version release cycle over the past 10 years (1 year), and as a result, I have made the assumption that transmission is allowed only when the difference between the local version and the remote version is 5 or less. I believe that by adding new components, an "export function" and an "import function", to the aggregate functions, and further introducing a new SQL keyword into the query syntax of aggregate expressions, we can address this issue.
>
> If the version of the local server is higher than or equal to the version of the remote server, the proposed method can be simplified. The export version mentioned later in (1) would not be necessary. Furthermore, if the version of the local server matches the version of the remote server, the proposed method can be further simplified.
>
> I would appreciate your input on reasonable assumptions regarding the differences in versions between the local server and the remote server. I will explain the specifications of the export function, the import function, the new SQL keyword for aggregate expressions, and the behavior of query processing for partial aggregation separately.
>
> (1) Export function specification
>
> This function is another final function for partial aggregation. It converts the state value that represents the result of partial aggregation into a format that can be read by the local server. This function is called instead of the existing finalfunc during the final stage of aggregation when performing partial aggregation. The conversion process described above will be referred to as "export". The argument of an export function is the version of the server that will receive the return value. Hereafter, this version will be referred to as the export version. The concept of an export version is necessary to handle cases where the version of the local server is lower than the version of the remote server. The return value of the export function is the transformed state value, and its data type is bytea.
>
> For backward compatibility, the developer of the export function must ensure that the export can be performed for major versions up to five versions prior to the major version of PostgreSQL for which the export function is being developed. For built-in functions, I believe it is necessary to allow for the possibility of not developing the export functionality for specific versions in the future (due to reasons such as development burden) after the export function is developed for a certain version. To achieve this, for built-in functions, we will add a column to the pg_aggregate catalog that indicates the presence or absence of export functionality for each major version, including the major version being developed and the previous five major versions. This column will be named safety_export_versions and will have a data type of boolean[6]. For user-defined functions, we will refer to the extensions option and add an external server option called safety_export_extensions, which will maintain a list of extensions that include only the aggregate functions that can be exported to the local server version.
> ...

I honestly think that achieving cross-version compatibility in this way puts a significant burden on developers. Can we instead always use more or less universal export and import functions to fix possible issues with binary representations on different architectures, and simply refuse to push down partial aggregates on server version mismatch? At least as a first step?

> 3. Fixing the behavior when the HAVING clause is present
>
>> From: Robert Haas Sent: Tuesday, November 28, 2023 4:08 AM
>> On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov wrote:
>>> Hi. HAVING is also a problem. Consider the following query
>>>
>>> SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to foreign
ExecAppendAsyncEventWait() in REL_14_STABLE can corrupt PG_exception_stack
Hi.

Recent commit 555276f8594087ba15e0d58e38cd2186b9f39f6d introduced final cleanup of node->as_eventset in ExecAppendAsyncEventWait(). Unfortunately, now this function can return in the middle of a TRY/FINALLY block, without restoring PG_exception_stack.

We found this while working on our FDW. Unfortunately, I couldn't reproduce the issue with postgres_fdw, but it seems it is also affected. The following patch heals the issue.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 025f40894d6d8f499144f0f7c45c0a124a46c408 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Fri, 23 Feb 2024 13:06:40 +0300
Subject: [PATCH] Avoid corrupting PG_exception_stack in ExecAppendAsyncEventWait()

After commit 555276f8594087ba15e0d58e38cd2186b9f39f6d,
ExecAppendAsyncEventWait() could corrupt PG_exception_stack after
returning in the middle of a PG_TRY()/PG_END_TRY() block. It should exit
only after PG_END_TRY().
---
 src/backend/executor/nodeAppend.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index baba3ceea23..42962e80177 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -1052,21 +1052,22 @@ ExecAppendAsyncEventWait(AppendState *node)
 		 */
 		if (GetNumRegisteredWaitEvents(node->as_eventset) == 1)
 		{
-			FreeWaitEventSet(node->as_eventset);
-			node->as_eventset = NULL;
-			return;
+			/* Return after PG_TRY()/PG_END_TRY() block */
+			noccurred = 0;
 		}
+		else
+		{
+			/* Return at most EVENT_BUFFER_SIZE events in one call. */
+			if (nevents > EVENT_BUFFER_SIZE)
+				nevents = EVENT_BUFFER_SIZE;
-		/* Return at most EVENT_BUFFER_SIZE events in one call. */
-		if (nevents > EVENT_BUFFER_SIZE)
-			nevents = EVENT_BUFFER_SIZE;
-
-		/*
-		 * If the timeout is -1, wait until at least one event occurs. If the
-		 * timeout is 0, poll for events, but do not wait at all.
-		 */
-		noccurred = WaitEventSetWait(node->as_eventset, timeout, occurred_event,
-									 nevents, WAIT_EVENT_APPEND_READY);
+			/*
+			 * If the timeout is -1, wait until at least one event occurs. If
+			 * the timeout is 0, poll for events, but do not wait at all.
+			 */
+			noccurred = WaitEventSetWait(node->as_eventset, timeout, occurred_event,
+										 nevents, WAIT_EVENT_APPEND_READY);
+		}
 	}
 	PG_FINALLY();
 	{
-- 
2.34.1
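The failure mode this patch fixes can be modeled outside PostgreSQL in a few lines of C. This is a simplified sketch of the PG_TRY()/PG_FINALLY() discipline (the real macros in elog.h also save and restore error context state); the MY_* macro names and the exception_stack variable are invented for illustration.

```c
#include <setjmp.h>
#include <stddef.h>

/* Models PG_exception_stack: points at the innermost recovery point. */
jmp_buf    *exception_stack = NULL;

#define MY_TRY() \
	do { \
		jmp_buf    *save_stack = exception_stack; \
		jmp_buf		local_buf; \
		exception_stack = &local_buf; \
		if (setjmp(local_buf) == 0) \
		{

#define MY_FINALLY() \
		} \
		exception_stack = save_stack; \
		{

#define MY_END_TRY() \
		} \
	} while (0)

/*
 * Buggy: returns from inside the TRY block, skipping the restore, so
 * exception_stack keeps pointing into this function's dead stack frame.
 */
void
buggy(void)
{
	MY_TRY();
	{
		return;
	}
	MY_FINALLY();
	MY_END_TRY();
}

/*
 * Fixed: records the early-exit condition (cf. "noccurred = 0" in the
 * patch) and falls through, so the FINALLY part restores the saved
 * stack pointer before the function returns.
 */
void
fixed(void)
{
	MY_TRY();
	{
		/* set a flag instead of returning here */
	}
	MY_FINALLY();
	MY_END_TRY();
}
```

After calling fixed(), exception_stack is back to its previous value; after buggy(), it still points at a jmp_buf on a stack frame that no longer exists - exactly the corruption the patch avoids by exiting only after PG_END_TRY().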
Re: CREATE INDEX CONCURRENTLY on partitioned index
Hi.

I've rebased the patch on master, and it seems to me there's one more issue - when we call DefineIndexConcurrentInternal() in the partitioned case, it waits for transactions locking tableId, not tabrelid - the heaprelid LockRelId is constructed for the parent index relation, not for the child index relation. Attaching a fixed version.

Also I'm not sure what to do with locking of child relations. If we don't do anything, you can drop one of the partitioned table's children while CIC is in progress, and get the error

ERROR: cache lookup failed for index 16399

If you try to lock all child tables in the CIC session, you'll get deadlocks.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 37a13b7fa1c3277b9d038b7a0c75399ff05b28a7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Mon, 29 Jan 2024 10:41:01 +0300
Subject: [PATCH] Allow CREATE INDEX CONCURRENTLY on partitioned table
---
 doc/src/sgml/ddl.sgml                  |   4 +-
 doc/src/sgml/ref/create_index.sgml     |  14 +-
 src/backend/commands/indexcmds.c       | 200 ++++++++++++++++++--------
 src/test/regress/expected/indexing.out | 127 ++++++++++++++++-
 src/test/regress/sql/indexing.sql      |  26 +++-
 5 files changed, 296 insertions(+), 75 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 075ff329912..8ee80c40e3b 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4194,9 +4194,7 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
    so that they are applied automatically to the entire hierarchy.
    This is very convenient, as not only will the existing partitions become indexed, but
-   also any partitions that are created in the future will. One limitation is
-   that it's not possible to use the CONCURRENTLY
-   qualifier when creating such a partitioned index. To avoid long lock
+   also any partitions that are created in the future will. To avoid long lock
    times, it is possible to use CREATE INDEX ON ONLY
    the partitioned table; such an index is marked invalid, and the partitions
    do not get the index applied automatically. The indexes on partitions can

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 40986aa502f..b05102efdaf 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -645,7 +645,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ]
    If a problem arises while scanning the table, such as a deadlock or a
    uniqueness violation in a unique index, the CREATE INDEX
-   command will fail but leave behind an invalid index. This index
+   command will fail but leave behind an invalid index.
+   If this happens while building an index concurrently on a partitioned
+   table, the command can also leave behind valid or
+   invalid indexes on table partitions. The invalid index
    will be ignored for querying purposes because it
    might be incomplete; however it will still consume update overhead.
    The psql \d command will report such an index as INVALID:
@@ -692,15 +695,6 @@ Indexes:
    cannot.

-
-   Concurrent builds for indexes on partitioned tables are currently not
-   supported. However, you may concurrently build the index on each
-   partition individually and then finally create the partitioned index
-   non-concurrently in order to reduce the time where writes to the
-   partitioned table will be locked out. In this case, building the
-   partitioned index is a metadata only operation.
-
-

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ab8b81b3020..65477aeb3a8 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -93,6 +93,11 @@ static char *ChooseIndexName(const char *tabname, Oid namespaceId,
 							 bool primary, bool isconstraint);
 static char *ChooseIndexNameAddition(const List *colnames);
 static List *ChooseIndexColumnNames(const List *indexElems);
+static void DefineIndexConcurrentInternal(Oid relationId,
+										  Oid indexRelationId,
+										  IndexInfo *indexInfo,
+										  LOCKTAG heaplocktag,
+										  LockRelId heaprelid);
 static void ReindexIndex(const RangeVar *indexRelation,
 						 const ReindexParams *params, bool isTopLevel);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -554,7 +559,6 @@ DefineIndex(Oid tableId,
 	bool		amissummarizing;
 	amoptions_function amoptions;
 	bool		partitioned;
-	bool		safe_index;
 	Datum		reloptions;
 	int16	   *coloptions;
 	IndexInfo  *indexInfo;
@@ -562,12 +566,10 @@ DefineIndex(Oid tableId,
 	bits16		constr_flags;
 	int			numberOfAttributes;
 	int			numberOfKeyAttributes;
-	TransactionId limitXmin;
 	ObjectAddress address;
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
@@ -697,20 +699,6 @@ DefineIndex(Oid tableId
Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
Andrei Lepikhov wrote on 2023-12-08 07:37:
> On 28/11/2023 01:37, Alexander Korotkov wrote:
>> On Mon, Nov 27, 2023 at 8:07 PM Andres Freund wrote:
>>> Sorry for the late answer, I missed this thread because of vacation.
>>>
>>> On 2023-11-27 11:29:48 +0530, Ashutosh Bapat wrote:
>>>> How do we ensure that we are not making unnecessary copies of Bitmapsets?
>>>
>>> We don't - but that's not specific to this patch. Bitmapsets typically aren't very large; I doubt that they're a significant proportion of the memory usage. Adding refcounts or such would likely add more overhead than it'd save, both in time and memory.
>
> I'd already clashed with Tom on copying the required_relids field and voluntarily made unnecessary copies in the project [1]. And ... ran into huge memory consumption. The reason was in Bitmapsets: when we have 1E3-1E4 partitions and try to reparameterize a join, one bitmapset field can have a size of about 1 kB. Having a bitmapset referencing a relation with a large index value, we had a lot of (for example, 1E4 * 1 kB) copies on each reparameterization of such a field. Alexander Pyhalov should remember that case.

Yes. If it matters, this happened during reparameterization when 2 partitioned tables with 1000 partitions each were joined. Then asymmetric partition-wise join managed to eat lots of memory for bitmapsets (by lots of memory I mean all available memory on the test VM).

[1] Asymmetric partition-wise JOIN https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com

--
Best regards,
Alexander Pyhalov,
Postgres Professional
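The sizes quoted in the thread can be checked with a quick back-of-envelope calculation: a bitmapset is sized by its highest member, so referencing a relation with a large range-table index makes every copy large. The sketch below assumes 64-bit bitmap words and illustrative figures (a highest member around 8000 and 1E4 copies) consistent with the numbers quoted above; it is not actual bitmapset code.

```c
#include <stddef.h>

#define BITS_PER_BITMAPWORD 64	/* assumption: 64-bit bitmap words */

/* Approximate payload of a bitmapset whose highest member is max_member. */
size_t
bms_payload_bytes(int max_member)
{
	size_t		nwords = (size_t) max_member / BITS_PER_BITMAPWORD + 1;

	return nwords * (BITS_PER_BITMAPWORD / 8);
}

/* Total bytes spent if every reparameterization copies such a set. */
size_t
copy_cost_bytes(int max_member, size_t ncopies)
{
	return bms_payload_bytes(max_member) * ncopies;
}
```

With a highest member around 8000, the payload is roughly 1 kB, and 1E4 copies of it already cost about 10 MB for a single field - and required_relids is only one of several relid fields copied per path, which is how the test VM ran out of memory.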
Re: Add semi-join pushdown to postgres_fdw
Alexander Korotkov wrote on 2023-12-03 23:52:
> Hi, Alexander!
>
> On Mon, Nov 27, 2023 at 5:11 PM Alexander Pyhalov wrote:
>> Alexander Korotkov wrote on 2023-11-27 03:49:
>>> Thank you for the revision.
>>>
>>> I've revised the patch myself. I've replaced the StringInfo holding additional conds with a list of strings, as I proposed before. I think the code became much clearer. Also, it gets rid of some unnecessary allocations.
>>>
>>> I think the code itself is not in bad shape. But the patch lacks a high-level description of semi-join processing as well as comments on each manipulation with additional conds. Could you please add this?
>>
>> Hi. The updated patch looks better. It seems I had failed to fix the logic in deparseFromExprForRel() when I tried to convert StringInfos to Lists. I've added some comments. The most complete description of how SEMI-JOIN is processed is located in deparseFromExprForRel(). Unfortunately, there seems to be no single place describing the current JOIN deparsing logic.
>
> Looks good to me. I've made some grammar and formatting adjustments. Also, I've written the commit message. Now, I think this looks good. I'm going to push this if there are no objections.
>
> --
> Regards,
> Alexander Korotkov

Hi. No objections from my side. Perhaps some rephrasing is needed in the comment in semijoin_target_ok(): "The planner can create semi-joins, which refer to inner rel vars in its target list." Perhaps change "semi-joins, which refer" to "a semi-join, which refers ...", as later we speak about "its" target list.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Add semi-join pushdown to postgres_fdw
Alexander Korotkov wrote on 2023-11-27 03:49:
> Thank you for the revision.
>
> I've revised the patch myself. I've replaced the StringInfo holding additional conds with a list of strings, as I proposed before. I think the code became much clearer. Also, it gets rid of some unnecessary allocations.
>
> I think the code itself is not in bad shape. But the patch lacks a high-level description of semi-join processing as well as comments on each manipulation with additional conds. Could you please add this?

Hi. The updated patch looks better. It seems I had failed to fix the logic in deparseFromExprForRel() when I tried to convert StringInfos to Lists. I've added some comments. The most complete description of how SEMI-JOIN is processed is located in deparseFromExprForRel(). Unfortunately, there seems to be no single place describing the current JOIN deparsing logic.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From c17e05d09d5721d22785ed11bed053162d67d967 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Mon, 27 Nov 2023 14:35:29 +0300
Subject: [PATCH] postgres_fdw: add support for deparsing semi joins
---
 contrib/postgres_fdw/deparse.c                 | 234 +++++++++---
 contrib/postgres_fdw/expected/postgres_fdw.out | 320 ++++++++++++--
 contrib/postgres_fdw/postgres_fdw.c            |  94 ++++-
 contrib/postgres_fdw/postgres_fdw.h            |   3 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      | 126 +++++-
 5 files changed, 696 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 09fd489a901..8670524578b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -180,11 +180,15 @@ static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
 								  RelOptInfo *foreignrel, bool use_alias,
 								  Index ignore_rel, List **ignore_conds,
+								  List **additional_conds,
 								  List **params_list);
+static void appendWhereClause(List *exprs, List *additional_conds,
+							  deparse_expr_cxt *context);
 static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
 static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 							   RelOptInfo *foreignrel, bool make_subquery,
-							   Index ignore_rel, List **ignore_conds, List **params_list);
+							   Index ignore_rel, List **ignore_conds,
+							   List **additional_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
 static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first,
@@ -1370,6 +1374,7 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 	RelOptInfo *scanrel = context->scanrel;
+	List	   *additional_conds = NIL;

 	/* For upper relations, scanrel must be either a joinrel or a baserel */
 	Assert(!IS_UPPER_REL(context->foreignrel) ||
@@ -1379,14 +1384,11 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 	appendStringInfoString(buf, " FROM ");
 	deparseFromExprForRel(buf, context->root, scanrel,
 						  (bms_membership(scanrel->relids) == BMS_MULTIPLE),
-						  (Index) 0, NULL, context->params_list);
-
-	/* Construct WHERE clause */
-	if (quals != NIL)
-	{
-		appendStringInfoString(buf, " WHERE ");
-		appendConditions(quals, context);
-	}
+						  (Index) 0, NULL, &additional_conds,
+						  context->params_list);
+	appendWhereClause(quals, additional_conds, context);
+	if (additional_conds != NIL)
+		list_free_deep(additional_conds);
 }

 /*
@@ -1598,6 +1600,42 @@ appendConditions(List *exprs, deparse_expr_cxt *context)
 	reset_transmission_modes(nestlevel);
 }

+/*
+ * Append WHERE clause, containing conditions
+ * from exprs and additional_conds, to context->buf.
+ */
+static void
+appendWhereClause(List *exprs, List *additional_conds, deparse_expr_cxt *context)
+{
+	StringInfo	buf = context->buf;
+	bool		need_and = false;
+	ListCell   *lc;
+
+	if (exprs != NIL || additional_conds != NIL)
+		appendStringInfoString(buf, " WHERE ");
+
+	/*
+	 * If there are some filters, append them.
+	 */
+	if (exprs != NIL)
+	{
+		appendConditions(exprs, context);
+		need_and = true;
+	}
+
+	/*
+	 * If there are some EXISTS conditions, coming from SEMI-JOINs, append
+	 * them.
+	 */
+	foreach(lc, additional_conds)
+	{
+		if (need_and)
+			appendStringInfoString(buf, " AND ");
+		appendStringInfoString(buf, (char *) lfirst(lc));
+		need_and = true;
+	}
+}
+
 /* Output join name for given join type */
 const char *
 get_jointype_name(JoinType jointype)
@@ -1616,6 +1654,9 @@ get_jointype_name(JoinType jointype)
 		case JOIN_FULL:
 			return "FULL";

+		case JOIN_SEMI:
+			return "SEMI";
+
 		default:
 			/* Shouldn't come here, but protect from buggy code. */
 			elog(ERROR, &
Re: Partial aggregates pushdown
Robert Haas писал 2023-11-21 20:16: > I don't think the patch does a good job explaining why HAVING, > DISTINCT, and ORDER BY are a problem. It seems to me that HAVING > shouldn't really be a problem, because HAVING is basically a WHERE > clause that occurs after aggregation is complete, and whether or not > the aggregation is safe shouldn't depend on what we're going to do > with the value afterward. The HAVING clause can't necessarily be > pushed to the remote side, but I don't see how or why it could make > the aggregate itself unsafe to push down. DISTINCT and ORDER BY are a > little trickier: if we pushed down DISTINCT, we'd still have to > re-DISTINCT-ify when combining locally, and if we pushed down ORDER > BY, we'd have to do a merge pass to combine the returned values unless > we could prove that the partitions were non-overlapping ranges that > would be visited in the correct order. Although that all sounds > doable, I think it's probably a good thing that the current patch > doesn't try to handle it -- this is complicated already. But it should > explain why it's not handling it and maybe even a bit about how it > could be handling in the future, rather than just saying "well, this > kind of thing is not safe." The trouble with that explanation is that > it does nothing to help the reader understand whether the thing in > question is *fundamentally* unsafe or whether we just don't have the > right code to make it work. Makes sense. Actually, I think I was wrong about this. We can't handle ORDER BY or DISTINCT because we can't distinct-ify or order after we've already partially aggregated. At least not in general, and not without additional aggregate support functions. So what I said above was wrong with respect to those. Or so I believe, anyway. But I still don't see why HAVING should be a problem. Hi. HAVING is also a problem. 
Consider the following query: SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to the foreign server, as HAVING needs the full aggregate result, but the foreign server doesn't know it. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Add semi-join pushdown to postgres_fdw
Alexander Korotkov писал 2023-10-30 19:05: Hi, Alexander! Thank you for working on this. I believe this is a very interesting patch, which significantly improves our FDW-based distributed facilities. This is why I decided to review this. Hi. Thanks for reviewing. + /* +* We can't push down join if its reltarget is not safe +*/ + if (!joinrel_target_ok(root, joinrel, jointype, outerrel, innerrel)) return false; As I get it, the joinrel_target_ok() function does meaningful checks only for semi-joins and always returns false for all other kinds of joins. I think we should call it only for semi-joins and name the function accordingly. Done. + fpinfo->unknown_subquery_rels = bms_union(fpinfo_o->unknown_subquery_rels, + fpinfo_i->unknown_subquery_rels); Should the comment before this code block be revised? Updated comment. + case JOIN_SEMI: + fpinfo->joinclauses = list_concat(fpinfo->joinclauses, + fpinfo_i->remote_conds); + fpinfo->joinclauses = list_concat(fpinfo->joinclauses, + fpinfo->remote_conds); + fpinfo->remote_conds = list_copy(fpinfo_o->remote_conds); + fpinfo->unknown_subquery_rels = bms_union(fpinfo->unknown_subquery_rels, + innerrel->relids); + break; I think that the comment before the switch() should definitely be revised. + Relids hidden_subquery_rels; /* relids, which can't be referred to + * from upper relations */ Could this definition contain the positive part? Can't be referred to from upper relations, but used internally for semi-joins (or something like that)? Made the comment a bit more verbose. Also, I think the machinery around the append_conds could be somewhat simpler if we turn them into a list (list of strings). I think that should make the code clearer and also save us some memory allocations. I've tried to rewrite it using lists, only to find out that these are not really lists. I mean, in deparseFromExprForRel() we replace the lists from both sides with one condition. This allows us to preserve the hierarchy of conditions.
We should merge these conditions at the end of the IS_JOIN_REL(foreignrel) branch, or we'll push them too high. And if we deparse them in this place as StringInfo, I see no benefit in converting them to lists. In [1] you've referenced the cases when your patch can't push down semi-joins. It doesn't seem impossible to handle these cases, but that would make the patch much more complicated. I'm OK with continuing with a simpler patch that handles the majority of cases. Could you please add the cases which can't be pushed down with the current patch to the test suite? There are several cases when we can't push down a semi-join in the current patch. 1) When the target list has attributes from the inner relation which are equivalent to some attributes of the outer relation, we fail to notice this. 2) When we examine A join B and decide that we can't push it down, this decision is final - we state it in fdw_private of the joinrel, and so if we consider joining these relations in another order, we don't reconsider. This means that if we later examine B join A, we don't try to push it down. As a semi-join can be executed as JOIN_UNIQUE_INNER or JOIN_UNIQUE_OUTER, this can be a problem - we look at some of these paths and remember that we can't push down such a join. -- Best regards, Alexander Pyhalov, Postgres Professional From 91ae85ac735c9f109cd3ab3603693177011e5b94 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 7 Nov 2022 10:23:32 +0300 Subject: [PATCH] postgres_fdw: add support for deparsing semi joins We deparse semi-joins as EXISTS subqueries. So, deparsing a semi-join generates an addl_conds condition, which is then added to the uppermost JOIN's WHERE clause.
--- contrib/postgres_fdw/deparse.c| 206 --- .../postgres_fdw/expected/postgres_fdw.out| 320 -- contrib/postgres_fdw/postgres_fdw.c | 94 - contrib/postgres_fdw/postgres_fdw.h | 3 + contrib/postgres_fdw/sql/postgres_fdw.sql | 126 ++- 5 files changed, 668 insertions(+), 81 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 09fd489a901..cb0e373055d 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -180,11 +180,14 @@ static void appendConditions(List *exprs, deparse_expr_cxt *context); static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, bool use_alias, Index ignore_rel, List **ignore_conds, + StringInfo additional_conds, List **params_list); +static void appendWhereClause(List *exprs, StringInfo additional_conds, deparse_expr_cxt *context); static void deparseFromExpr(List *quals, deparse_expr_cxt *context); sta
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-09-28 07:40: I'm not sure that I like these mechanics of adding sort group clauses - it seems we do additional work in core, which is of use only for one extension, but at least it seems to be working. We cannot deparse the original sort group clauses and pathtarget when performing partial aggregate pushdown by any FDWs. So I think the additional sort group clauses and pathtarget are needed by any FDW, not only postgres_fdw. Hi. It seems to me that *fdw postfixes don't clarify things, but just make naming more ugly. + * Adding these Vars and PlaceHolderVars to PathTarget, + * FDW cannot deparse this by the original List of SortGroupClauses. + * So, before this adding process, + * setGroupClausePartial generates another Pathtarget and another + * List of SortGroupClauses for FDW. It seems that something like: /* * Modified PathTarget cannot be used by FDW as-is to deparse this statement. * So, before modifying PathTarget, setGroupClausePartial generates * another PathTarget and another List of SortGroupClauses * to make deparsing possible. */ sounds better. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-09-27 01:35: Hi Mr.Momjian, Mr.Pyhalov. Tuesday, 26 September 2023 22:15 Alexander Pyhalov : Do you mean that extra->partial_target->sortgrouprefs is not replaced, and so we preserve tleSortGroupRef numbers? Yes, that is correct. I'm suspicious about rewriting extra->partial_target->exprs with partial_target->exprs - I'm still not sure why we don't lose the information added by add_column_to_pathtarget() to extra->partial_target->exprs? Hi. In postgres_fdw.sql "Partial aggregates are unsafe to push down having clause when there are partial aggregates" - this comment likely should be fixed. Some comments should be added to setGroupClausePartial() and to make_partial_grouping_target() - especially why setGroupClausePartial() is called prior to add_new_columns_to_pathtarget(). I'm not sure that I like these mechanics of adding sort group clauses - it seems we do additional work in core, which is of use only for one extension, but at least it seems to be working. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-09-25 06:18: Hi Mr.Bruce, Mr.Pyhalov, Mr.Finnerty, hackers. Thank you for your valuable comments. I sincerely apologize for the very late reply. Here is a response to your comments or a fix to the patch. Tuesday, August 8, 2023 at 3:31 Bruce Momjian > I have modified the program except for the point "if the version of the remote server is less than PG17". > Instead, we have addressed the following. > "If check_partial_aggregate_support is true and the remote server version is older than the local server > version, postgres_fdw does not assume that the partial aggregate function is on the remote server unless > the partial aggregate function and the aggregate function match." > The reason for this is to maintain compatibility with any aggregate function that does not support partial > aggregate in one version of V1 (V1 is PG17 or higher), even if the next version supports partial aggregate. > For example, string_agg does not support partial aggregation in PG15, but it will support partial aggregation > in PG16. Just to clarify, I think you are saying: If check_partial_aggregate_support is true and the remote server version is older than the local server version, postgres_fdw checks if the partial aggregate function exists on the remote server during planning and only uses it if it does. I tried to phrase it in a positive way, and mentioned the plan time distinction. Also, I am sorry I was away for most of July and am just getting to this. Thanks for your comment. In the documentation, the description of check_partial_aggregate_support is as follows (please see postgres-fdw.sgml). -- check_partial_aggregate_support (boolean) Only if this option is true, during query planning, postgres_fdw connects to the remote server and check if the remote server version is older than the local server version. 
If so, postgres_fdw assumes that for each built-in aggregate function, the partial aggregate function is not defined on the remote server unless the partial aggregate function and the aggregate function match. The default is false. -- Thursday, 20 July 2023 19:23 Alexander Pyhalov : fujii.y...@df.mitsubishielectric.co.jp писал 2023-07-19 03:43: > Hi Mr.Pyhalov, hackers. > 3) > I modified the patch to safely do a partial aggregate pushdown for > queries which contain having clauses. > Hi. Sorry, but I don't see how it could work. We apologize for any inconvenience caused. Thanks to Pyhalov's and Jim's comments, I have realized that I made a fundamental mistake regarding the pushdown of the HAVING clause and the difficulty of achieving it when performing partial aggregate pushdown. So, I removed the code for pushdown of the HAVING clause when performing partial aggregate pushdown. Thursday, 20 July 2023 19:23 Alexander Pyhalov : As for changes in planner.c (setGroupClausePartial()) I have several questions. 1) Why don't we add non_group_exprs to pathtarget->exprs when partial_target->exprs is not set? 2) We replace extra->partial_target->exprs with partial_target->exprs after processing. Why are we sure that after this tleSortGroupRef is correct? Response to 1) The code you pointed out was unnecessary. I have removed this code. Also, the process of adding a PlaceHolderVar's expr to partial_target was missing. So I fixed this. Response to 2) The procedure for building extra->groupClausePartial and extra->partial_target in make_partial_grouping_target in this patch is as follows. STEP1. From grouping_target->exprs, extract the Aggrefs, and the Vars and PlaceHolderVars that are not contained in any Aggref. STEP2. setGroupClausePartial sets a copy of the original groupClause into extra->groupClausePartial and a copy of the original partial_target into extra->partial_target. STEP3. setGroupClausePartial adds the Vars and PlaceHolderVars from STEP1 to partial_target.
The sortgroupref value to be added to partial_target->sortgrouprefs is set to (the maximum value of the existing sortgrouprefs) + 1. setGroupClausePartial adds a SortGroupClause sgc, whose sgc->tleSortGroupRef matches that sortgroupref, to the GroupClause list. STEP4. add_new_columns_to_pathtarget adds STEP1's Aggrefs to partial_target. Due to STEP2, the tleSortGroupRefs set in extra->groupClausePartial are not duplicated. Do you mean that extra->partial_target->sortgrouprefs is not replaced, and so we preserve tleSortGroupRef numbers? I'm suspicious about rewriting extra->partial_target->exprs with partial_target->exprs - I'm still not sure why we don't lose the information added by add_column_to_pathtarget() to extra->partial_target->exprs? Also look at the following example. EXPLAIN VERBOSE SELECT count(*) , (b/2)::numeric FROM pagg_tab GROUP BY b/2 ORDER BY 1; QUERY PLAN --
postgres_fdw could support row comparison pushdown
Hi. postgres_fdw currently doesn't handle RowCompareExpr, which doesn't allow keyset pagination queries to be efficiently executed over sharded table. Attached patch adds handling of RowCompareExpr in deparse.c, so that we could push down conditions like WHERE (created, id) > ('2023-01-01 00:00:00'::timestamp, 12345) to the foreign server. I'm not sure about conditions when it's possible for RowCompareExpr to have opnos with different names or namespaces, but comment in ruleutils.c suggests that this is possible, so I've added check for this in foreign_expr_walker(). -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 655148c85768afbbfc034e6f5dc5a5a6d72139b8 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 7 Aug 2023 11:47:31 +0300 Subject: [PATCH] postgres_fdw: support RowCompareExpr pushdown --- contrib/postgres_fdw/deparse.c| 139 ++ .../postgres_fdw/expected/postgres_fdw.out| 49 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 9 ++ 3 files changed, 197 insertions(+) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 09d6dd60ddc..5eed3ac981c 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -166,6 +166,7 @@ static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context); static void deparseNullTest(NullTest *node, deparse_expr_cxt *context); static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context); +static void deparseRowCompareExpr(RowCompareExpr *node, deparse_expr_cxt *context); static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod, deparse_expr_cxt *context); static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod, @@ -287,6 +288,51 @@ is_foreign_expr(PlannerInfo *root, return true; } +/* + * Determines if the names and namespaces of operations match + */ +static bool +opnames_match(List *opnos) +{ + Oid oprns = InvalidOid; + char *oprname = NULL; 
+ ListCell *lc; + bool match = true; + + foreach(lc, opnos) + { + HeapTuple opertup; + Form_pg_operator operform; + Oid opno = lfirst_oid(lc); + + opertup = SearchSysCache1(OPEROID, ObjectIdGetDatum(opno)); + if (!HeapTupleIsValid(opertup)) + elog(ERROR, "cache lookup failed for operator %u", opno); + operform = (Form_pg_operator) GETSTRUCT(opertup); + /* First op */ + if (oprname == NULL) + { + oprname = pstrdup(NameStr(operform->oprname)); + oprns = operform->oprnamespace; + } + else + { + Assert(OidIsValid(oprns)); + if (oprns != operform->oprnamespace || (strcmp(oprname, NameStr(operform->oprname)) != 0)) +match = false; + } + ReleaseSysCache(opertup); + + if (!match) + break; + } + + if (oprname) + pfree(oprname); + + return match; +} + /* * Check if expression is safe to execute remotely, and return true if so. * @@ -872,6 +918,44 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_UNSAFE; } break; + case T_RowCompareExpr: + { +RowCompareExpr *rce = (RowCompareExpr *) node; +ListCell *lc; + +if (list_length(rce->opnos) == 0) + return false; + +/* + * Only shippable operators can be sent to remote. + */ +foreach(lc, rce->opnos) +{ + if (!is_shippable(lfirst_oid(lc), OperatorRelationId, fpinfo)) + return false; +} + +/* If opnos names do not match, can't deparse such expression */ +if (!opnames_match(rce->opnos)) + return false; + +/* + * Recurse to arguments + */ +if (!foreign_expr_walker((Node *) rce->largs, + glob_cxt, &inner_cxt, case_arg_cxt)) + return false; + +if (!foreign_expr_walker((Node *) rce->rargs, + glob_cxt, &inner_cxt, case_arg_cxt)) + return false; + +/* Output is always boolean and so noncollatable.
*/ +collation = InvalidOid; +state = FDW_COLLATE_NONE; + + } + break; case T_List: { List *l = (List *) node; @@ -2785,6 +2869,9 @@ deparseExpr(Expr *node, deparse_expr_cxt *context) case T_ArrayExpr: deparseArrayExpr((ArrayExpr *) node, context); break; + case T_RowCompareExpr: + deparseRowCompareExpr((RowCompareExpr *) node, context); + break; case T_Aggref: deparseAggref((Aggref *) node, context); break; @@ -3508,6 +3595,58 @@ deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context) deparse_type_name(node->array_typeid, -1)); } +/* + * Deparse RowCompareExpr + */ +static void +deparseRowCompareExpr(RowCompareExpr *node, deparse_expr_cxt *context) +{ + StringInfo buf = context->buf; + bool first; + ListCell *arg; + HeapTuple opertup; + Form_pg_operator operform; + Oid opno = linitial_oid(node->opnos); + + /* Deparse the first argument */ + appendStringInfoString(buf, "(ROW("); + + first = true; + foreach(arg, node->largs) + { + if (!firs
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-07-19 03:43: Hi Mr.Pyhalov, hackers. 3) I modified the patch to safely do a partial aggregate pushdown for queries which contain having clauses. Hi. Sorry, but I don't see how it could work. For example, the attached test returns wrong result: CREATE FUNCTION f() RETURNS INT AS $$ begin return 10; end $$ LANGUAGE PLPGSQL; SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 1; b | sum +- 0 | 0 10 | 0 20 | 0 30 | 0 40 | 0 +(5 rows) In fact the above query should have returned 0 rows, as SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1; b | sum +-- 0 | 600 1 | 660 2 | 720 3 | 780 4 | 840 5 | 900 6 | 960 7 | 1020 8 | 1080 9 | 1140 10 | 600 11 | 660 12 | 720 shows no such rows. Or, on the same data SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) > 660 ORDER BY 1; You'll get 0 rows. But SELECT b, sum(a) FROM pagg_tab GROUP BY b; b | sum +-- 42 | 720 29 | 1140 4 | 840 34 | 840 41 | 660 0 | 600 40 | 600 gives. The issue is that you can't calculate "partial" having. You should compare full aggregate in filter, but it's not possible on the level of one partition. And you have this in plans Finalize GroupAggregate Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*) Group Key: pagg_tab.b Filter: (sum(pagg_tab.a) < 700) -> Sort Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a)) Sort Key: pagg_tab.b -> Append -> Foreign Scan Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a)) Filter: ((PARTIAL sum(pagg_tab.a)) < 700) <--- here we can't compare anything yet, sum is incomplete. 
Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p1 GROUP BY 1 -> Foreign Scan Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)), (PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab_1.a)) Filter: ((PARTIAL sum(pagg_tab_1.a)) < 700) Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab_1) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p2 GROUP BY 1 -> Foreign Scan Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)), (PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab_2.a)) Filter: ((PARTIAL sum(pagg_tab_2.a)) < 700) Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab_2) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p3 GROUP BY 1 In foreign_grouping_ok() 6586 if (IsA(expr, Aggref)) 6587 { 6588 if (partial) 6589 { 6590 mark_partial_aggref((Aggref *) expr, AGGSPLIT_INITIAL_SERIAL); 6591 continue; 6592 } 6593 else if (!is_foreign_expr(root, grouped_rel, expr)) 6594 return false; 6595 6596 tlist = add_to_flat_tlist(tlist, list_make1(expr)); 6597 } at least you shouldn't do anything with expr, if is_foreign_expr() returned false. If we restrict pushing down queries with havingQuals, I'm not quite sure how Aggref can appear in local_conds. As for changes in planner.c (setGroupClausePartial()) I have several questions. 1) Why don't we add non_group_exprs to pathtarget->exprs when partial_target->exprs is not set? 2) We replace extra->partial_target->exprs with partial_target->exprs after processing. Why are we sure that after this tleSortGroupRef is correct? 
-- Best regards, Alexander Pyhalov, Postgres Professionaldiff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 1824cb67fe9..c6f613019c3 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -3025,6 +3025,22 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1; SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1; +CREATE FUNCTION f() RETURNS INT AS $$ +begin + return 10; +end +$$ LANGUAGE PLPGSQL; + +EXPLAIN (VERBOSE, COSTS OFF) +SELECT b, sum(a) FROM
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-07-10 10:35: I have modified the program except for the point "if the version of the remote server is less than PG17". Instead, we have addressed the following. "If check_partial_aggregate_support is true and the remote server version is older than the local server version, postgres_fdw does not assume that the partial aggregate function is on the remote server unless the partial aggregate function and the aggregate function match." The reason for this is to maintain compatibility with any aggregate function that does not support partial aggregate in one version of V1 (V1 is PG17 or higher), even if the next version supports partial aggregate. For example, string_agg does not support partial aggregation in PG15, but it will support partial aggregation in PG16. Hi. 1) In foreign_join_ok(), should we set fpinfo->user if fpinfo->check_partial_aggregate_support is set, like it's done for fpinfo->use_remote_estimate? It seems we can end up with fpinfo->user = NULL if use_remote_estimate is not set. 2) It seems we found an additional issue with the original patch, which is also present in the current one. I'm attaching a patch which seems to fix it, but I'm not quite sure about it. We have not been able to add a test for the case where the remote server version is older than the local server version to the regression test. Is there any way to add such tests to the existing regression tests?
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 187d15185200aabc22c5219bbe636bc950670a78 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Fri, 14 Jul 2023 16:34:02 +0300 Subject: [PATCH] For partial aggregation we can't rely on the fact that every var is a part of some GROUP BY expression --- .../postgres_fdw/expected/postgres_fdw.out| 120 ++ contrib/postgres_fdw/postgres_fdw.c | 31 +++-- contrib/postgres_fdw/sql/postgres_fdw.sql | 9 ++ 3 files changed, 152 insertions(+), 8 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 8d80ba0a6be..31ee461045c 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9955,6 +9955,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1; 49 | 19. | 29 |60 (50 rows) +EXPLAIN (VERBOSE, COSTS OFF) +SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1; + QUERY PLAN + + Sort + Output: ((pagg_tab.b / 2)), (avg(pagg_tab.a)), (max(pagg_tab.a)), (count(*)) + Sort Key: ((pagg_tab.b / 2)) + -> Finalize HashAggregate + Output: ((pagg_tab.b / 2)), avg(pagg_tab.a), max(pagg_tab.a), count(*) + Group Key: ((pagg_tab.b / 2)) + -> Append + -> Foreign Scan + Output: ((pagg_tab.b / 2)), (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)) + Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab) + Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p1 GROUP BY 1 + -> Foreign Scan + Output: ((pagg_tab_1.b / 2)), (PARTIAL avg(pagg_tab_1.a)), (PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)) + Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab_1) + Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p2 GROUP BY 1 + -> Foreign Scan + Output: ((pagg_tab_2.b / 2)), (PARTIAL avg(pagg_tab_2.a)), (PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)) + Relations: Aggregate on 
(public.fpagg_tab_p3 pagg_tab_2) + Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p3 GROUP BY 1 +(19 rows) + +SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1; + ?column? | avg | max | count +--+-+-+--- +0 | 10.5000 | 21 | 120 +1 | 12.5000 | 23 | 120 +2 | 14.5000 | 25 | 120 +3 | 16.5000 | 27 | 120 +4 | 18.5000 | 29 | 120 +5 | 10.5000 | 21 | 120 +6 | 12.5000 | 23 | 120 +7 | 14.5000 | 25 | 120 +8 | 16.5000 | 27 | 120 +9 | 18.5000 | 29 | 120 + 10 | 10.5000 | 21 | 120 + 11 | 12.5000 | 23 | 120 + 12 | 14.5000 | 25 | 120
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby писал 2023-07-13 05:27: On Mon, Mar 27, 2023 at 01:28:24PM +0300, Alexander Pyhalov wrote: Justin Pryzby писал 2023-03-26 17:51: > On Sun, Dec 04, 2022 at 01:09:35PM -0600, Justin Pryzby wrote: > > This currently handles partitions with a loop around the whole CIC > > implementation, which means that things like WaitForLockers() happen > > once for each index, the same as REINDEX CONCURRENTLY on a partitioned > > table. Contrast that with ReindexRelationConcurrently(), which handles > > all the indexes on a table in one pass by looping around indexes within > > each phase. > > Rebased over the progress reporting fix (27f5c712b). > > I added a list of (intermediate) partitioned tables, rather than looping > over the list of inheritors again, to save calling rel_get_relkind(). > > I think this patch is done. Overall looks good to me. However, I think that using 'partitioned' as a list of partitioned index oids in DefineIndex() is a bit misleading - we've just used it as a boolean, specifying if we are dealing with a partitioned relation. Right. This is also rebased on 8c852ba9a4 (Allow some exclusion constraints on partitions). Hi. I have some more questions. In the following code (indexcmds.c:1640 and later) 1640 rel = table_open(relationId, ShareUpdateExclusiveLock); 1641 heaprelid = rel->rd_lockInfo.lockRelId; 1642 table_close(rel, ShareUpdateExclusiveLock); 1643 SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId); should we release ShareUpdateExclusiveLock before getting the session lock in DefineIndexConcurrentInternal()? Also, we unlock the parent table between reindexing children at the end of DefineIndexConcurrentInternal(): 1875 /* 1876 * Last thing to do is release the session-level lock on the parent table. 1877 */ 1878 UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock); 1879 } Is it safe? Shouldn't we hold the session lock on the parent table while rebuilding child indexes? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: memory leak in trigger handling (since PG12)
Tomas Vondra писал 2023-06-22 17:16: On 6/22/23 13:46, Tomas Vondra wrote: ... I haven't tried the reproducer, but I think I see the issue - we store the bitmap as part of the event to be executed later, but the bitmap is in per-tuple context and gets reset. So I guess we need to copy it into the proper long-lived context (e.g. AfterTriggerEvents). I'll get that fixed. Alexander, can you try if this fixes the issue for you? regards Hi. The patch fixes the problem and looks good to me. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: memory leak in trigger handling (since PG12)
Tomas Vondra писал 2023-05-25 17:41: The attached patch does this - I realized we actually have estate in ExecGetAllUpdatedCols(), so we don't even need a variant with a different signature. That makes the patch much simpler. The question is whether we need the signature anyway. There might be a caller expecting the result to be in CurrentMemoryContext (be it ExecutorState or something else), and this change might break it. But I'm not aware of any callers, so maybe that's fine. Hi. Unfortunately, this patch has broken trigdata->tg_updatedcols usage in AFTER UPDATE triggers. Should it be, if not fixed, then at least mentioned in the documentation? Attaching sample code. After creating the trigger, the issue can be reproduced like this: create table test (i int, j int); create function report_update_fields() RETURNS TRIGGER AS '/location/to/trig_test.so' language C; create trigger test_update after update ON test FOR EACH ROW EXECUTE FUNCTION report_update_fields(); insert into test values (1, 12); update test set j=2; -- Best regards, Alexander Pyhalov, Postgres Professional #include "postgres.h" #include "fmgr.h" #include "access/sysattr.h" #include "commands/trigger.h" #include "nodes/bitmapset.h" #ifdef PG_MODULE_MAGIC PG_MODULE_MAGIC; #endif void _PG_init() { } PG_FUNCTION_INFO_V1(report_update_fields); Datum report_update_fields(PG_FUNCTION_ARGS) { TriggerData *trigdata = (TriggerData *) fcinfo->context; HeapTuple rettuple = NULL; if (!CALLED_AS_TRIGGER(fcinfo) || /*!TRIGGER_FIRED_AFTER(trigdata->tg_event) || */ TRIGGER_FIRED_FOR_STATEMENT(trigdata->tg_event) || !TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event)) elog(ERROR, "usage unsupported"); if ((trigdata->tg_trigtuple != NULL) && (trigdata->tg_newtuple != NULL)) { int attnum = -1; int num_upd_attrs = 0; while ((attnum = bms_next_member(trigdata->tg_updatedcols, attnum)) >= 0) if (attnum + FirstLowInvalidHeapAttributeNumber > 0) num_upd_attrs++; elog(INFO, "updated %d attrs", num_upd_attrs); } return PointerGetDatum(rettuple); }
Re: Partial aggregates pushdown
Bruce Momjian писал 2023-06-20 03:42: Apologies for the delay in my reply to this email. I looked into the existing code and I found three things: 1) PQserverVersion() just pulls the conn->sversion value from the existing connection because pqSaveParameterStatus() pulls the server_version sent by the backend; no need to issue SELECT version(). 2) postgres_fdw already has nine calls to GetConnection(), and only opens a connection if it doesn't already have one. Here is an example: /* Get the remote estimate */ conn = GetConnection(fpinfo->user, false, NULL); get_remote_estimate(sql.data, conn, &rows, &width, &startup_cost, &total_cost); ReleaseConnection(conn); Therefore, it seems like it would be near-zero cost to just call conn = GetConnection() and then PQserverVersion(conn), and ReleaseConnection(). You can then use the return value of PQserverVersion() to determine if you can push down partial aggregates. Hi. Currently we don't acquire a remote connection during planning if use_remote_estimate is not set. Such a change would require getting a remote connection in the planner, not in the executor. This can lead to a change of behavior (like errors in EXPLAIN when the user mapping is wrong - e.g. a bad password is specified). Also, this can potentially lead to establishing connections even when a plan node is not actually used (an extreme example - select sum(score) from t limit 0). I'm not saying we shouldn't do it - just hinting at possible consequences. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Hi. + An aggregate function, called the partial aggregate function for partial aggregate + that corresponding to the aggregate function, is defined on the primary node and + the postgres_fdw node. Something is clearly wrong here. + When using built-in sharding feature in PostgreSQL is used, And here. Overall the code looks good to me, but I suppose that documentation needs further review from some native speaker. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2023-06-08 02:08: From: Alexander Pyhalov Sent: Wednesday, June 7, 2023 6:47 PM This seems to be more robust, but the interface became more strange. I'm not sure what to do with it. Some ideas I had to avoid introducing this parameter. Not sure I like any of them. 1) You can use QualifiedNameGetCreationNamespace() for aggpartialfnName and still compare namespace and function name for it and aggName, aggNamespace. Seems to be not ideal, but avoids introducing new parameters. 2) You can look up the partial aggregate function after ProcedureCreate() in AggregateCreate(), if it wasn't found at earlier stages. If it is the aggregate itself - check it. If it's still not found, error out. Also seems to be a bit ugly - you leave uncommitted garbage for vacuum in the catalogue. Thank you for suggesting alternatives. The disadvantages of alternative 2) appear to be undesirable, so I have modified it according to alternative 1). Another issue - the patch misses recording the dependency between aggpartialfn and the aggregate procedure. I added code to record dependencies between aggpartialfn and the aggregate procedure, similar to the code for functions such as combinefunc. Hi. Looks better. The only question I have is whether we should record a dependency between procOid and aggpartialfn if aggpartialfn == procOid. Also, it seems the new code should be run through pgindent.

doc/src/sgml/postgres-fdw.sgml:

+ For WHERE clauses,
+ JOIN clauses, this sending is active if
+ conditions in <xref linkend="postgres-fdw-remote-query-optimization"/>
+ hold and enable_partitionwise_join is true(this condition
+ is need for only JOIN clauses).
+ For aggregate expressions, this sending is active if conditions in

No space between "true" and "(" in "is true(this condition". Some sentences in the documentation, like the one starting with "For aggregate expressions, this sending is active if conditions in...", seem to be too long, but I'm not the best person to review documentation wording.
In "Built-in sharding in PostgreSQL" term "shard" doesn't have a definition. By the way, I'm not sure that "sharding" documentation belongs to this patch, at least it needs a review from native speaker. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2023-06-06 15:31: Thanks for the explanation. I understand that the method of comparing two function name strings is incorrect. Instead, I added the parameter isaggpartialfunc indicating whether the aggregate function and its aggpartialfunc are the same or different. Hi. This seems to be more robust, but the interface became more strange. I'm not sure what to do with it. Some ideas I had to avoid introducing this parameter. Not sure I like any of them. 1) You can use QualifiedNameGetCreationNamespace() for aggpartialfnName and still compare namespace and function name for it and aggName, aggNamespace. Seems to be not ideal, but avoids introducing new parameters. 2) You can look up the partial aggregate function after ProcedureCreate() in AggregateCreate(), if it wasn't found at earlier stages. If it is the aggregate itself - check it. If it's still not found, error out. Also seems to be a bit ugly - you leave uncommitted garbage for vacuum in the catalogue. Another issue - the patch misses recording the dependency between aggpartialfn and the aggregate procedure. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2023-06-06 06:08: Hi Mr.Pyhalov. Thank you for your always thoughtful review. From: Alexander Pyhalov Sent: Monday, June 5, 2023 6:00 PM Have found one issue - src/backend/catalog/pg_aggregate.c

585 if(strcmp(strVal(linitial(aggpartialfnName)), aggName) == 0){
586     if(((aggTransType != INTERNALOID) && (finalfn != InvalidOid))
587         || ((aggTransType == INTERNALOID) && (finalfn != serialfn)))
588         elog(ERROR, "%s is not its own aggpartialfunc", aggName);
589 } else {

Here string comparison of aggName and aggpartialfnName looks very suspicious - it seems you should compare oids, not names (in this case, likely oids of the transition function and the partial aggregation function). The fact that the aggregate name matches the partial aggregation function name is not enough to make any decisions. I see no problem with this string comparison. Here is the reason. The intent of this code is to determine whether the user specifies a new aggregate function whose aggpartialfunc is itself. For two aggregate functions, if the argument list and function name match, then the two aggregate functions must match. By definition of aggpartialfunc, every aggregate function and its aggpartialfn must have the same argument list. Thus, if aggpartialfnName and aggName are equal as strings, I think it is correct to determine that the user is specifying a new aggregate function whose aggpartialfunc is itself. However, since the document does not state these intentions, I think your suspicions are valid. Therefore, I have added a specification to the document reflecting the above intentions. Hi. Let me explain. Look at this example, taken from the test.
CREATE AGGREGATE udf_avg_p_int4(int4) (
    sfunc = int4_avg_accum,
    stype = _int8,
    combinefunc = int4_avg_combine,
    initcond = '{0,0}'
);
CREATE AGGREGATE udf_sum(int4) (
    sfunc = int4_avg_accum,
    stype = _int8,
    finalfunc = int8_avg,
    combinefunc = int4_avg_combine,
    initcond = '{0,0}',
    aggpartialfunc = udf_avg_p_int4
);

Now, let's create another aggregate.

# create schema test;
# create aggregate test.udf_avg_p_int4(int4) (
    sfunc = int4_avg_accum,
    stype = _int8,
    finalfunc = int8_avg,
    combinefunc = int4_avg_combine,
    initcond = '{0,0}',
    aggpartialfunc = udf_avg_p_int4
);
ERROR: udf_avg_p_int4 is not its own aggpartialfunc

What's the difference between the test.udf_avg_p_int4(int4) aggregate and udf_sum(int4)? They are essentially the same, yet test.udf_avg_p_int4 can't be defined. Also note the difference:

# CREATE AGGREGATE udf_sum(int4) (
    sfunc = int4_avg_accum,
    stype = _int8,
    finalfunc = int8_avg,
    combinefunc = pg_catalog.int4_avg_combine,
    initcond = '{0,0}',
    aggpartialfunc = udf_avg_p_int4
);
CREATE AGGREGATE
# CREATE AGGREGATE udf_sum(int4) (
    sfunc = int4_avg_accum,
    stype = _int8,
    finalfunc = int8_avg,
    combinefunc = pg_catalog.int4_avg_combine,
    initcond = '{0,0}',
    aggpartialfunc = public.udf_avg_p_int4
);
ERROR: aggpartialfnName is invalid

It seems that the assumption about aggpartialfnName - that it's a non-qualified name - is incorrect. And if we use qualified names, we can't compare just names; likely we should compare oids. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Bruce Momjian wrote 2023-06-05 19:26: On Mon, Jun 5, 2023 at 12:00:27PM +0300, Alexander Pyhalov wrote: Note that after these changes "select sum()" will fail in certain cases, when the remote server version is not the latest. In other cases we tried to preserve compatibility. Should we have a switch for a foreign server to turn this optimization off? Or do we just state that users should use different workarounds if the remote server version doesn't match the local one? We covered this in April in this and previous emails: https://www.postgresql.org/message-id/ZDGTza4rovCa%2BN3d%40momjian.us We don't check the version number for _any_ builtin functions so why would we need to check for aggregate pushdown? Yes, these will be new functions in PG 17, we have added functions regularly in major releases and have never heard reports of problems about that. Hi. I've seen this message. But the introduction of a new built-in function will break requests to old servers only if this new function is used in the request (which means the query changes). However, this patch changes the behavior of old queries, which worked prior to the update. This seems different to me. Also, I see that in connection.c (configure_remote_session()) we care about old PostgreSQL versions. And now we make querying them more tricky. Is it consistent? Do you think that enable_partitionwise_aggregate is good enough protection in these cases? In the documentation I see "F.38.7. Cross-Version Compatibility postgres_fdw can be used with remote servers dating back to PostgreSQL 8.3. Read-only capability is available back to 8.1. A limitation however is that postgres_fdw generally assumes that immutable built-in functions and operators are safe to send to the remote server for execution, if they appear in a WHERE clause for a foreign table. Thus, a built-in function that was added since the remote server's release might be sent to it for execution, resulting in “function does not exist” or a similar error.
This type of failure can be worked around by rewriting the query, for example by embedding the foreign table reference in a sub-SELECT with OFFSET 0 as an optimization fence, and placing the problematic function or operator outside the sub-SELECT." Likely, this paragraph should be expanded to state that partition-wise aggregation for many functions can fail to work with old foreign servers. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2023-06-02 06:54: Hi Mr.Bruce, hackers. I updated the patch. The following is a list of comments received on the previous version of the patch and my update to them in this version of the patch. Hi. I've looked through the last version of the patch. Have found one issue - src/backend/catalog/pg_aggregate.c

585 if(strcmp(strVal(linitial(aggpartialfnName)), aggName) == 0){
586     if(((aggTransType != INTERNALOID) && (finalfn != InvalidOid))
587         || ((aggTransType == INTERNALOID) && (finalfn != serialfn)))
588         elog(ERROR, "%s is not its own aggpartialfunc", aggName);
589 } else {

Here string comparison of aggName and aggpartialfnName looks very suspicious - it seems you should compare oids, not names (in this case, likely oids of the transition function and the partial aggregation function). The fact that the aggregate name matches the partial aggregation function name is not enough to make any decisions.

In documentation doc/src/sgml/postgres-fdw.sgml:

930 postgres_fdw attempts to optimize remote queries to reduce
931 the amount of data transferred from foreign servers. This is done by
932 sending query WHERE clauses and ggregate expressions
933 to the remote server for execution, and by not retrieving table columns that
934 are not needed for the current query.
935 To reduce the risk of misexecution of queries,
936 WHERE clauses and ggregate expressions are not sent to
937 the remote server unless they use only data types, operators, and functions
938 that are built-in or belong to an extension that's listed in the foreign
939 server's extensions option.
940 Operators and functions in such clauses must
941 be IMMUTABLE as well.

there are misprints in lines 932 and 936 - a missing "a" in "aggregate" expressions. Note that after these changes "select sum()" will fail in certain cases, when the remote server version is not the latest. In other cases we tried to preserve compatibility. Should we have a switch for a foreign server to turn this optimization off?
Or do we just state that users should use different workarounds if remote server version doesn't match local one? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: buffer refcount leak in foreign batch insert code
Michael Paquier wrote 2023-04-25 04:30: On Mon, Apr 24, 2023 at 09:57:10AM +0900, Michael Paquier wrote: The attached is what I am finishing with, with a minimal regression test added to postgres_fdw. Two partitions are enough. Well, I have gone through that again this morning, and applied the fix down to 14. The buildfarm is digesting it fine. -- Michael Thank you. Sorry for the late response, was on vacation. -- Best regards, Alexander Pyhalov, Postgres Professional
buffer refcount leak in foreign batch insert code
Hi. We've found that in cases like the one attached, when we insert into a foreign partition with batch_size set, a buffer refcount leak is detected. In the above example we see a dozen similar messages:

repro_small.sql:31: WARNING: buffer refcount leak: [14621] (rel=base/16718/16732, blockNum=54, flags=0x9380

The issue was introduced in the following commit:

commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
Author: Tomas Vondra
Date: Fri Jun 11 20:19:48 2021 +0200

    Optimize creation of slots for FDW bulk inserts

In this commit we avoid recreating slots for each batch. But it seems that the created slots should still be cleared at the end of ExecBatchInsert(). At least the attached patch seems to fix the issue. -- Best regards, Alexander Pyhalov, Postgres Professional

CREATE EXTENSION postgres_fdw;
DO $d$ BEGIN EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname '$$||current_database()||$$', port '$$||current_setting('port')||$$' )$$; END $d$;
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
CREATE TABLE local_data (id int, data text);
INSERT INTO local_data SELECT i, 'test'|| i FROM generate_series(1, 1) i;
CREATE TABLE sharded_data (id int, data text) PARTITION BY HASH(id);
CREATE TABLE sharded_data_p0 PARTITION OF sharded_data FOR VALUES WITH (modulus 4, remainder 0);
CREATE TABLE sharded_data_p1_remote (id int, data text);
CREATE FOREIGN TABLE sharded_data_p1 PARTITION OF sharded_data FOR VALUES WITH (modulus 4, remainder 1) SERVER loopback OPTIONS (table_name 'sharded_data_p1_remote');
CREATE TABLE sharded_data_p2_remote (id int, data text);
CREATE FOREIGN TABLE sharded_data_p2 PARTITION OF sharded_data FOR VALUES WITH (modulus 4, remainder 2) SERVER loopback OPTIONS (table_name 'sharded_data_p2_remote');
CREATE TABLE sharded_data_p3_remote (id int, data text);
CREATE FOREIGN TABLE sharded_data_p3 PARTITION OF sharded_data FOR VALUES WITH (modulus 4, remainder 3) SERVER loopback OPTIONS (table_name 'sharded_data_p3_remote');
insert into
sharded_data select * from local_data;
delete from sharded_data;
alter server loopback options (add batch_size '100');
insert into sharded_data select * from local_data;

From 12d1fbf56c1c82d4659da484c2207539f396aac9 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Fri, 21 Apr 2023 12:54:41 +0300
Subject: [PATCH] Fix buffer refcount leak

The buffer refcount leak was introduced in
b676ac443b6a83558d4701b2dd9491c0b37e17c4
---
 src/backend/executor/nodeModifyTable.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 6aa8c03defb..14ab8382183 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1261,6 +1261,12 @@ ExecBatchInsert(ModifyTableState *mtstate,
 	if (canSetTag && numInserted > 0)
 		estate->es_processed += numInserted;
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecClearTuple(slots[i]);
+		ExecClearTuple(planSlots[i]);
+	}
 }

 /*
--
2.34.1
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby wrote 2023-03-26 17:51: On Sun, Dec 04, 2022 at 01:09:35PM -0600, Justin Pryzby wrote: This currently handles partitions with a loop around the whole CIC implementation, which means that things like WaitForLockers() happen once for each index, the same as REINDEX CONCURRENTLY on a partitioned table. Contrast that with ReindexRelationConcurrently(), which handles all the indexes on a table in one pass by looping around indexes within each phase. Rebased over the progress reporting fix (27f5c712b). I added a list of (intermediate) partitioned tables, rather than looping over the list of inheritors again, to save calling get_rel_relkind(). I think this patch is done. Hi. Overall looks good to me. However, I think that using 'partitioned' as a list of partitioned index oids in DefineIndex() is a bit misleading - we've previously used it as a boolean, specifying whether we are dealing with a partitioned relation. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Add semi-join pushdown to postgres_fdw
Hi. Tomas Vondra wrote 2023-01-19 20:49: I took a quick look at the patch. It needs a rebase, although it applies fine using patch. A couple of minor comments: 1) addl_conds seems a bit hard to understand, I'd use either the full wording (additional_conds) or maybe extra_conds Renamed to additional_conds. 2) some of the lines got quite long, and need a wrap Split some of them. Not sure if it's enough. 3) unknown_subquery_rels name is a bit misleading - AFAIK it's the rels that can't be referenced from upper rels (per what the .h says). So they are known, but hidden. Is there a better name? Renamed to hidden_subquery_rels. These are rels which can't be referred to from upper join levels. 4) joinrel_target_ok() needs a better comment, explaining *when* the reltarget is safe for pushdown. The conditions are on the same row, but the project style is to break after '&&'. Added a comment. It seems to be a rephrasing of the lower comment in joinrel_target_ok(). -- Best regards, Alexander Pyhalov, Postgres Professional

From f37d26d9b622767f94e89034fa8e4fccc69e358d Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Mon, 7 Nov 2022 10:23:32 +0300
Subject: [PATCH v4] postgres_fdw: add support for deparsing semi joins

We deparse semi-joins as EXISTS subqueries. So, deparsing semi-join
leads to generating addl_conds condition, which is then added to the
uppermost JOIN's WHERE clause.
---
 contrib/postgres_fdw/deparse.c                | 206 +---
 .../postgres_fdw/expected/postgres_fdw.out    | 297 -
 contrib/postgres_fdw/postgres_fdw.c           |  89 +-
 contrib/postgres_fdw/postgres_fdw.h           |   2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 119 ++-
 5 files changed, 632 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 473fa45bd43..1217d47050b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -180,11 +180,14 @@ static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
 								  RelOptInfo *foreignrel, bool use_alias,
 								  Index ignore_rel, List **ignore_conds,
+								  StringInfo additional_conds,
 								  List **params_list);
+static void appendWhereClause(List *exprs, StringInfo additional_conds, deparse_expr_cxt *context);
 static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
 static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 							   RelOptInfo *foreignrel, bool make_subquery,
-							   Index ignore_rel, List **ignore_conds, List **params_list);
+							   Index ignore_rel, List **ignore_conds,
+							   StringInfo additional_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
 static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first,
@@ -1370,23 +1373,21 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 	RelOptInfo *scanrel = context->scanrel;
+	StringInfoData additional_conds;

 	/* For upper relations, scanrel must be either a joinrel or a baserel */
 	Assert(!IS_UPPER_REL(context->foreignrel) ||
 		   IS_JOIN_REL(scanrel) || IS_SIMPLE_REL(scanrel));

+	initStringInfo(&additional_conds);
 	/* Construct FROM clause */
 	appendStringInfoString(buf, " FROM ");
 	deparseFromExprForRel(buf, context->root, scanrel,
 						  (bms_membership(scanrel->relids) == BMS_MULTIPLE),
-						  (Index) 0, NULL, context->params_list);
-
-	/* Construct WHERE clause */
-	if (quals != NIL)
-	{
-		appendStringInfoString(buf, " WHERE ");
-		appendConditions(quals, context);
-	}
+						  (Index) 0, NULL, &additional_conds,
+						  context->params_list);
+	appendWhereClause(quals, &additional_conds, context);
+	pfree(additional_conds.data);
 }

 /*
@@ -1598,6 +1599,33 @@ appendConditions(List *exprs, deparse_expr_cxt *context)
 	reset_transmission_modes(nestlevel);
 }

+/*
+ * Append WHERE clause, containing conditions
+ * from exprs and additional_conds, to context->buf.
+ */
+static void
+appendWhereClause(List *exprs, StringInfo additional_conds, deparse_expr_cxt *context)
+{
+	StringInfo	buf = context->buf;
+	bool		need_and = false;
+
+	if (exprs != NIL || additional_conds->len > 0)
+		appendStringInfoString(buf, " WHERE ");
+
+	if (exprs != NIL)
+	{
+		appendConditions(exprs, context);
+		need_and = true;
+	}
+
+	if (additional_conds->len > 0)
+	{
+		if (need_and)
+			appendStringInfoString(buf, " AND ");
+		appendStringInfo(buf, "(%s)", additional_conds->data);
+	}
+}
+
 /* Output join name for given join type */
 const char *
 get_jointype_name(JoinType jointype)
@@ -1616,6 +1644,9 @@ get_jointype_name(JoinType jointype)
 		case JOIN_FULL:
 			return "FULL";
Re: Inconsistency in vacuum behavior
Justin Pryzby wrote 2023-01-19 04:49: On Mon, Jan 16, 2023 at 08:12:18PM +0300, Nikita Malakhov wrote: Hi, Currently there is no error in this case, so an additional thrown error would require a new test. Besides, throwing an error here does not make sense - it is just a check for vacuum permission; I think the right way is to just skip a relation that is not suitable for vacuum. Any thoughts or objections? Could you check if this is consistent between the behavior of VACUUM FULL and CLUSTER? See also Nathan's patches. Hi. CLUSTER behaves in a different way - it errors out immediately if the relation is not owned by the user. For a partitioned rel it would raise an error later anyway. VACUUM and VACUUM FULL behave consistently after applying Nikita's patch (for partitioned and regular tables) - they issue the warning "skipping TABLE_NAME --- only table or database owner can vacuum it" and return success status. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Inconsistency in vacuum behavior
Nikita Malakhov wrote 2023-01-16 20:12: Hi, Currently there is no error in this case, so an additional thrown error would require a new test. Besides, throwing an error here does not make sense - it is just a check for vacuum permission; I think the right way is to just skip a relation that is not suitable for vacuum. Any thoughts or objections? No objections to not throwing an error. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Inconsistency in vacuum behavior
Nikita Malakhov wrote 2023-01-16 17:26: Hi! Here's the patch that fixes this case, please check it out. The patch adds a vacuum_is_permitted_for_relation() check before adding a partition relation to the vacuum list; if permission is denied the relation is not added, so it is not passed to vacuum_rel() and there is no attempt to acquire the lock. Cheers! Hi. The patch seems to solve the issue. Two minor questions I have: 1) should we error out if HeapTupleIsValid(part_tuple) is false? 2) the comment "Check partition relations for vacuum permit" seems to be broken in some way. -- Best regards, Alexander Pyhalov, Postgres Professional
Inconsistency in vacuum behavior
Hi. We've run regress isolation tests on partitioned tables and found interesting VACUUM behavior. I'm not sure if it's intended. In the following example, partitioned tables and regular tables behave differently:

CREATE TABLE vacuum_tab (a int) PARTITION BY HASH (a);
CREATE TABLE vacuum_tab_1 PARTITION OF vacuum_tab FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE vacuum_tab_2 PARTITION OF vacuum_tab FOR VALUES WITH (MODULUS 2, REMAINDER 1);
CREATE ROLE regress_vacuum_conflict;

In the first session:

begin;
LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;

In the second:

SET ROLE regress_vacuum_conflict;
VACUUM vacuum_tab;
WARNING: permission denied to vacuum "vacuum_tab", skipping it
<- hangs here, trying to lock vacuum_tab_1

In the non-partitioned case the second session exits after emitting the warning. In the partitioned case, it hangs, trying to get locks. This is due to the fact that in expand_vacuum_rel() we skip the parent table if vacuum_is_permitted_for_relation() fails, but don't perform such a check for its children. The check will be performed later in vacuum_rel(), but only after vacuum_open_relation(), which leads to the hang in the second session. Is it intended? Why don't we perform the vacuum_is_permitted_for_relation() check for inheritors in expand_vacuum_rel()? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Add semi-join pushdown to postgres_fdw
Hi, Yuki. Thanks for looking at this patch. fujii.y...@df.mitsubishielectric.co.jp wrote 2022-12-03 06:02: question1) > + if (jointype == JOIN_SEMI && bms_is_member(var->varno, innerrel->relids) && !bms_is_member(var->varno, outerrel->relids)) It takes time for me to find in what case this condition is true. There are cases in which this condition is true for a semi-join of two baserels when running a query which joins more than two relations, such as query2 and query3. Running queries such as query2, you may want to push down only the semi-join path of joinrel(outerrel) defined by (f_t1 a1 join f_t3 a2 on a1.c1 = a2.c1) and baserel(innerrel) f_t3, because of deparse safety. So you add this condition. Because of this limitation, your patch can't push down the subquery expression "exists (select null from f_t2 where c1 = a1.c1)" in query3. I think it is one of the difficult points for semi-join pushdown. This is my understanding of the intent of this condition and the restrictions imposed by it. Is my understanding right? IIRC, the planner can create a semi-join whose targetlist references Vars from the inner join relation. However, it's deparsed as EXISTS, so we can't reference it from SQL. Hence this check - if a Var is referenced in the semi-join target list, it can't be pushed down. You can see this if you comment out this check.
EXPLAIN (verbose, costs off)
SELECT ft2.*, ft4.* FROM ft2
  INNER JOIN (SELECT * FROM ft4 WHERE EXISTS (SELECT 1 FROM ft2 WHERE ft2.c2=ft4.c2)) ft4 ON ft2.c2 = ft4.c1
  INNER JOIN (SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)) ft21 ON ft2.c2 = ft21.c2
  WHERE ft2.c1 > 900 ORDER BY ft2.c1 LIMIT 10;

will fail with

EXPLAIN SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2

Here you can see that

SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)

was transformed to

SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2

where our EXISTS subquery is referenced from the tlist. It's fine for the plan (relations participating in a semi-join can be referenced in the tlist), but it is not going to work with an EXISTS subquery. BTW, there's a comment in joinrel_target_ok(). It tells exactly that -

5535 if (jointype == JOIN_SEMI && bms_is_member(var->varno, innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
5536 {
5537     /* We deparse semi-join as exists() subquery, and so can't deparse references to inner rel in join target list. */
5538     ok = false;
5539     break;
5540 }

Expanded the comment. question2) In foreign_join_ok > * Constructing queries representing ANTI joins is hard, hence Is this true? Is it hard to expand your approach to ANTI join pushdown? I haven't tried, so I don't know. question3) You use variables whose name is "addl_condXXX" in the following code. > appendStringInfo(addl_conds, "EXISTS (SELECT NULL FROM %s", join_sql_i.data); Does this naming mean additional literal? Is there a more comprehensible naming, such as "subquery_exprXXX"? The naming means additional conditions (for the WHERE clause, by analogy with ignore_conds and remote_conds). Not sure if subquery_expr sounds better, but if you come up with a better idea, I'm fine with renaming them.
question4) Although really a detail, there are expressions with spaces such as "ft4.c2 = ft2.c2" and ones without, such as "c1=ftupper.c1". Is there a reason for this difference? If not, should we use the same spacing policy? Fixed. -- Best regards, Alexander Pyhalov, Postgres Professional

From 0f26b0841cb095b4e114984deac2b1b001368c15 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Mon, 7 Nov 2022 10:23:32 +0300
Subject: [PATCH v3] postgres_fdw: add support for deparsing semi joins

We deparse semi-joins as EXISTS subqueries. So, deparsing semi-join
leads to generating addl_conds condition, which is then added to the
uppermost JOIN's WHERE clause.
---
 contrib/postgres_fdw/deparse.c                | 201 +---
 .../postgres_fdw/expected/postgres_fdw.out    | 297 -
 contrib/postgres_fdw/postgres_fdw.c           |  82 -
 contrib/postgres_fdw/postgres_fdw.h           |   3 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 119 ++-
 5 files changed, 620 insertions(+), 82 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 95247656504..10d82d9f2ab 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -179,12 +179,13 @@ static void appendLimitClause(deparse_ex
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2022-12-01 05:23: Hi Mr.Pyhalov. Hi. Attaching minor fixes. I haven't proof-read all comments (perhaps they need attention from some native speaker). Tested it with queries from https://github.com/swarm64/s64da-benchmark-toolkit, works as expected. -- Best regards, Alexander Pyhalov, Postgres Professional

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 35f2d102374..bd8a4acc112 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -3472,9 +3472,9 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context)
 	if ((aggform->aggtranstype != INTERNALOID) && (aggform->aggfinalfn == InvalidOid)) {
 		appendFunctionName(node->aggfnoid, context);
 	} else if(aggform->partialaggfn) {
-		appendFunctionName((Oid)(aggform->partialaggfn), context);
+		appendFunctionName(aggform->partialaggfn, context);
 	} else {
-		elog(ERROR, "there in no partialaggfn %u", node->aggfnoid);
+		elog(ERROR, "there is no partialaggfn %u", node->aggfnoid);
 	}
 	ReleaseSysCache(aggtup);
 }
@@ -3986,7 +3986,8 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel,
 }

 /*
- * Check that partial aggregate function of aggform exsits in remote
+ * Check that partial aggregate function, described by aggform,
+ * exists on remote server, described by fpinfo.
  */
 static bool
 partial_agg_compatible(Form_pg_aggregate aggform, PgFdwRelationInfo *fpinfo)
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2022-11-30 13:01: 2) Do we really have to look at pg_proc in partial_agg_ok() and deparseAggref()? Perhaps looking at aggtranstype is enough? You are right. I fixed it according to your comment. partial_agg_ok() still looks at pg_proc. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote 2022-11-30 13:01: Hi Mr.Pyhalov. 1) In the previous version of the patch, aggregates which had partialaggfn were ok to push down. And it was a definite sign that an aggregate can be pushed down. Now we allow pushing down an aggregate whose prorettype is not internal and whose aggfinalfn is not defined. Is it safe for all user-defined (or builtin) aggregates, even if they are generally shippable? Aggcombinefn is executed locally and we check that the aggregate function itself is shippable. Is it enough? Perhaps we could use partialagg_minversion (like aggregates with partialagg_minversion == -1 should not be pushed down) or introduce a separate explicit flag? In what case is partial aggregate pushdown unsafe for an aggregate which has a non-internal aggtranstype and no aggfinalfn? By reading [1], I believe that if the aggcombinefn of such an aggregate receives the return values of the original aggregate function from each remote, it must produce the same value that would have resulted from scanning all the input in a single operation. One more issue I started to think about - now we don't check partialagg_minversion for "simple" aggregates at all. Is it correct? It seems that, for example, we could try to push down bit_or(int8) to old servers, but it didn't exist, for example, in 8.4. I think it's a broader issue (it would already be the case if we push down aggregates) and shouldn't be fixed here. But there is an issue - is_shippable() is too optimistic. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Hi, Yuki. 1) In the previous version of the patch, aggregates which had partialaggfn were OK to push down, and that was a definite sign that an aggregate can be pushed down. Now we allow pushing down an aggregate whose prorettype is not internal and whose aggfinalfn is not defined. Is it safe for all user-defined (or builtin) aggregates, even if they are generally shippable? Aggcombinefn is executed locally and we check that the aggregate function itself is shippable. Is that enough? Perhaps we could use partialagg_minversion (e.g. aggregates with partialagg_minversion == -1 should not be pushed down) or introduce a separate explicit flag? 2) Do we really have to look at pg_proc in partial_agg_ok() and deparseAggref()? Perhaps looking at aggtranstype is enough? 3) I'm not sure whether the CREATE AGGREGATE tests with invalid PARTIALAGGFUNC/PARTIALAGG_MINVERSION should be in the postgres_fdw tests or would better be moved to src/test/regress/sql/create_aggregate.sql, as they are not specific to postgres_fdw. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
qc=0x7ffd1a4053c0) at utility.c:1074 I'll look at it again later. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby wrote 2022-11-21 06:00: I finally found time to digest and integrate your changes into my local branch. This fixes the three issues you reported: FORCE_RELEASE, the issue with INVALID partitions (for which I adapted your patch into an earlier patch in my series), and progress reporting. And rebased. Hi. Thank you for the effort. I've looked through and tested the new patch a bit. Overall it looks good to me. The question I have is whether we should update pg_stat_progress_create_index in reindex_invalid_child_indexes() when we skip valid indexes? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Add semi-join pushdown to postgres_fdw
Ian Lawrence Barwick писал 2022-11-04 02:21: This entry was marked as "Needs review" in the CommitFest app but cfbot reports the patch no longer applies. We've marked it as "Waiting on Author". As CommitFest 2022-11 is currently underway, this would be an excellent time update the patch. Once you think the patchset is ready for review again, you (or any interested party) can move the patch entry forward by visiting https://commitfest.postgresql.org/40/3838/ and changing the status to "Needs review". Hi. I've rebased the patch. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 1ac2a9e3611f716da688c04a4ec36888f62078ce Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 7 Nov 2022 10:23:32 +0300 Subject: [PATCH] postgres_fdw: add support for deparsing semi joins We deparse semi-joins as EXISTS subqueries. So, deparsing semi-join leads to generating addl_conds condition, which is then added to the uppermost JOIN's WHERE clause. --- contrib/postgres_fdw/deparse.c| 198 +--- .../postgres_fdw/expected/postgres_fdw.out| 297 -- contrib/postgres_fdw/postgres_fdw.c | 78 - contrib/postgres_fdw/postgres_fdw.h | 3 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 119 ++- 5 files changed, 613 insertions(+), 82 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 95247656504..45885442418 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -179,12 +179,13 @@ static void appendLimitClause(deparse_expr_cxt *context); static void appendConditions(List *exprs, deparse_expr_cxt *context); static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, bool use_alias, - Index ignore_rel, List **ignore_conds, + Index ignore_rel, List **ignore_conds, StringInfo addl_conds, List **params_list); +static void appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context); static void deparseFromExpr(List *quals, deparse_expr_cxt *context); static void 
deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, bool make_subquery, - Index ignore_rel, List **ignore_conds, List **params_list); + Index ignore_rel, List **ignore_conds, StringInfo addl_conds, List **params_list); static void deparseAggref(Aggref *node, deparse_expr_cxt *context); static void appendGroupByClause(List *tlist, deparse_expr_cxt *context); static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first, @@ -1370,23 +1371,20 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context) { StringInfo buf = context->buf; RelOptInfo *scanrel = context->scanrel; + StringInfoData addl_conds; /* For upper relations, scanrel must be either a joinrel or a baserel */ Assert(!IS_UPPER_REL(context->foreignrel) || IS_JOIN_REL(scanrel) || IS_SIMPLE_REL(scanrel)); + initStringInfo(_conds); /* Construct FROM clause */ appendStringInfoString(buf, " FROM "); deparseFromExprForRel(buf, context->root, scanrel, (bms_membership(scanrel->relids) == BMS_MULTIPLE), - (Index) 0, NULL, context->params_list); - - /* Construct WHERE clause */ - if (quals != NIL) - { - appendStringInfoString(buf, " WHERE "); - appendConditions(quals, context); - } + (Index) 0, NULL, _conds, context->params_list); + appendWhereClause(quals, _conds, context); + pfree(addl_conds.data); } /* @@ -1598,6 +1596,33 @@ appendConditions(List *exprs, deparse_expr_cxt *context) reset_transmission_modes(nestlevel); } +/* + * Append WHERE clause, containing conditions + * from exprs and addl_conds, to context->buf. 
+ */ +static void +appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context) +{ + StringInfo buf = context->buf; + bool need_and = false; + + if (exprs != NIL || addl_conds->len > 0) + appendStringInfoString(buf, " WHERE "); + + if (exprs != NIL) + { + appendConditions(exprs, context); + need_and = true; + } + + if (addl_conds->len > 0) + { + if (need_and) + appendStringInfoString(buf, " AND "); + appendStringInfo(buf, "(%s)", addl_conds->data); + } +} + /* Output join name for given join type */ const char * get_jointype_name(JoinType jointype) @@ -1616,6 +1641,9 @@ get_jointype_name(JoinType jointype) case JOIN_FULL: return "FULL"; + case JOIN_SEMI: + return "SEMI"; + default: /* Shouldn't come here, but protect from buggy code. */ elog(ERROR, "unsupported join type %d", jointype); @@ -1715,7 +1743,7 @@ deparseSubqueryTargetList(deparse_expr_cxt *context) */ static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, - bool use_alias, Index ignore_rel, List
Re: Add semi-join pushdown to postgres_fdw
Ashutosh Bapat писал 2022-08-29 17:12: Hi Alexander, Thanks for working on this. It's great to see FDW join pushdown scope being expanded to more complex cases. I am still figuring out the implementation. It's been a while I have looked at join push down code. But following change strikes me odd -- subquery using immutable function (can be sent to remote) PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1; EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st3(10, 20); - QUERY PLAN - Sort + QUERY PLAN +--- + Foreign Scan Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Sort Key: t1.c1 - -> Nested Loop Semi Join - Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Join Filter: (t1.c3 = t2.c3) - -> Foreign Scan on public.ft1 t1 - Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 - Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) - -> Materialize - Output: t2.c3 - -> Foreign Scan on public.ft2 t2 - Output: t2.c3 - Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date)) -(14 rows) + Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2) + Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND (EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r1.c3 = r3.c3 ORDER BY r1."C 1" ASC NULLS LAST +(4 rows) date_in | s |1 | [0:0]={cstring} date_in which will be used to cast a test to date is not immutable. So the query should't be pushed down. May not be a problem with your patch. Can you please check? Hi. It is not related to my change and works as expected. As I see, we have expression FuncExprdate(oid = 2029, args=Var ) = Const(type date) (date(r3.c5) = '1970-01-17'::date). 
The function is:

# select proname, provolatile from pg_proc where oid=2029;
 proname | provolatile
---------+-------------
 date    | i

So it's shippable. -- Best regards, Alexander Pyhalov, Postgres Professional
Add semi-join pushdown to postgres_fdw
Hi. It's possible to extend deparsing in postgres_fdw so that we can push down semi-joins which don't refer to the inner reltarget. This allows us to push down joins in queries like SELECT * FROM ft1 t1 WHERE t1.c1 < 10 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE date(c5) = '1970-01-17'::date); EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 < 10 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE date(c5) = '1970-01-17'::date); QUERY PLAN --- Foreign Scan Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2) Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 10)) AND (EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((date(r3.c5) = '1970-01-17'::date)) AND ((r1.c3 = r3.c3 Deparsing semi-joins leads to generating (text) conditions like 'EXISTS (SELECT NULL FROM inner_rel WHERE join_conds)'. Such conditions are generated in deparseFromExprForRel() and distributed to the nearest WHERE clause, where they are added to the list of AND clauses. -- Best regards, Alexander Pyhalov, Postgres Professional
From 7833d67f69287648c4594a5508feed376427f95d Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Fri, 12 Aug 2022 15:02:24 +0300 Subject: [PATCH] postgres_fdw: add support for deparsing semi joins We deparse semi-joins as EXISTS subqueries. Deparsing a semi-join leads to generating an addl_conds condition, which is then added to the uppermost JOIN's WHERE clause.
--- contrib/postgres_fdw/deparse.c| 198 +--- .../postgres_fdw/expected/postgres_fdw.out| 297 -- contrib/postgres_fdw/postgres_fdw.c | 78 - contrib/postgres_fdw/postgres_fdw.h | 3 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 119 ++- 5 files changed, 613 insertions(+), 82 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index a9766f97346..fcdc679d51f 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -179,12 +179,13 @@ static void appendLimitClause(deparse_expr_cxt *context); static void appendConditions(List *exprs, deparse_expr_cxt *context); static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, bool use_alias, - Index ignore_rel, List **ignore_conds, + Index ignore_rel, List **ignore_conds, StringInfo addl_conds, List **params_list); +static void appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context); static void deparseFromExpr(List *quals, deparse_expr_cxt *context); static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, bool make_subquery, - Index ignore_rel, List **ignore_conds, List **params_list); + Index ignore_rel, List **ignore_conds, StringInfo addl_conds, List **params_list); static void deparseAggref(Aggref *node, deparse_expr_cxt *context); static void appendGroupByClause(List *tlist, deparse_expr_cxt *context); static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first, @@ -1372,23 +1373,20 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context) { StringInfo buf = context->buf; RelOptInfo *scanrel = context->scanrel; + StringInfoData addl_conds; /* For upper relations, scanrel must be either a joinrel or a baserel */ Assert(!IS_UPPER_REL(context->foreignrel) || IS_JOIN_REL(scanrel) || IS_SIMPLE_REL(scanrel)); + initStringInfo(_conds); /* Construct FROM clause */ appendStringInfoString(buf, " FROM "); deparseFromExprForRel(buf, context->root, scanrel, 
(bms_membership(scanrel->relids) == BMS_MULTIPLE), - (Index) 0, NULL, context->params_list); - - /* Construct WHERE clause */ - if (quals != NIL) - { - appendStringInfoString(buf, " WHERE "); - appendConditions(quals, context); - } + (Index) 0, NULL, _conds, context->params_list); + appendWhereClause(quals, _conds, context); + pfree(addl_conds.data); } /* @@ -1600,6 +1598,33 @@ appendConditions(List *exprs, deparse_expr_cxt *context) reset_transmission_modes(nestlevel); } +/* + * Append WHERE clause, containing conditions + * from exprs and addl_conds, to context->buf. + */ +static void +appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context) +{ + StringInfo buf = context->buf; + bool need_and = false; + + if (exprs != NIL || addl_conds->len >
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby писал 2022-06-28 21:33: Hi, On Thu, Feb 10, 2022 at 06:07:08PM +0300, Alexander Pyhalov wrote: I've rebased patches and tried to fix issues I've seen. I've fixed reference after table_close() in the first patch (can be seen while building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). Rebased patches on the current master. They still require proper review. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 5c11849ceb2a1feb0e44dbdf30cc27de0282a659 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 28 Feb 2022 10:50:58 +0300 Subject: [PATCH 5/5] Mark intermediate partitioned indexes as valid --- src/backend/commands/indexcmds.c | 33 ++- src/test/regress/expected/indexing.out | 80 +- src/test/regress/sql/indexing.sql | 8 +++ 3 files changed, 118 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index d09f0390413..d3ced6265b6 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -3139,6 +3139,7 @@ static void ReindexPartitions(Oid relid, ReindexParams *params, bool isTopLevel) { List *partitions = NIL; + List *inhpartindexes = NIL; char relkind = get_rel_relkind(relid); char *relname = get_rel_name(relid); char *relnamespace = get_namespace_name(get_rel_namespace(relid)); @@ -3193,6 +3194,17 @@ ReindexPartitions(Oid relid, ReindexParams *params, bool isTopLevel) char partkind = get_rel_relkind(partoid); MemoryContext old_context; + /* Create a list of invalid inherited partitioned indexes */ + if (partkind == RELKIND_PARTITIONED_INDEX) + { + if (partoid == relid || get_index_isvalid(partoid)) +continue; + + old_context = MemoryContextSwitchTo(reindex_context); + inhpartindexes = lappend_oid(inhpartindexes, partoid); + MemoryContextSwitchTo(old_context); + } + /* * This discards partitioned tables, partitioned indexes and foreign * tables. 
@@ -3237,9 +3249,28 @@ ReindexPartitions(Oid relid, ReindexParams *params, bool isTopLevel) Oid tableoid = IndexGetRelation(relid, false); List *child_tables = find_all_inheritors(tableoid, ShareLock, NULL); - /* Both lists include their parent relation as well as any intermediate partitioned rels */ + /* + * Both lists include their parent relation as well as any + * intermediate partitioned rels + */ if (list_length(inhoids) == list_length(child_tables)) + { index_set_state_flags(relid, INDEX_CREATE_SET_VALID); + + /* Mark any intermediate partitioned index as valid */ + foreach(lc, inhpartindexes) + { +Oid partoid = lfirst_oid(lc); + +Assert(get_rel_relkind(partoid) == RELKIND_PARTITIONED_INDEX); +Assert(!get_index_isvalid(partoid)); + +/* Can't mark an index valid without marking it ready */ +index_set_state_flags(partoid, INDEX_CREATE_SET_READY); +CommandCounterIncrement(); +index_set_state_flags(partoid, INDEX_CREATE_SET_VALID); + } + } } /* diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out index a4ccae50de3..b4f1aea6fca 100644 --- a/src/test/regress/expected/indexing.out +++ b/src/test/regress/expected/indexing.out @@ -57,6 +57,8 @@ create table idxpart11 partition of idxpart1 for values from (0) to (10) partiti create table idxpart111 partition of idxpart11 default partition by range(a); create table idxpart partition of idxpart111 default partition by range(a); create table idxpart2 partition of idxpart for values from (10) to (20); +create table idxpart3 partition of idxpart for values from (30) to (40) partition by range(a); +create table idxpart31 partition of idxpart3 default; insert into idxpart2 values(10),(10); -- not unique create index concurrently on idxpart (a); -- partitioned create index concurrently on idxpart1 (a); -- partitioned and partition @@ -76,7 +78,7 @@ Partition key: RANGE (a) Indexes: "idxpart_a_idx" btree (a) "idxpart_a_idx1" UNIQUE, btree (a) INVALID -Number of partitions: 2 (Use \d+ to 
list them.) +Number of partitions: 3 (Use \d+ to list them.) \d idxpart1 Partitioned table "public.idxpart1" @@ -88,11 +90,59 @@ Number of partitions: 2 (Use \d+ to list them.) Partition of: idxpart FOR VALUES FROM (0) TO (10) Partition key: RANGE (a) Indexes: -"idxpart1_a_idx" btree (a) INVALID +"idxpart1_a_idx" btree (a) "idxpart1_a_idx1" btree (a) "idxpart1_a_idx2" UNIQUE, btree (a) INVALID Number of partitions: 1 (Use \d+ to list them.) +\d idxpart11 + Partitioned table "public.idxpart11" + Column | Type | Collation | Nullable | Default ++-+---+--+- + a | integer | | | + b | integer | | | + c | text| | | +Partition of: idxpart1 FOR V
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby wrote 2022-06-28 21:33: Hi, On Thu, Feb 10, 2022 at 06:07:08PM +0300, Alexander Pyhalov wrote: I've rebased patches and tried to fix issues I've seen. I've fixed reference after table_close() in the first patch (can be seen while building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). Thanks for finding that. The patches other than 0001 are more experimental, and need someone to check if it's even a good approach to use, so I kept them separate from the essential patch. Your latest 0005 patch (mark intermediate partitioned indexes as valid) is probably fixing a bug in my SKIPVALID patch, right? I'm not sure whether the SKIPVALID patch should be merged into 0001, and I've been awaiting feedback on the main patch before handling progress reporting. Hi. I think it's more about fixing the ReindexPartitions-to-set-indisvalid patch, as we should also mark intermediate indexes as valid when reindex succeeds. Sorry for not responding sooner. The patch saw no activity for ~11 months so I wasn't prepared to pick it up in March, at least not without guidance from a committer. Would you want to take over this patch? I wrote it following someone's question, but don't expect that I'd use the feature myself. I can help review it or try to clarify the organization of my existing patches (but still haven't managed to work my way through your amendments to my patches). Yes, I'm glad to work on the patches, as this is a very important feature for us. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Tomas Vondra wrote 2022-03-22 15:28: On 3/22/22 01:49, Andres Freund wrote: On 2022-01-17 15:27:53 +0300, Alexander Pyhalov wrote: Alexander Pyhalov wrote 2022-01-17 15:26: Updated patch. Sorry, missed attachment. Needs another update: http://cfbot.cputube.org/patch_37_3369.log Marked as waiting on author. TBH I'm still not convinced this is the right approach. I've voiced this opinion before, but to reiterate the main arguments: 1) It's not clear to me how this could get extended to aggregates with more complex aggregate states, to support e.g. avg() and similar fairly common aggregates. Hi. Yes, I'm also not sure how to proceed with aggregates with complex state. Likely it needs a separate function to export their state, but then we should somehow ensure that this function exists and that our 'importer' can handle its result. Note that for now we have no mechanism in postgres_fdw to find out the remote server version at the planning stage. 2) I'm not sure relying on aggpartialpushdownsafe without any version checks etc. is sufficient. I mean, how would we know the remote node has the same idea of representing the aggregate state. I wonder how this aligns with assumptions we make e.g. for functions etc. This doesn't seem to be a problem to me, as for now we don't care about the remote node's internal aggregate state representation. We currently get just the aggregate result from the remote node. For aggregates with 'internal' stype we call the converter locally, and it converts the external result from the aggregate return type to the local node's internal representation. Aside from that, there's a couple of review comments: 1) should not remove the comment in foreign_expr_walker Fixed. 2) comment in deparseAggref is obsolete/inaccurate Fixed. 3) comment for partial_agg_ok should probably explain when we consider an aggregate OK to be pushed down Expanded comment. 4) I'm not sure why the get_rcvd_attinmeta comment talks about "return type bytea" and "real input type". Expanded comment.
Tupdesc can be retrieved from node->ss.ss_ScanTupleSlot, and so we expect to see bytea (as should be produced by partial aggregation). But when we scan data, we get the aggregate output type (which matches the converter input type), so attinmeta should be fixed. If we deal with an aggregate which doesn't have a converter, partial_agg_ok() ensures that the agg->aggfnoid return type matches agg->aggtranstype. 5) Talking about "partial" aggregates is a bit confusing, because that suggests this is related to actual "partial aggregates". But it's not. What should we call them? It's about pushing "Partial count()" or "Partial sum()" to the remote server, so why is it not related to partial aggregates? Do you mean that it's not about parallel aggregate processing? 6) Can add_foreign_grouping_paths do without the new 'partial' parameter? Clearly, it can be deduced from extra->patype, no? Fixed this. 7) There are no docs for PARTIALCONVERTERFUNC / PARTIAL_PUSHDOWN_SAFE in the CREATE AGGREGATE sgml docs. Added documentation. I'd appreciate advice on how it should be extended. 8) I don't think "serialize" in the converter functions is the right term, considering those functions are not "serializing" anything. If anything, it's the remote node that is serializing the agg state and the local node is deserializing it. Or maybe I just misunderstand where the converter functions are executed? The converter function transforms the aggregate result into the serialized internal representation which is expected from a partial aggregate. I mean, it converts the aggregate result type to the internal representation and then effectively executes the serialization code (i.e. converter(x) == serialize(to_internal(x))).
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 7ad4eacf017a4fd3793f0949ef43ccc8292bf3f6 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 63 +- .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 187 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- doc/src/sgml/ref/create_aggregate.sgml| 27 +++ src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 ++- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 20 +- src/include/catalog/pg_aggregate.dat | 110 ++- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 702 insertions(+), 81 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..272e2391d7f 10064
Re: CREATE INDEX CONCURRENTLY on partitioned index
Hi. I've added 0005-Mark-intermediate-partitioned-indexes-as-valid.patch which fixed the following issues - when partitioned index is created, indexes on intermediate partitioned tables were preserved in invalid state. Also added some more tests. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 18fa3c27a3311294a7abfdc0674ef6143c65423b Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 7 Feb 2022 10:28:42 +0300 Subject: [PATCH 1/5] Allow CREATE INDEX CONCURRENTLY on partitioned table 0001-Allow-CREATE-INDEX-CONCURRENTLY-on-partitioned-table.patch from https://www.postgresql.org/message-id/20210226182019.gu20...@telsasoft.com Fixes: - rel was used after table_close(); - it seems childidxs shouldn't live in ind_context; - updated doc. --- doc/src/sgml/ref/create_index.sgml | 14 +-- src/backend/commands/indexcmds.c | 151 ++--- src/test/regress/expected/indexing.out | 60 +- src/test/regress/sql/indexing.sql | 18 ++- 4 files changed, 186 insertions(+), 57 deletions(-) diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 91eaaabc90f..ffa98692430 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -641,7 +641,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] If a problem arises while scanning the table, such as a deadlock or a uniqueness violation in a unique index, the CREATE INDEX -command will fail but leave behind an invalid index. This index +command will fail but leave behind an invalid index. +If this happens while creating index concurrently on a partitioned +table, the command can also leave behind valid or +invalid indexes on table partitions. The invalid index will be ignored for querying purposes because it might be incomplete; however it will still consume update overhead. The psql \d command will report such an index as INVALID: @@ -688,15 +691,6 @@ Indexes: cannot. 
- -Concurrent builds for indexes on partitioned tables are currently not -supported. However, you may concurrently build the index on each -partition individually and then finally create the partitioned index -non-concurrently in order to reduce the time where writes to the -partitioned table will be locked out. In this case, building the -partitioned index is a metadata only operation. - - diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index cd30f15eba6..a34a1b133a0 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -68,6 +68,7 @@ /* non-export function prototypes */ +static void reindex_invalid_child_indexes(Oid indexRelationId); static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts); static void CheckPredicate(Expr *predicate); static void ComputeIndexAttrs(IndexInfo *indexInfo, @@ -670,17 +671,6 @@ DefineIndex(Oid relationId, partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE; if (partitioned) { - /* - * Note: we check 'stmt->concurrent' rather than 'concurrent', so that - * the error is thrown also for temporary tables. Seems better to be - * consistent, even though we could do it on temporary table because - * we're not actually doing it concurrently. 
- */ - if (stmt->concurrent) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot create index on partitioned table \"%s\" concurrently", - RelationGetRelationName(rel; if (stmt->excludeOpNames) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), @@ -1119,6 +1109,11 @@ DefineIndex(Oid relationId, if (pd->nparts != 0) flags |= INDEX_CREATE_INVALID; } + else if (concurrent && OidIsValid(parentIndexId)) + { + /* If concurrent, initially build index partitions as "invalid" */ + flags |= INDEX_CREATE_INVALID; + } if (stmt->deferrable) constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE; @@ -1174,18 +1169,30 @@ DefineIndex(Oid relationId, partdesc = RelationGetPartitionDesc(rel, true); if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0) { + /* + * Need to close the relation before recursing into children, so + * copy needed data into a longlived context. + */ + + MemoryContext ind_context = AllocSetContextCreate(PortalContext, "CREATE INDEX", + ALLOCSET_DEFAULT_SIZES); + MemoryContext oldcontext = MemoryContextSwitchTo(ind_context); int nparts = partdesc->nparts; Oid *part_oids = palloc(sizeof(Oid) * nparts); bool invalidate_parent = false; TupleDesc parentDesc; Oid *opfamOids; + char *relname; pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL, nparts); memcpy(part_oids, partdesc->oids, sizeof(
Re: postgres_fdw and skip locked
Ashutosh Bapat писал 2022-02-17 16:30: On Wed, Feb 16, 2022 at 8:38 PM Alexander Pyhalov wrote: Ashutosh Bapat писал 2022-02-16 16:40: > On Mon, Feb 14, 2022 at 4:23 PM Alexander Pyhalov > wrote: >> > I see that these options will work for all kinds of relations. So no > problem if foreign table is pointing to something other than a table. Hi. The issue is that we perform deparsing while planing, we haven't connected to server yet. Are there any ideas how to find out its version without specifying it, for example, in server options? Yes, you are right. I wish foreign servers, esp. postgresql, could be more integrated and if we could defer deparsing till execution phase. But that's out of scope for this patch. Hi. I've updated patch to provide server-level "deparse_wait_policy" option, which controls if we deparse SKIP LOCKED or SELECT FOR UPDATE (true by default). This will make possible for people with old servers to revert to old behavior. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 5f5a0434227debd8ca7ee406e8cb1997edff14a3 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 14 Feb 2022 12:01:26 +0300 Subject: [PATCH] postgres_fdw could pass lock wait policy to foreign server --- contrib/postgres_fdw/deparse.c| 12 +- .../postgres_fdw/expected/postgres_fdw.out| 110 +- contrib/postgres_fdw/option.c | 4 +- contrib/postgres_fdw/postgres_fdw.c | 4 + contrib/postgres_fdw/postgres_fdw.h | 1 + contrib/postgres_fdw/sql/postgres_fdw.sql | 18 +++ doc/src/sgml/postgres-fdw.sgml| 16 +++ 7 files changed, 161 insertions(+), 4 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..f55afd461f0 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -1429,8 +1429,7 @@ deparseLockingClause(deparse_expr_cxt *context) * For now, just ignore any [NO] KEY specification, since (a) * it's not clear what that means for a remote table that we * don't have complete information about, 
and (b) it wouldn't - * work anyway on older remote servers. Likewise, we don't - * worry about NOWAIT. + * work anyway on older remote servers. */ switch (rc->strength) { @@ -1451,6 +1450,15 @@ deparseLockingClause(deparse_expr_cxt *context) if (bms_membership(rel->relids) == BMS_MULTIPLE && rc->strength != LCS_NONE) appendStringInfo(buf, " OF %s%d", REL_ALIAS_PREFIX, relid); + +/* Lock behavior */ +if (fpinfo->deparse_wait_policy && rc->strength != LCS_NONE) +{ + if (rc->waitPolicy == LockWaitSkip) + appendStringInfoString(buf, " SKIP LOCKED"); + else if (rc->waitPolicy == LockWaitError) + appendStringInfoString(buf, " NOWAIT"); +} } } } diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index b2e02caefe4..b33d61b8656 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -389,6 +389,35 @@ SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE; 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo (1 row) +-- test wait policy specification +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE NOWAIT; + QUERY PLAN +- + Foreign Scan on public.ft1 t1 + Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.* + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 101)) FOR UPDATE NOWAIT +(3 rows) + +SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE NOWAIT; + c1 | c2 | c3 | c4 |c5| c6 | c7 | c8 +-++---+--+--+++- + 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo +(1 row) + +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE SKIP LOCKED; + QUERY PLAN +- + Foreign Scan on public.ft1 t1 + Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.* + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 102)) FOR SHARE SKIP LOC
Re: postgres_fdw and skip locked
Ashutosh Bapat wrote 2022-02-16 16:40: On Mon, Feb 14, 2022 at 4:23 PM Alexander Pyhalov wrote: Hi. Currently select ... for update ... [skip locked|nowait] options are not pushed down to remote servers. The only reason I see is that we can be speaking to a pre-9.5 server, which doesn't understand the skip locked option. Are there any other possible issues? Should we add a foreign table option to control this behavior? Should we always push these clauses down if the remote server's version is 9.5 or newer? There are quite a few options already. It would be good not to add one more. I see that these options will work for all kinds of relations, so there is no problem if the foreign table points to something other than a table. Hi. The issue is that we perform deparsing while planning, when we haven't connected to the server yet. Are there any ideas on how to find out its version without specifying it, for example, in server options? -- Best regards, Alexander Pyhalov, Postgres Professional
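The behavior the patch adds — emit a wait-policy clause only when a locking clause is emitted at all, and only when the server-level option allows it (so users of pre-9.5 remotes can keep the old behavior) — can be sketched outside the C code. This is an illustrative Python model, not PostgreSQL source; the names mirror deparseLockingClause() and the proposed deparse_wait_policy option.

```python
# Illustrative sketch of the patched deparseLockingClause() logic.
LCS_NONE, LCS_FORSHARE, LCS_FORUPDATE = range(3)
WAIT_BLOCK, WAIT_SKIP, WAIT_ERROR = range(3)

def deparse_locking_clause(strength, wait_policy, deparse_wait_policy=True):
    parts = []
    if strength == LCS_FORSHARE:
        parts.append(" FOR SHARE")
    elif strength == LCS_FORUPDATE:
        parts.append(" FOR UPDATE")
    # A wait policy is only meaningful when some lock strength is requested,
    # and is only shipped when the server option opts in.
    if deparse_wait_policy and strength != LCS_NONE:
        if wait_policy == WAIT_SKIP:
            parts.append(" SKIP LOCKED")
        elif wait_policy == WAIT_ERROR:
            parts.append(" NOWAIT")
    return "".join(parts)

print(deparse_locking_clause(LCS_FORUPDATE, WAIT_ERROR))       # " FOR UPDATE NOWAIT"
print(deparse_locking_clause(LCS_FORSHARE, WAIT_SKIP))         # " FOR SHARE SKIP LOCKED"
print(deparse_locking_clause(LCS_FORSHARE, WAIT_SKIP, False))  # " FOR SHARE"
```

With deparse_wait_policy disabled, the remote query degrades to the pre-patch form, which is exactly the compatibility escape hatch discussed above.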
postgres_fdw and skip locked
Hi. Now select ... for update ... [skip locked|nowait] options are not pushed down to remote servers. I see the only reason is that we can speak to pre-9.5 server, which doesn't understand skip locked option. Are there any other possible issues? Should we add foreign table option to control this behavior? -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom f416cb0afdf42b8ab5375e7a3ccab6e41ebb16ab Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 14 Feb 2022 12:01:26 +0300 Subject: [PATCH] postgres_fdw could pass lock wait policy to foreign server --- contrib/postgres_fdw/deparse.c| 12 ++- .../postgres_fdw/expected/postgres_fdw.out| 81 +++ contrib/postgres_fdw/sql/postgres_fdw.sql | 13 +++ 3 files changed, 104 insertions(+), 2 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..7e320be5e6a 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -1429,8 +1429,7 @@ deparseLockingClause(deparse_expr_cxt *context) * For now, just ignore any [NO] KEY specification, since (a) * it's not clear what that means for a remote table that we * don't have complete information about, and (b) it wouldn't - * work anyway on older remote servers. Likewise, we don't - * worry about NOWAIT. + * work anyway on older remote servers. 
*/ switch (rc->strength) { @@ -1451,6 +1450,15 @@ deparseLockingClause(deparse_expr_cxt *context) if (bms_membership(rel->relids) == BMS_MULTIPLE && rc->strength != LCS_NONE) appendStringInfo(buf, " OF %s%d", REL_ALIAS_PREFIX, relid); + +/* Lock behavior */ +if (rc->strength != LCS_NONE) +{ + if (rc->waitPolicy == LockWaitSkip) + appendStringInfoString(buf, " SKIP LOCKED"); + else if (rc->waitPolicy == LockWaitError) + appendStringInfoString(buf, " NOWAIT"); +} } } } diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index b2e02caefe4..89564d9aef5 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -389,6 +389,35 @@ SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE; 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo (1 row) +-- test wait policy specification +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE NOWAIT; + QUERY PLAN +- + Foreign Scan on public.ft1 t1 + Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.* + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 101)) FOR UPDATE NOWAIT +(3 rows) + +SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE NOWAIT; + c1 | c2 | c3 | c4 |c5| c6 | c7 | c8 +-++---+--+--+++- + 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo +(1 row) + +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE SKIP LOCKED; + QUERY PLAN +- + Foreign Scan on public.ft1 t1 + Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.* + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 102)) FOR SHARE SKIP LOCKED +(3 rows) + +SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE SKIP LOCKED; + c1 | c2 | c3 | c4 |c5| c6 | c7 | c8 +-++---+--+--+++- + 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo +(1 row) + -- aggregate SELECT COUNT(*) 
FROM ft1 t1; count @@ -1862,6 +1891,32 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t 110 | 110 (10 rows) +-- test wait policy specification +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE SKIP LOCKED; +
Re: CREATE INDEX CONCURRENTLY on partitioned index
Justin Pryzby wrote 2021-02-26 21:20: On Mon, Feb 15, 2021 at 10:07:05PM +0300, Anastasia Lubennikova wrote: 5) Speaking of documentation, I think we need to add a paragraph about CIC on partitioned indexes which will explain that invalid indexes may appear and what the user should do to fix them. I'm not sure about that - it's already documented in general, for nonpartitioned indexes. Hi. I've rebased the patches and tried to fix the issues I've seen. I've fixed a reference after table_close() in the first patch (it can be seen while building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). It seems childidxs shouldn't live in ind_context, so I moved it out. Updated the documentation to state that CIC can leave valid or invalid indexes on partitions if it does not succeed. Also merged the old 0002-f-progress-reporting.patch and 0003-WIP-Add-SKIPVALID-flag-for-more-integration.patch. It seems the first one didn't really fix the issue with progress reporting (as ReindexRelationConcurrently() uses pgstat_progress_start_command(), which seems to mess up the effect of this command in DefineIndex()). Note that the third patch completely removes attempts to report CREATE INDEX progress correctly (reindex reports on individual commands, not the whole CREATE INDEX). So I've added 0003-Try-to-fix-create-index-progress-report.patch, which tries to fix the mess with CREATE INDEX progress reporting. It introduces a new flag, REINDEXOPT_REPORT_CREATE_PART, in ReindexParams->options. Given this flag, ReindexRelationConcurrently() will not report individual operation start/stop, but ReindexMultipleInternal() will report on reindexed partitions. To make the issue worse, some partitions can be handled in ReindexPartitions(), and ReindexMultipleInternal() should know how many in order to correctly update the PROGRESS_CREATEIDX_PARTITIONS_DONE counter. It also needs IndexOid to correctly generate the pg_stat_progress_create_index record, so we pass these parameters to it.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom eaad7c3ed2fda93fdb91aea60294f60489444bf7 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 7 Feb 2022 10:28:42 +0300 Subject: [PATCH 1/4] Allow CREATE INDEX CONCURRENTLY on partitioned table 0001-Allow-CREATE-INDEX-CONCURRENTLY-on-partitioned-table.patch from https://www.postgresql.org/message-id/20210226182019.gu20...@telsasoft.com Fixes: - rel was used after table_close(); - it seems childidxs shouldn't live in ind_context; - updated doc. --- doc/src/sgml/ref/create_index.sgml | 14 +-- src/backend/commands/indexcmds.c | 151 ++--- src/test/regress/expected/indexing.out | 60 +- src/test/regress/sql/indexing.sql | 18 ++- 4 files changed, 186 insertions(+), 57 deletions(-) diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 91eaaabc90f..ffa98692430 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -641,7 +641,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] If a problem arises while scanning the table, such as a deadlock or a uniqueness violation in a unique index, the CREATE INDEX -command will fail but leave behind an invalid index. This index +command will fail but leave behind an invalid index. +If this happens while creating index concurrently on a partitioned +table, the command can also leave behind valid or +invalid indexes on table partitions. The invalid index will be ignored for querying purposes because it might be incomplete; however it will still consume update overhead. The psql \d command will report such an index as INVALID: @@ -688,15 +691,6 @@ Indexes: cannot. - -Concurrent builds for indexes on partitioned tables are currently not -supported. However, you may concurrently build the index on each -partition individually and then finally create the partitioned index -non-concurrently in order to reduce the time where writes to the -partitioned table will be locked out. 
In this case, building the -partitioned index is a metadata only operation. - - diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 560dcc87a2c..666ced8e1d7 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -68,6 +68,7 @@ /* non-export function prototypes */ +static void reindex_invalid_child_indexes(Oid indexRelationId); static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts); static void CheckPredicate(Expr *predicate); static void ComputeIndexAttrs(IndexInfo *indexInfo, @@ -670,17 +671,6 @@ DefineIndex(Oid relationId, partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE; if (partitioned) { - /* - * Note: we check 'stmt->concurrent' rather than 'concurrent', so that - * the error is thrown also for temporary tables.
CREATE INDEX CONCURRENTLY on partitioned index
Alexander Pyhalov wrote 2022-02-09 15:18: Hi. I've looked at the patches introducing CREATE INDEX CONCURRENTLY for partitioned tables - https://www.postgresql.org/message-id/flat/20210226182019.GU20769%40telsasoft.com#da169a0a518bf8121604437d9ab053b3 . The thread hadn't had any activity for a year. I've rebased the patches and tried to fix the issues I've seen. I've fixed a reference after table_close() in the first patch (it can be seen while building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). Also merged the old 0002-f-progress-reporting.patch and 0003-WIP-Add-SKIPVALID-flag-for-more-integration.patch. It seems the first one didn't really fix the issue with progress reporting (as ReindexRelationConcurrently() uses pgstat_progress_start_command(), which seems to mess up the effect of this command in DefineIndex()). Also, the third patch completely removes attempts to report CREATE INDEX progress correctly (reindex reports on individual commands, not the whole CREATE INDEX). So I've added 0003-Try-to-fix-create-index-progress-report.patch, which tries to fix the mess with CREATE INDEX progress reporting. It introduces a new flag, REINDEXOPT_REPORT_PART, in ReindexParams->options. Given this flag, ReindexRelationConcurrently() will not report individual operations, but ReindexMultipleInternal() will report on reindexed partitions. To make the issue worse, some partitions can be handled in ReindexPartitions(), and ReindexMultipleInternal() should know how many in order to correctly update the PROGRESS_CREATEIDX_PARTITIONS_DONE counter, so we pass the number of handled partitions to it.
I also have a question about whether childidxs, in src/backend/commands/indexcmds.c:1239:
oldcontext = MemoryContextSwitchTo(ind_context);
childidxs = RelationGetIndexList(childrel);
attmap = build_attrmap_by_name(RelationGetDescr(childrel), parentDesc);
MemoryContextSwitchTo(oldcontext);
should live in ind_context, given that we iterate over this list of oids and immediately free it; but at least it shouldn't do much harm. Sorry, I messed up the topic. -- Best regards, Alexander Pyhalov, Postgres Professional
Justin Pryzby
Hi. I've looked at the patches introducing CREATE INDEX CONCURRENTLY for partitioned tables - https://www.postgresql.org/message-id/flat/20210226182019.GU20769%40telsasoft.com#da169a0a518bf8121604437d9ab053b3 . The thread hadn't had any activity for a year. I've rebased the patches and tried to fix the issues I've seen. I've fixed a reference after table_close() in the first patch (it can be seen while building with CPPFLAGS='-DRELCACHE_FORCE_RELEASE'). Also merged the old 0002-f-progress-reporting.patch and 0003-WIP-Add-SKIPVALID-flag-for-more-integration.patch. It seems the first one didn't really fix the issue with progress reporting (as ReindexRelationConcurrently() uses pgstat_progress_start_command(), which seems to mess up the effect of this command in DefineIndex()). Also, the third patch completely removes attempts to report CREATE INDEX progress correctly (reindex reports on individual commands, not the whole CREATE INDEX). So I've added 0003-Try-to-fix-create-index-progress-report.patch, which tries to fix the mess with CREATE INDEX progress reporting. It introduces a new flag, REINDEXOPT_REPORT_PART, in ReindexParams->options. Given this flag, ReindexRelationConcurrently() will not report individual operations, but ReindexMultipleInternal() will report on reindexed partitions. To make the issue worse, some partitions can be handled in ReindexPartitions(), and ReindexMultipleInternal() should know how many in order to correctly update the PROGRESS_CREATEIDX_PARTITIONS_DONE counter, so we pass the number of handled partitions to it. I also have a question about whether childidxs, in src/backend/commands/indexcmds.c:1239:
oldcontext = MemoryContextSwitchTo(ind_context);
childidxs = RelationGetIndexList(childrel);
attmap = build_attrmap_by_name(RelationGetDescr(childrel), parentDesc);
MemoryContextSwitchTo(oldcontext);
should live in ind_context, given that we iterate over this list of oids and immediately free it; but at least it shouldn't do much harm.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom adf4242708c2d86092f7c942c7bbb6ef12e6891b Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 7 Feb 2022 10:28:42 +0300 Subject: [PATCH 1/4] Allow CREATE INDEX CONCURRENTLY on partitioned table 0001-Allow-CREATE-INDEX-CONCURRENTLY-on-partitioned-table.patch from https://www.postgresql.org/message-id/20210226182019.gu20...@telsasoft.com Added fix - rel was used after table_close() --- doc/src/sgml/ref/create_index.sgml | 9 -- src/backend/commands/indexcmds.c | 150 ++--- src/test/regress/expected/indexing.out | 60 +- src/test/regress/sql/indexing.sql | 18 ++- 4 files changed, 181 insertions(+), 56 deletions(-) diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 91eaaabc90f..20c8af3e996 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -688,15 +688,6 @@ Indexes: cannot. - -Concurrent builds for indexes on partitioned tables are currently not -supported. However, you may concurrently build the index on each -partition individually and then finally create the partitioned index -non-concurrently in order to reduce the time where writes to the -partitioned table will be locked out. In this case, building the -partitioned index is a metadata only operation. 
- - diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 560dcc87a2c..39b11aebc03 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -68,6 +68,7 @@ /* non-export function prototypes */ +static void reindex_invalid_child_indexes(Oid indexRelationId); static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts); static void CheckPredicate(Expr *predicate); static void ComputeIndexAttrs(IndexInfo *indexInfo, @@ -670,17 +671,6 @@ DefineIndex(Oid relationId, partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE; if (partitioned) { - /* - * Note: we check 'stmt->concurrent' rather than 'concurrent', so that - * the error is thrown also for temporary tables. Seems better to be - * consistent, even though we could do it on temporary table because - * we're not actually doing it concurrently. - */ - if (stmt->concurrent) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot create index on partitioned table \"%s\" concurrently", - RelationGetRelationName(rel; if (stmt->excludeOpNames) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), @@ -1119,6 +1109,11 @@ DefineIndex(Oid relationId, if (pd->nparts != 0) flags |= INDEX_CREATE_INVALID; } + else if (concurrent && OidIsValid(parentIndexId)) + { + /
Re: Foreign join search stops on the first try
Ashutosh Bapat wrote 2022-01-25 17:08: This code was written long ago, so I may have some recollection errors. But AFAIR, the reasons we wanted to avoid repeated estimation/planning for the same foreign join rel were: 1. If use_remote_estimate = true, we fetch EXPLAIN output from the foreign server for various pathkeys. Fetching EXPLAIN output is expensive. Irrespective of the join order being considered locally, we expect the foreign server to give us the same cost since the join is the same. So we avoid running EXPLAIN again and again. 2. If use_remote_estimate = false, the logic to estimate a foreign join locally is independent of the join order, so it should yield the same cost again and again. For some reason that doesn't seem to be the case here. Hi. use_remote_estimate was set to false in our case, and yes, it fixed this issue. The problem is that with use_remote_estimate = false, the logic to estimate a foreign join locally is not independent of the join order. In the above example, without the patch we see a plan with cost: cost=382.31..966.86 rows=2 width=37. If we avoid exiting on (joinrel->fdw_private), we can see in gdb the following cases when joining all 3 relations:
case 1:
outerrel: relids (stock, order_line), startup_cost = 100, total_cost = 2415.92001, rel_startup_cost = 0, rel_total_cost = 2315.5, retrieved_rows = 21
innerrel: relid (district), startup_cost = 100, total_cost = 101.145001, rel_startup_cost = 0, rel_total_cost = 1.125, retrieved_rows = 1
joinrel: startup_cost = 100, total_cost = 2416.875, retrieved_rows = 2
case 2:
outerrel: relids (district, order_line), startup_cost = 100, total_cost = 281.419996, rel_total_cost = 180, retrieved_rows = 71
innerrel: relid (stock), startup_cost = 100, total_cost = 683.285008, rel_startup_cost = 0, rel_total_cost = 576.625, retrieved_rows = 333
joinrel: startup_cost = 100, total_cost = 974.88, retrieved_rows = 2
So (stock join order_line) join district has a different cost from (district join order_line) join stock.
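The effect described above reduces to a small model: if postgresGetForeignJoinPaths() returns early once joinrel->fdw_private is set, only the first join order's locally computed cost survives, even when a later order would be cheaper. The following Python sketch is purely illustrative (the cost numbers are the ones quoted from gdb above); it is not the postgres_fdw implementation.

```python
# Toy model: cost the same 3-way join {stock, order_line, district}
# through two different join orders, with and without the early return.
def add_foreign_join_path(joinrel, cost, early_return=True):
    """Mimic postgresGetForeignJoinPaths() considering one join order."""
    if early_return and "total_cost" in joinrel:
        return  # "already considered" -- later, cheaper orders are ignored
    if "total_cost" not in joinrel or cost < joinrel["total_cost"]:
        joinrel["total_cost"] = cost  # keep the cheapest order seen so far

orders = [
    ("(stock JOIN order_line) JOIN district", 2416.875),
    ("(district JOIN order_line) JOIN stock", 974.88),
]

current, patched = {}, {}
for _, cost in orders:
    add_foreign_join_path(current, cost, early_return=True)
    add_foreign_join_path(patched, cost, early_return=False)

print(current["total_cost"])  # 2416.875 -- first order considered wins
print(patched["total_cost"])  # 974.88   -- full search keeps the cheaper one
```

This is also why the patch still keeps the early exit when use_remote_estimate is true: there the extra orders would each trigger an expensive remote EXPLAIN for what the foreign server considers the same join.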
On Tue, Jan 25, 2022 at 1:26 PM Alexander Pyhalov wrote: It is surprising that the planning time halves with the patch. I expected it to increase slightly since we will compute estimates thrice instead of once. I wouldn't look at estimate times here precisely (and would looked at costs). Real example where we found it had 100 times more data, but effect was the same. Here some differences in planing time could be related to restarting instances with or without patches. What is use_remote_estimate? Is it ON/OFF? Yes, it was off. If we want to proceed along this line, we should take care not to fire more EXPLAIN queries on the foreign server. You are correct. Fixed patch to avoid extensive join search when use_remote_estimate is true. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 732e89ef198c0bad713b1a18446902b0132aa72c Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 24 Jan 2022 18:28:12 +0300 Subject: [PATCH] Look through all possible foreign join orders --- contrib/postgres_fdw/postgres_fdw.c | 50 + 1 file changed, 44 insertions(+), 6 deletions(-) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index bf3f3d9e26e..703e5df1753 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -5950,7 +5950,8 @@ postgresGetForeignJoinPaths(PlannerInfo *root, JoinType jointype, JoinPathExtraData *extra) { - PgFdwRelationInfo *fpinfo; + PgFdwRelationInfo *fpinfo, + *oldfpinfo; ForeignPath *joinpath; double rows; int width; @@ -5960,9 +5961,11 @@ postgresGetForeignJoinPaths(PlannerInfo *root, * EvalPlanQual gets triggered. */ /* - * Skip if this join combination has been considered already. + * Skip if this join combination has been considered already and rejected + * or if this join uses remote estimates. 
*/ - if (joinrel->fdw_private) + oldfpinfo = (PgFdwRelationInfo *) joinrel->fdw_private; + if (oldfpinfo && (!oldfpinfo->pushdown_safe || oldfpinfo->use_remote_estimate)) return; /* @@ -6002,6 +6005,12 @@ postgresGetForeignJoinPaths(PlannerInfo *root, epq_path = GetExistingLocalJoinPath(joinrel); if (!epq_path) { + if (oldfpinfo) + { +joinrel->fdw_private = oldfpinfo; +pfree(fpinfo); + } + elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found"); return; } @@ -6011,6 +6020,12 @@ postgresGetForeignJoinPaths(PlannerInfo *root, if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra)) { + if (oldfpinfo) + { + joinrel->fdw_private = oldfpinfo; + pfree(fpinfo); + } + /* Free path required for EPQ if we copied one; we don't need it now */ if (epq_path) pfree(epq_path); @@ -6044,14 +6059,37 @@ postgresGetForeignJoinPaths(PlannerI
Foreign join search stops on the first try
QUERY PLAN -- Foreign Scan (cost=100.00..974.88 rows=2 width=37) (actual time=3.399..3.400 rows=0 loops=1) Output: order_line.ol_o_id, order_line.ol_i_id, order_line.ol_d_id, stock.s_i_id, stock.s_quantity, stock.s_data, district.d_id, district.d_next_o_id, district.d_name Relations: ((public.order_line) INNER JOIN (public.district)) INNER JOIN (public.stock) Remote SQL: SELECT r1.ol_o_id, r1.ol_i_id, r1.ol_d_id, r2.s_i_id, r2.s_quantity, r2.s_data, r3.d_id, r3.d_next_o_id, r3.d_name FROM ((test.order_line r1 INNER JOIN test.district r3 ON (((r1.ol_o_id < r3.d_next_o_id)) AND ((r1.ol_o_id >= (r3.d_next_o_id - 20))) AND ((r3.d_id = 1)) AND ((r1.ol_d_id = 1 INNER JOIN test.stock r2 ON (((r1.ol_i_id = r2.s_i_id)) AND ((r2.s_quantity < 11 Planning Time: 0.928 ms Execution Time: 4.511 ms -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom e0c21e620804adbdeb6bc7d11ad606d263e63249 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 24 Jan 2022 18:28:12 +0300 Subject: [PATCH] Look through all possible foreign join orders --- .../postgres_fdw/expected/postgres_fdw.out| 10 ++-- contrib/postgres_fdw/postgres_fdw.c | 47 --- 2 files changed, 46 insertions(+), 11 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 7d6f7d9e3df..1547049108c 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -1203,8 +1203,8 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t - Foreign Scan Output: t1.c1, t2.c2, t3.c3, t1.c3 - Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3) - Remote SQL: SELECT r1."C 1", r2.c2, r4.c3, r1.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1 ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 10::bigint + Relations: ((public.ft1 t1) INNER JOIN 
(public.ft4 t3)) INNER JOIN (public.ft2 t2) + Remote SQL: SELECT r1."C 1", r2.c2, r4.c3, r1.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 10::bigint (4 rows) SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10; @@ -5585,12 +5585,12 @@ UPDATE ft2 SET c3 = 'foo' FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1) WHERE ft2.c1 > 1200 AND ft2.c2 = ft4.c1 RETURNING ft2, ft2.*, ft4, ft4.*; -- can be pushed down - QUERY PLAN + QUERY PLAN +
Re: Push down time-related SQLValue functions to foreign server
Tom Lane wrote 2022-01-18 23:01: I wrote: Alexander Pyhalov writes: This means we'll translate something like explain select * from t where d > now() - '1 day'::interval; to select * from t where d > $1; Right. After thinking about that a bit more, I see that this will result in a major redefinition of what is "shippable". Right now, we do not consider the above WHERE clause to be shippable, not only because of now() but because the timestamptz-minus-interval operator is dependent on the timezone setting, which might be different at the remote. But if we evaluate that operator locally and send its result as a parameter, the objection vanishes. In fact, I don't think we even need to require the subexpression to contain only built-in functions. Its result still has to be of a built-in type, but that's a much weaker restriction. Hi. So far I have the following prototype. It seems to be working, but I think it can be enhanced. At the least, some sort of caching seems to be necessary for is_stable_expr(). 1) Now an expression can be either 'stable shippable' or 'shippable according to the old rules'. We check whether it's 'stable shippable' in foreign_expr_walker(), is_foreign_param() and deparseExpr(). All such exprs are replaced by params while deparsing. 2) contain_mutable_functions() is now calculated only for the current node, if the node is not considered 'stable shippable'. Is this a step in the right direction, or am I missing something? -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 60e6e0bf98326cb557c70a365797026e9925b7a3 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 29 Jul 2021 11:45:28 +0300 Subject: [PATCH] Push down stable expressions Stable expressions can be computed locally and sent to the remote side as parameter values.
--- contrib/postgres_fdw/deparse.c| 1162 + .../postgres_fdw/expected/postgres_fdw.out| 87 ++ contrib/postgres_fdw/postgres_fdw.c |9 +- contrib/postgres_fdw/postgres_fdw.h |1 + contrib/postgres_fdw/shippable.c | 74 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 22 + 6 files changed, 811 insertions(+), 544 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..33a79026574 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -158,6 +158,7 @@ static void deparseDistinctExpr(DistinctExpr *node, deparse_expr_cxt *context); static void deparseScalarArrayOpExpr(ScalarArrayOpExpr *node, deparse_expr_cxt *context); static void deparseRelabelType(RelabelType *node, deparse_expr_cxt *context); +static void deparseStableExpr(Expr *node, deparse_expr_cxt *context); static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context); static void deparseNullTest(NullTest *node, deparse_expr_cxt *context); static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); @@ -267,20 +268,17 @@ is_foreign_expr(PlannerInfo *root, if (loc_cxt.state == FDW_COLLATE_UNSAFE) return false; - /* - * An expression which includes any mutable functions can't be sent over - * because its result is not stable. For example, sending now() remote - * side could cause confusion from clock offsets. Future versions might - * be able to make this choice with more granularity. (We check this last - * because it requires a lot of expensive catalog lookups.) - */ - if (contain_mutable_functions((Node *) expr)) - return false; /* OK to evaluate on the remote server */ return true; } +static bool +contain_mutable_functions_checker(Oid func_id, void *context) +{ + return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE); +} + /* * Check if expression is safe to execute remotely, and return true if so. 
* @@ -321,616 +319,650 @@ foreign_expr_walker(Node *node, inner_cxt.collation = InvalidOid; inner_cxt.state = FDW_COLLATE_NONE; - switch (nodeTag(node)) + if (is_stable_expr(node)) + { + collation = exprCollation(node); + if (collation == InvalidOid || collation == DEFAULT_COLLATION_OID) + state = FDW_COLLATE_NONE; + else + state = FDW_COLLATE_UNSAFE; + } + else { - case T_Var: - { -Var *var = (Var *) node; -/* - * If the Var is from the foreign table, we consider its - * collation (if any) safe to use. If it is from another - * table, we treat its collation the same way as we would a - * Param's collation, ie it's not safe for it to have a - * non-default collation. - */ -if (bms_is_member(var->varno, glob_cxt->relids) && - var->varlevelsup == 0) + /* + * An expression which includes any mutable functions can't be sent + * over because its result is not stable. + */ + if (check_functions_in_node(node, contain_mutable_functions_checker, NULL)) + return false; + + switch (nodeTag(node)) + { + case T_Var: { - /* Var belongs to foreign table */
Re: Push down time-related SQLValue functions to foreign server
Tom Lane wrote 2022-01-18 19:53: Alexander Pyhalov writes: [ updated patch ] Thanks for updating the patch. (BTW, please attach version numbers to new patch versions, to avoid confusion.) However, before we proceed any further with this patch, I think we really ought to stop and think about the question I raised last night: why are we building a one-off feature for SQLValueFunction? Wouldn't the same parameter-substitution mechanism work for *any* stable expression that doesn't contain remote Vars? That would subsume the now() case as well as plenty of others. Hi. I think I can extend it to allow any stable function (not just immutable/sqlvalue functions) in is_foreign_expr(), but I'm not so sure about "expression". Perhaps, at the top of deparseExpr(), we can check that an expression doesn't contain vars or params but does contain a stable function, and deparse it as a param. This means we'll translate something like explain select * from t where d > now() - '1 day'::interval; to select * from t where d > $1; The question is how we will reliably determine its typmod, given that the exprTypmod() comment says "returns the type-specific modifier of the expression's result type, if it can be determined. In many cases, it can't". What do we save if we don't ship the whole expression as a param, but only stable functions? Allowing whole expressions seems more straightforward. -- Best regards, Alexander Pyhalov, Postgres Professional
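The classification being proposed — evaluate locally, and ship as a query parameter, any sub-expression that is at most stable and contains no remote Vars — can be sketched as follows. This is a hedged illustration with made-up helper names (classify, ship_as_param) and a toy expression encoding; it is not the actual postgres_fdw code, and it deliberately ignores the typmod question raised above.

```python
# Toy expression trees: ('func', name, *args) | ('var', name) | ('const', value).
IMMUTABLE, STABLE, VOLATILE = "i", "s", "v"
_ORDER = [IMMUTABLE, STABLE, VOLATILE]

def classify(expr, funcs):
    """Return (worst volatility in the tree, whether it references a Var)."""
    kind = expr[0]
    if kind == "const":
        return IMMUTABLE, False
    if kind == "var":
        return IMMUTABLE, True
    vol, has_var = funcs[expr[1]], False
    for arg in expr[2:]:
        v, hv = classify(arg, funcs)
        vol = max(vol, v, key=_ORDER.index)  # keep the "most volatile" label
        has_var = has_var or hv
    return vol, has_var

def ship_as_param(expr, funcs):
    """Stable (or immutable) and Var-free => evaluate locally, send as $n."""
    vol, has_var = classify(expr, funcs)
    return vol in (IMMUTABLE, STABLE) and not has_var

funcs = {"now": STABLE, "timestamptz_mi_interval": STABLE, "random": VOLATILE}
# Right-hand side of "d > now() - interval '1 day'": qualifies as a param.
rhs = ("func", "timestamptz_mi_interval", ("func", "now"), ("const", "1 day"))
print(ship_as_param(rhs, funcs))                 # True  -> deparse as $1
print(ship_as_param(("var", "d"), funcs))        # False -> remote column reference
print(ship_as_param(("func", "random"), funcs))  # False -> volatile, not shippable
```

In the real patch this check has to be woven into foreign_expr_walker()'s collation tracking as well, which is why the prototype treats 'stable shippable' as a separate state rather than a plain boolean.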
Re: Push down time-related SQLValue functions to foreign server
Hi. Tom Lane писал 2022-01-18 02:08: Alexander Pyhalov writes: Perhaps in a MACRO? Changed this check to a macro, also fixed condition in is_foreign_param() and added test for it. Also fixed comment in prepare_query_params(). I took a quick look at this. I'm unconvinced that you need the TIME_RELATED_SQLVALUE_FUNCTION macro, mainly because I think testing that in is_foreign_param() is pointless. The only way we'll be seeing a SQLValueFunction in is_foreign_param() is if we decided it was shippable, so you really don't need two duplicate tests. (In the same vein, I would not bother with including a switch in deparseSQLValueFunction that knows about these opcodes explicitly. Just use the typmod field; exprTypmod() does.) Yes, sure, is_foreign_param() is called only when is_foreign_expr() is true. Simplified this part. I also find it pretty bizarre that contain_unsafe_functions isn't placed beside its one caller. Maybe that indicates that is_foreign_expr is mis-located and should be in shippable.c. More generally, it's annoying that you had to copy-and-paste all of contain_mutable_functions to make this. That creates a rather nasty maintenance hazard for future hackers, who will need to keep these widely-separated functions in sync. Not sure what to do about that though. Do we want to extend contain_mutable_functions itself to cover this use-case? I've moved logic to contain_mutable_functions_skip_sqlvalues(), it uses the same subroutines as contain_mutable_functions(). Should we instead just add one more parameter to contain_mutable_functions()? I'm not sure that it's a good idea given that contain_mutable_functions() seems to be an external interface. The test cases seem a bit overkill --- what is the point of the two nigh-identical PREPARE tests, or the GROUP BY test? If anything is broken about GROUP BY, surely it's not specific to this patch. 
I've removed PREPARE tests, but GROUP BY test checks foreign_grouping_ok()/is_foreign_param() path, so I think it's useful. I'm not really convinced by the premise of 0002, particularly this bit: static bool -contain_mutable_functions_checker(Oid func_id, void *context) +contain_unsafe_functions_checker(Oid func_id, void *context) { - return (func_volatile(func_id) != PROVOLATILE_IMMUTABLE); + /* now() is stable, but we can ship it as it's replaced by parameter */ + return !(func_volatile(func_id) == PROVOLATILE_IMMUTABLE || func_id == F_NOW); } The point of the check_functions_in_node callback API is to verify the mutability of functions that are embedded in various sorts of expression nodes ... but they might not be in a plain FuncExpr node, which is the only case you'll deparse correctly. It might be that now() cannot legally appear in any of the other node types that check_functions_in_node knows about, but I'm not quite convinced of that, and even less convinced that that'll stay true as we add more expression node types. Also, if we commit this, for sure some poor soul will try to expand the logic to some other stable function that *can* appear in those contexts, and that'll be broken. The implementation of converting now() to CURRENT_TIMESTAMP seems like an underdocumented kluge, too. On the whole I'm a bit inclined to drop 0002; I'm not sure it's worth the trouble. OK. Let's drop it for now. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 9a67c52b57e0b50a3702598aa0b3e8af89569a9c Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 29 Jul 2021 11:45:28 +0300 Subject: [PATCH] SQLValue functions pushdown current_timestamp, localtimestamp and similar SQLValue functions can be computed locally and sent to remote side as parameters values. 
--- contrib/postgres_fdw/deparse.c| 83 - .../postgres_fdw/expected/postgres_fdw.out| 88 +++ contrib/postgres_fdw/postgres_fdw.c | 9 +- contrib/postgres_fdw/postgres_fdw.h | 1 + contrib/postgres_fdw/shippable.c | 3 + contrib/postgres_fdw/sql/postgres_fdw.sql | 22 + src/backend/optimizer/util/clauses.c | 27 +- src/include/optimizer/optimizer.h | 1 + 8 files changed, 226 insertions(+), 8 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..a398e1b2174 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -109,6 +109,15 @@ typedef struct deparse_expr_cxt appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno)) #define SUBQUERY_REL_ALIAS_PREFIX "s" #define SUBQUERY_COL_ALIAS_PREFIX "c" +#define TIME_RELATED_SQLVALUE_FUNCTION(s) \ + (s->op == SVFOP_CURRENT_TIMESTAMP || \ + s->op == SVFOP_CURRENT_TIMESTAMP_N || \ + s->op == SVFOP_CURRENT_TIME || \ + s->op == SVFOP_CURRENT_TIME_N || \ + s->op == SVFOP_LOCALTIMESTAMP || \ + s->op == SVFOP_LOCALTIMESTAMP_N
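To make the idea behind the patch above concrete: time-related SQLValue functions (now(), CURRENT_TIMESTAMP and friends) are stable, so their value is fixed for the duration of a statement; that is what makes it safe to evaluate them once on the local side and ship the result to the remote server as a query parameter. Below is a minimal sketch of that deparse-and-parameterize step — in Python rather than the FDW's C, with a made-up plan_remote_query() helper, so illustrative only and not the patch's actual code:

```python
import datetime

# Time-related SQLValue functions, loosely mirroring the patch's
# TIME_RELATED_SQLVALUE_FUNCTION macro (names here are illustrative).
TIME_RELATED = {"now()", "current_timestamp", "current_time",
                "localtimestamp", "localtime"}

def plan_remote_query(tokens):
    """Deparse a token list for remote execution, replacing time-related
    SQLValue functions with $n placeholders whose values are computed
    once on the local side and sent as parameters."""
    params = []
    sql = []
    for tok in tokens:
        if tok.lower() in TIME_RELATED:
            params.append(datetime.datetime.now())  # evaluated locally, once
            sql.append(f"${len(params)}")
        else:
            sql.append(tok)
    return " ".join(sql), params

sql, params = plan_remote_query(
    ["SELECT", "*", "FROM", "t", "WHERE", "ts", "<", "now()"])
print(sql)  # SELECT * FROM t WHERE ts < $1
```

Because the value is captured once at plan-parameter setup, every remote evaluation of $1 sees the same timestamp, matching the local semantics of a stable function within one statement.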
Re: Partial aggregates pushdown
Alexander Pyhalov писал 2022-01-17 15:26: Zhihong Yu писал 2022-01-17 11:43: Hi, + FdwScanPrivateConvertors + * Generate attinmeta if there are some converters: I think it would be better if converter is spelled the same way across the patch. For build_conv_list(): + if (IS_UPPER_REL(foreignrel)) You can return NIL for !IS_UPPER_REL(foreignrel) - this would save indentation for the body of the func. Hi. Updated patch. Sorry, missed attachment. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 4408f98a67872efd3c09a3bf89e7cbf88db2a8b2 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 20 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 12 files changed, 672 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..6b12b7bf76b 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -197,6 +197,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -832,8 +833,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. 
*/ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3349,7 +3352,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3819,3 +3822,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this case we currently don't use converter */ + if (agg->aggtranstype != INTERNALOID && get_func_rettype(agg->aggfnoid) != agg->aggtranstype) + { + 
ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + return true; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 7d6f7d9e3df..549bb9bae61 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9336,13 +9336,13 @@ RESET enable_partitionwise_join; -- === -- test partitionwise aggregates -- === -CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a); +C
Re: Partial aggregates pushdown
Zhihong Yu wrote 2022-01-17 11:43: Hi, + FdwScanPrivateConvertors + * Generate attinmeta if there are some converters: I think it would be better if converter is spelled the same way across the patch. For build_conv_list(): + if (IS_UPPER_REL(foreignrel)) You can return NIL for !IS_UPPER_REL(foreignrel) - this would save indentation for the body of the func. Hi. Updated patch. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Asymmetric partition-wise JOIN
Andrey Lepikhov wrote 2021-09-15 09:31: On 14/9/21 11:37, Andrey V. Lepikhov wrote: Thank you for this good catch! The problem was in the adjust_child_relids_multilevel routine. The tmp_result variable sometimes points to the original required_outer. This patch adds new ways in which the optimizer can generate plans. One possibility is that the optimizer reparameterizes an inner path by a plain relation from the outer side (maybe as a result of a join of the plain relation and a partitioned relation). In this case we have to compare tmp_result with the original pointer to see whether it was changed. The patch in attachment fixes this problem. An additional regression test is added. I thought about it more and realized it isn't necessary to recurse in the adjust_child_relids_multilevel() routine if required_outer contains only normal_relids. Also, regression tests were improved a bit. Hi. The patch no longer applies cleanly, so I rebased it. Attaching the rebased version. I've looked through it once again and have several questions. 1) In adjust_appendrel_attrs_multilevel(), can it happen that child_relids is a zero-length list (in this case the pfree's will fail)? It seems not, but should we at least assert this? Note that in adjust_appendrel_attrs() we add logic for nappinfos being 0. 2) In try_asymmetric_partitionwise_join() we state that 'Asymmetric join isn't needed if the append node has only one child'. This is not completely correct. Asymmetric join with one partition can be advantageous when JOIN(A, UNION(B)) is more expensive than UNION(JOIN(A, B)). The latter is true, for example, when we join a partitioned table having foreign partitions with another foreign table and only one partition is left. Let's take the attached case (foreign_join.sql).
With the list_length(append_path->subpaths) > 1 check in place, we get the following plan:

set enable_partitionwise_join = on;
explain SELECT t1.a,t2.b FROM fprt1 t1 INNER JOIN ftprt2_p1 t2 ON (t1.a = t2.b) WHERE t1.a < 250 AND t2.c like '%0004' ORDER BY 1,2;
                                   QUERY PLAN
---
 Sort  (cost=208.65..208.69 rows=17 width=8)
   Sort Key: t1.a
   ->  Hash Join  (cost=202.60..208.30 rows=17 width=8)
         Hash Cond: (t1.a = t2.b)
         ->  Foreign Scan on ftprt1_p1 t1  (cost=100.00..105.06 rows=125 width=4)
         ->  Hash  (cost=102.39..102.39 rows=17 width=4)
               ->  Foreign Scan on ftprt2_p1 t2  (cost=100.00..102.39 rows=17 width=4)

If we change it to list_length(append_path->subpaths) > 0, we get a foreign join and a cheaper plan:

explain verbose SELECT t1.a,t2.b FROM fprt1 t1 INNER JOIN ftprt2_p1 t2 ON (t1.a = t2.b) WHERE t1.a < 250 AND t2.c like '%0004' ORDER BY 1,2;
                                   QUERY PLAN
---
 Sort  (cost=106.15..106.19 rows=17 width=8)
   Output: t1.a, t2.b
   Sort Key: t1.a
   ->  Foreign Scan  (cost=102.26..105.80 rows=17 width=8)
         Output: t1.a, t2.b
         Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
         Remote SQL: SELECT r4.a, r2.b FROM (public.fprt1_p1 r4 INNER JOIN public.fprt2_p1 r2 ON (((r4.a = r2.b)) AND ((r2.c ~~ '%0004')) AND ((r4.a < 250

-- Best regards, Alexander Pyhalov, Postgres Professional

From 029f0662f5401e79468a315a658b05b2f2a4e7a6 Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Mon, 17 Jan 2022 11:33:03 +0300 Subject: [PATCH] Asymmetric partitionwise join. Teach optimizer to consider partitionwise join of non-partitioned table with each partition of partitioned table. Disallow asymmetric machinery for joining of two partitioned (or appended) relations because it could cause huge consumption of CPU and memory during reparameterization of NestLoop path. Change logic of the multilevel child relids adjustment, because this feature allows the optimizer to plan in new way.
--- src/backend/optimizer/path/joinpath.c| 9 + src/backend/optimizer/path/joinrels.c| 187 src/backend/optimizer/plan/setrefs.c | 17 +- src/backend/optimizer/util/appendinfo.c | 45 +- src/backend/optimizer/util/pathnode.c| 9 +- src/backend/optimizer/util/relnode.c | 19 +- src/include/optimizer/paths.h| 7 +- src/test/regress/expected/partition_join.out | 425 +++ src/test/regress/sql/partition_join.sql | 180 9 files changed, 867 insertions(+), 31 deletions(-) diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index f96fc9fd282..6531981d0d8 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -335,6 +335,15 @@ add_paths_to_joinrel(PlannerInfo
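The semantic claim behind the asymmetric plan shape can be checked independently of the planner: for an inner join, joining the non-partitioned relation against each partition and appending the results yields the same rows as joining it against the appended whole. A small self-contained sketch (Python with toy helper names; the real work is of course in joinrels.c):

```python
def inner_join(outer, inner):
    """Naive nested-loop inner join on equal values, standing in for a join node."""
    return [(a, b) for a in outer for b in inner if a == b]

outer = [1, 2, 3, 250]
partitions = [[1, 2], [3, 4], [250, 251]]   # disjoint partitions of the inner rel

# JOIN(A, Append(B1..Bn)): join against the appended relation.
appended = [row for part in partitions for row in part]
plan_join_append = inner_join(outer, appended)

# Append(JOIN(A, B1) .. JOIN(A, Bn)): the asymmetric partition-wise shape.
plan_append_join = [row for part in partitions
                    for row in inner_join(outer, part)]

assert sorted(plan_join_append) == sorted(plan_append_join)
print(sorted(plan_append_join))  # [(1, 1), (2, 2), (3, 3), (250, 250)]
```

The equivalence holds even with a single partition, which is the point raised above: with one foreign partition, the per-partition join can be shipped as a whole to the remote server, whereas the JOIN-over-Append shape forces a local join.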
Re: Partial aggregates pushdown
Julien Rouhaud писал 2022-01-14 15:16: Hi, On Mon, Nov 15, 2021 at 04:01:51PM +0300, Alexander Pyhalov wrote: I've updated patch - removed catversion dump. This version of the patchset doesn't apply anymore: http://cfbot.cputube.org/patch_36_3369.log === Applying patches on top of PostgreSQL commit ID 025b920a3d45fed441a0a58fdcdf05b321b1eead === === applying patch ./0001-Partial-aggregates-push-down-v07.patch patching file src/bin/pg_dump/pg_dump.c Hunk #1 succeeded at 13111 (offset -965 lines). Hunk #2 FAILED at 14167. Hunk #3 succeeded at 13228 (offset -961 lines). Hunk #4 succeeded at 13319 (offset -966 lines). 1 out of 4 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej Could you send a rebased version? In the meantime I will switch the cf entry to Waiting on Author. Hi. Attaching rebased patch. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom d9227853b4ebeaa84ce4fd9623b2128cd6b2e494 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 20 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 12 files changed, 672 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..6b12b7bf76b 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -197,6 +197,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, 
int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -832,8 +833,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3349,7 +3352,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3819,3 +3822,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if 
(agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this case we currently don't use converter */ + if (agg->aggtranstype != INTERNALOID && get_func_rettype(agg->aggfnoid) != agg->aggtranstype) + { + ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + return true; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 7d6f7d9e3df..549bb9bae61 100644 --- a/contrib/pos
Re: postgres_fdw: incomplete subabort cleanup of connections used in async execution
Etsuro Fujita wrote 2021-12-19 13:25: Hi, While working on [1], I noticed $SUBJECT: postgres_fdw resets the per-connection states of connections, which store async requests sent to remote servers in async_capable mode, during post-abort (pgfdw_xact_callback()), but it fails to do so during post-subabort (pgfdw_subxact_callback()). This causes a crash when re-executing a query that was aborted in a subtransaction: Hi. Looks good to me. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Daniel Gustafsson wrote 2021-11-15 13:16: On 3 Nov 2021, at 15:50, Alexander Pyhalov wrote: Daniel Gustafsson wrote 2021-11-03 16:45: On 2 Nov 2021, at 10:12, Alexander Pyhalov wrote: Updated and rebased patch. + state = (Int128AggState *) palloc0(sizeof(Int128AggState)); + state->calcSumX2 = false; + + if (!PG_ARGISNULL(0)) + { +#ifdef HAVE_INT128 + do_int128_accum(state, (int128) PG_GETARG_INT64(0)); +#else + do_numeric_accum(state, int64_to_numeric(PG_GETARG_INT64(0))); +#endif This fails on non-INT128 platforms as state cannot be cast to Int128AggState outside of HAVE_INT128; it's not defined there. This needs to be a PolyNumAggState no? Hi. Thank you for noticing this. It indeed fails with pgac_cv__128bit_int=no. Updated patch. The updated patch also fails to apply now, but on the catversion.h bump. To avoid having to rebase for that I recommend skipping that part in the patch and just mentioning the need in the thread; any committer picking this up for commit will know to bump the catversion, so there is no use in risking unnecessary conflicts. I've updated the patch - removed the catversion bump.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 2af16e66276938b861cf7a8db2fef967f54b800f Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 12 files changed, 673 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index b27689d0864..a515c5662bb 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -197,6 +197,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -832,8 +833,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3349,7 +3352,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. 
*/ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3819,3 +3822,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this case we currently don't use converter */ + if (agg->aggtranstype != INTERNALO
Re: Partial aggregates pushdown
Daniel Gustafsson писал 2021-11-03 16:45: On 2 Nov 2021, at 10:12, Alexander Pyhalov wrote: Updated and rebased patch. + state = (Int128AggState *) palloc0(sizeof(Int128AggState)); + state->calcSumX2 = false; + + if (!PG_ARGISNULL(0)) + { +#ifdef HAVE_INT128 + do_int128_accum(state, (int128) PG_GETARG_INT64(0)); +#else + do_numeric_accum(state, int64_to_numeric(PG_GETARG_INT64(0))); +#endif This fails on non-INT128 platforms as state cannot be cast to Int128AggState outside of HAVE_INT128; it's not defined there. This needs to be a PolyNumAggState no? Hi. Thank you for noticing this. It's indeed fails with pgac_cv__128bit_int=no. Updated patch. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom f72a3d52a2b85ad9ea5f61f8ff5c46cb50ae3ec8 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 674 insertions(+), 84 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..8cee12c1b2a 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ 
-831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this 
case we currently don't use converter */ + if (agg->aggtranstype != INTERNALOID && get_func_rettype(agg->aggfnoid) != agg->aggtranstype) + { + ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + return true; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index fd141a0fa5c..80c507783e6 100644 --- a/contrib/postgres_fdw/expected/post
Re: Partial aggregates pushdown
Hi. Updated and rebased patch. Ilya Gladyshev писал 2021-11-02 00:31: Hi, On 21.10.2021 13:55, Alexander Pyhalov wrote: Hi. Updated patch. Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. I don't quite understand why this is restricted only to aggregates that have 'internal' state, I feel like that should be possible for any aggregate that has a function to convert its final result back to aggregate state to be pushed down. While I couldn't come up with a useful example for this, except maybe for an aggregate whose aggfinalfn is used purely for cosmetic purposes (e.g. format the result into a string), I still feel that it is an unnecessary restriction. I don't feel comfortable with it for the following reasons. - Now partial converters translate aggregate result to serialized internal representation. In case when aggregate type is different from internal state, we'd have to translate it to non-serialized internal representation, so converters should skip serialization step. This seems like introducing two kind of converters. - I don't see any system aggregates which would benefit from this. However, it doesn't seem to be complex, and if it seems to be desirable, it can be done. For now introduced check that transtype matches aggregate type (or is internal) in partial_agg_ok(). A few minor review notes to the patch: +static List *build_conv_list(RelOptInfo *foreignrel); this should probably be up top among other declarations. Moved it upper. @@ -1433,6 +1453,48 @@ postgresGetForeignPlan(PlannerInfo *root, outer_plan); } +/* + * Generate attinmeta if there are some converters: + * they are expecxted to return BYTEA, but real input type is likely different. + */ typo in word "expecxted". Fixed. @@ -139,10 +147,13 @@ typedef struct PgFdwScanState * for a foreign join scan. 
*/ TupleDesctupdesc;/* tuple descriptor of scan */ AttInMetadata *attinmeta;/* attribute datatype conversion metadata */ +AttInMetadata *rcvd_attinmeta;/* metadata for received tuples, NULL if + * there's no converters */ Looks like rcvd_attinmeta is redundant and you could use attinmeta for conversion metadata. Seems so, removed it. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 71de85154f9ac78e99f4ce8bf9aa341fee748609 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 674 insertions(+), 84 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..8cee12c1b2a 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. 
*/ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,51 @@ get_relation_column_alias_ids(
Re: Partial aggregates pushdown
Peter Eisentraut wrote 2021-11-01 12:47: On 21.10.21 12:55, Alexander Pyhalov wrote: Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. Converters are called locally; they transform the aggregate result into a serialized internal representation. As converters don't have access to the internal aggregate state, partial aggregates like avg() are still not pushable. It seems to me that the system should be able to determine from the existing aggregate catalog entry whether an aggregate can be pushed down. For example, it could check aggtranstype != internal and similar. A separate boolean flag should not be necessary. Hi. I think we can't infer this property from existing flags. For example, if I have avg() with bigint[] aggtranstype, it doesn't mean we can push it down. Nor could we decide whether a partial aggregate is safe to push down based on aggfinalfn presence (for example, it is defined for sum(numeric), but we can still push it down). Or if it is, the patch should provide some guidance about how an aggregate function author should set it. Where should it be provided? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Zhihong Yu wrote 2021-10-22 00:43: Hi, w.r.t. 0001-Partial-aggregates-push-down-v03.patch Hi. For partial_agg_ok(), + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + ok = false; Since SearchSysCache1() is not called yet, you can return false directly. Fixed. + if (aggform->aggpartialpushdownsafe != true) The above can be written as: if (!aggform->aggpartialpushdownsafe) Fixed. For build_conv_list(): + Oid converter_oid = InvalidOid; + + if (IsA(tlentry->expr, Aggref)) ... + } + convlist = lappend_oid(convlist, converter_oid); Do you intend to append InvalidOid to convlist (when tlentry->expr is not Aggref)? Yes, for each tlist member (which matches fpinfo->grouped_tlist in case when foreignrel is UPPER_REL) we need to find the corresponding converter. If we don't append InvalidOid, we can't find the convlist member corresponding to a tlist member. Added comments to build_conv_list. Also fixed an error in pg_dump.c (we selected '0' when aggpartialconverterfn was not defined in schema, but checked for '-').
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom a18e2ff8de00592797e7c3ccb8d6cd536a2e4e72 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 46 +++- .../postgres_fdw/expected/postgres_fdw.out| 185 +++- contrib/postgres_fdw/postgres_fdw.c | 204 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 - src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 672 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..fa9f487d66d 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. 
*/ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,40 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + bool ok = true; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + ok = false; + + /* + * But if an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (ok && agg->aggtranstype == INTERNALOID) + ok = (aggform->aggpartialconverterfn != InvalidOid); + + ReleaseSysCache(aggtup); + + return ok; +} diff --git a/contrib/postgres_fdw/expected/postgres_fd
Re: Partial aggregates pushdown
Tomas Vondra wrote 2021-10-19 16:25: On 10/19/21 08:56, Alexander Pyhalov wrote: Hi. Tomas Vondra wrote 2021-10-15 17:56: As for the proposed approach, it's probably good enough for the first version to restrict this to aggregates where the aggregate result is sufficient, i.e. we don't need any new export/import procedures. But it's very unlikely we'd want to restrict it the way the patch does it, i.e. based on aggregate name. That's both fragile (people can create new aggregates with such a name) and against PostgreSQL extensibility (people may implement custom aggregates, but won't be able to benefit from this just because of the name). So for v0 maybe, but I think there needs to be a way to relax this in some way; for example, we could add a new flag to pg_aggregate to mark aggregates supporting this. Updated the patch to mark aggregates as pushdown-safe in pg_aggregate. So far I have no solution for aggregates with internal aggtranstype. Hi. Updated patch. Now aggregates with internal state can be pushed down if they are marked as pushdown-safe (this flag is set to true for min/max/sum) and have associated converters. Converters are called locally; they transform the aggregate result into the serialized internal representation. As converters don't have access to the internal aggregate state, partial aggregates like avg() are still not pushable. For now the overall logic is quite simple. We now also call add_foreign_grouping_paths() for partial aggregation. In foreign_expr_walker() we check whether the aggregate is pushable (which means that it is simple, marked as pushable, and, if it has 'internal' as aggtranstype, has an associated converter). If it is pushable, we proceed as with usual aggregates (but forbid HAVING pushdown). During postgresGetForeignPlan() we produce a list of converters for the aggregates. As converters have an input argument type different from their result type (bytea), we have to generate alternative metadata, which is used by make_tuple_from_result_row().
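The limitation described here — avg() stays non-pushable because a converter only sees the aggregate's final result, never its internal state — can be sketched with a toy two-phase aggregation model. This is my illustration, not PostgreSQL executor code; the function names mirror the transfn/combinefn/finalfn roles only loosely.

```python
# Model of two-phase aggregation: shards compute partial states, the
# coordinator combines them and applies the final function.
def avg_partial(rows):            # transition: internal state is (sum, count)
    return (sum(rows), len(rows))

def avg_combine(a, b):            # combine two internal states
    return (a[0] + b[0], a[1] + b[1])

def avg_final(state):             # final function
    total, count = state
    return total / count

shards = [[1, 2, 3], [4, 5], [6]]

# sum()'s partial result is already a value of the result type, so a
# remote partial aggregate can be shipped and re-aggregated as-is:
total = sum(sum(shard) for shard in shards)

# avg()'s final value per shard is NOT combinable -- averaging the shard
# averages (2.0, 4.5, 6.0) would not give the true average -- so the
# internal (sum, count) state itself has to travel:
state = (0, 0)
for shard in shards:
    state = avg_combine(state, avg_partial(shard))
```

A converter that only receives avg()'s final number has lost the count, which is why avg() needs real state export rather than a result-to-state converter.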
If make_tuple_from_result_row() encounters field with converter, it calls converter and returns result. For now we expect converter to have only one input and output argument. Existing converters just transform input value to internal representation and return its serialized form. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 52cd61fdb5cb5fceeacd832462468d8676f57ca6 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 49 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 195 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 ++- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 666 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..50ef1009b97 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. 
*/ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,43 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + *
Re: Partial aggregates pushdown
Hi. Tomas Vondra wrote 2021-10-15 17:56: As for the proposed approach, it's probably good enough for the first version to restrict this to aggregates where the aggregate result is sufficient, i.e. we don't need any new export/import procedures. But it's very unlikely we'd want to restrict it the way the patch does it, i.e. based on aggregate name. That's both fragile (people can create new aggregates with such a name) and against PostgreSQL extensibility (people may implement custom aggregates, but won't be able to benefit from this just because of the name). So for v0 maybe, but I think there needs to be a way to relax this in some way; for example, we could add a new flag to pg_aggregate to mark aggregates supporting this. Updated the patch to mark aggregates as pushdown-safe in pg_aggregate. So far I have no solution for aggregates with internal aggtranstype. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 823a389caf003a21dd4c8e758f89d08ba89c5856 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 45 +++- .../postgres_fdw/expected/postgres_fdw.out| 215 +- contrib/postgres_fdw/postgres_fdw.c | 29 ++- contrib/postgres_fdw/sql/postgres_fdw.sql | 31 ++- src/backend/catalog/pg_aggregate.c| 4 +- src/backend/commands/aggregatecmds.c | 6 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 101 src/include/catalog/pg_aggregate.h| 6 +- 9 files changed, 362 insertions(+), 77 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..cf6b2d9f066 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual
clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,39 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + /* Can't process aggregates which require serialization/deserialization */ + if (agg->aggtranstype == INTERNALOID) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (aggform->aggpartialpushdownsafe != true) + { + ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + + return true; +} diff --git 
a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 44c4367b8f9..89451e208e0 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9279,13 +9279,13 @@ RESET enable_partitionwise_join; -- === -- test partitionwise aggregates -- === -CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a); +CREATE TABLE pagg_tab (a int, b int, c text, d numeric) PARTITION BY RANGE(a); CREATE TABLE pagg_tab_p1 (LIKE pagg_tab); CR
Re: Partial aggregates pushdown
Tomas Vondra wrote 2021-10-15 17:56: Hi Alexander, Hi. And then we should extend this for aggregates with more complex internal states (e.g. avg), by supporting a function that "exports" the aggregate state - similar to serial/deserial functions, but needs to be portable. I think the trickiest thing here is rewriting the remote query to call this export function, but maybe we could simply instruct the remote node to use a different final function for the top-level node? If we have some special export function, how can we find out that the remote server supports it? Should it be a server property, or should postgres_fdw somehow discover it while connecting to the server? -- Best regards, Alexander Pyhalov, Postgres Professional
Partial aggregates pushdown
Hi. One of the issues when we try to use sharding in PostgreSQL is the absence of partial aggregate pushdown. I see several opportunities to alleviate this issue. If we look at Citus, it implements an aggregate that calculates the internal state of an arbitrary aggregate function and exports it as text. So we could calculate internal states independently on all data sources and then finalize them, which allows computing an arbitrary aggregate. But, as mentioned in the [1] thread, for some functions (like count/max/min/sum) we can just push them down. It seems easy and covers a lot of cases. For now there are still issues - for example, you can't handle functions such as avg(), as we should somehow get their internal state, or sum() variants, which need aggserialfn/aggdeserialfn. A preliminary version is attached. Is someone else working on this issue? Does the suggested approach make sense? [1] https://www.postgresql.org/message-id/flat/9998c3af9fdb5f7d62a6c7ad0fcd9142%40postgrespro.ru -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom f2cf87a0ba1f4e4bf3f5f9e5b10b782c51717baf Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 59 - .../postgres_fdw/expected/postgres_fdw.out| 215 +- contrib/postgres_fdw/postgres_fdw.c | 29 ++- contrib/postgres_fdw/sql/postgres_fdw.sql | 31 ++- 4 files changed, 310 insertions(+), 24 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..0e4ac0ad255 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if
(!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,53 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple proctup; + Form_pg_proc procform; + const char *proname; + bool ok = false; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + /* Can't process aggregates which require serialization/deserialization */ + if (agg->aggtranstype == INTERNALOID) + return false; + + proctup = SearchSysCache1(PROCOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(proctup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + procform = (Form_pg_proc) GETSTRUCT(proctup); + + /* Only system aggregates are allowed */ + if (procform->pronamespace != PG_CATALOG_NAMESPACE) + { + ReleaseSysCache(proctup); + return false; + } + + /* + * Ensure that this partial aggregate can be evaluated directly. We check + * function name to avoid checking a dozen function variants. 
+ */ + proname = NameStr(procform->proname); + + if (strcmp(proname, "min") == 0 || + strcmp(proname, "max") == 0 || + strcmp(proname, "count") == 0 || + strcmp(proname, "sum") == 0) + ok = true; + + ReleaseSysCache(proctup); + + return ok; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 44c4367b8f9..89451e208e0 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9279,13 +9279,13 @@ RESET enable_partitionwise_join; -- === -- test partitionwise aggregates -- ==
Re: Function scan FDW pushdown
Ashutosh Bapat wrote 2021-06-15 16:15: Hi Alexander, Hi. The current version of the patch is based on asymmetric partition-wise join. Currently it is applied on top of v19-0001-Asymmetric-partitionwise-join.patch from https://www.postgresql.org/message-id/792d60f4-37bc-e6ad-68ca-c2af5cbb2...@postgrespro.ru . So far I don't know how to visualize the actual function expression used in a function RTE, as in postgresExplainForeignScan() es->rtable comes from queryDesc->plannedstmt->rtable, and rte->functions is already 0. The actual function expression will be part of the Remote SQL of the ForeignScan node, so no need to visualize it separately. We still need to create a tuple description for functions in get_tupdesc_for_join_scan_tuples(), so I had to remove setting newrte->functions to NIL in add_rte_to_flat_rtable(). With rte->functions in place, there are no issues with EXPLAIN. The patch will have problems when there are multiple foreign tables all on different servers or using different FDWs. In such a case the function scan's RelOptInfo will get the fpinfo based on the first foreign table the function scan is paired with during join planning. But that may not be the best foreign table to join. We should be able to plan all the possible joins. The current infrastructure of one fpinfo per RelOptInfo won't help there. We need something better. I suppose the attached version of the patch is more mature. The patch targets only postgres FDW; how do you see this working with other FDWs? Not now. We introduce the necessary APIs for other FDWs, but implementing TryShippableJoinPaths() doesn't seem straightforward. If we come up with the right approach, we could use it for 1. pushing down queries with an IN () clause; 2. joining a small local table with a large foreign table by sending the local table rows down to the foreign server.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom d997c313daf0031b812d3fca59d338be1a4f2196 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Mon, 17 May 2021 19:19:31 +0300 Subject: [PATCH] Push join with function scan to remote server --- contrib/postgres_fdw/deparse.c| 199 ++- .../postgres_fdw/expected/postgres_fdw.out| 1095 + contrib/postgres_fdw/postgres_fdw.c | 497 +++- contrib/postgres_fdw/postgres_fdw.h |6 + contrib/postgres_fdw/sql/postgres_fdw.sql | 336 + src/backend/optimizer/path/joinpath.c | 11 + src/backend/optimizer/plan/setrefs.c |1 - src/backend/optimizer/util/relnode.c |2 + src/include/foreign/fdwapi.h |1 + 9 files changed, 2035 insertions(+), 113 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..7f08575ef60 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -151,6 +151,7 @@ static void deparseConst(Const *node, deparse_expr_cxt *context, int showtype); static void deparseParam(Param *node, deparse_expr_cxt *context); static void deparseSubscriptingRef(SubscriptingRef *node, deparse_expr_cxt *context); static void deparseFuncExpr(FuncExpr *node, deparse_expr_cxt *context); +static void deparseFuncColnames(StringInfo buf, int varno, RangeTblEntry *rte, bool qualify_col); static void deparseOpExpr(OpExpr *node, deparse_expr_cxt *context); static void deparseOperatorName(StringInfo buf, Form_pg_operator opform); static void deparseDistinctExpr(DistinctExpr *node, deparse_expr_cxt *context); @@ -1740,13 +1741,54 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel, { RangeTblEntry *rte = planner_rt_fetch(foreignrel->relid, root); - /* - * Core code already has some lock on each rel being planned, so we - * can use NoLock here. 
- */ - Relation rel = table_open(rte->relid, NoLock); + Assert(rte->rtekind == RTE_RELATION || rte->rtekind == RTE_FUNCTION); + if (rte->rtekind == RTE_RELATION) + { + /* + * Core code already has some lock on each rel being planned, so + * we can use NoLock here. + */ + Relation rel = table_open(rte->relid, NoLock); - deparseRelation(buf, rel); + deparseRelation(buf, rel); + + table_close(rel, NoLock); + } + else if (rte->rtekind == RTE_FUNCTION) + { + RangeTblFunction *rtfunc; + deparse_expr_cxt context; + ListCell *lc; + bool first = true; + int n; + + n = list_length(rte->functions); + Assert(n >= 1); + + if (n > 1) +appendStringInfoString(buf, "ROWS FROM ("); + + foreach(lc, rte->functions) + { +if (!first) + appendStringInfoString(buf, ", "); +else + first = false; + +rtfunc = (RangeTblFunction *) lfirst(lc); + +context.root = root; +context.foreignrel = foreignrel; +context.scanrel = foreignrel; +context.buf = buf; +context.params_list = params_list; + +deparseExpr((Expr *) rtfu
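The FROM-clause deparse rule in the hunk above — a single function RTE is emitted bare, while several table functions are wrapped in ROWS FROM (...) — can be modeled compactly. This is only a sketch of the control flow, not the deparse.c code; the strings stand in for already-deparsed function expressions.

```python
def deparse_function_rte(deparsed_funcs):
    # One function: emit it bare.  Several: wrap in ROWS FROM (...),
    # joining with ", " via the same first-iteration flag pattern
    # used in the C hunk above.
    assert len(deparsed_funcs) >= 1
    if len(deparsed_funcs) == 1:
        return deparsed_funcs[0]
    parts = []
    first = True
    for f in deparsed_funcs:
        if not first:
            parts.append(", ")
        first = False
        parts.append(f)
    return "ROWS FROM (" + "".join(parts) + ")"
```

The ROWS FROM form matters because it evaluates the functions in parallel positions, matching how a multi-function RTE behaves locally.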
Re: Defer selection of asynchronous subplans until the executor initialization stage
Etsuro Fujita wrote 2021-08-30 12:52: On Mon, Aug 30, 2021 at 5:36 PM Andrey V. Lepikhov To allow async execution in a bit more cases, I modified the patch a bit further: a ProjectionPath put directly above an async-capable ForeignPath would also be considered async-capable, as ForeignScan can project and no separate Result is needed in that case; so I modified mark_async_capable_plan() as such, and added test cases to the postgres_fdw regression test. Attached is an updated version of the patch. Hi. The patch looks good to me and seems to work as expected. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Push down time-related SQLValue functions to foreign server
Ranier Vilela wrote 2021-08-20 14:19: Another question: For the 0002 patch: + if (node->funcid == F_NOW) + { + SQLValueFunction *svf = makeNode(SQLValueFunction); + + svf->op = SVFOP_CURRENT_TIMESTAMP; + svf->type = TIMESTAMPTZOID; + svf->typmod = -1; + svf->location = -1; + + deparseSQLValueFunction(svf, context); + + return; + } It seems to me that the svf->xpr field (SQLValueFunction *svf) is not initialized anywhere, even by deparseSQLValueFunction. If it's not really used, it should be initialized to NULL, ok? The xpr field just carries the node type, which is initialized by makeNode(). -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Push down time-related SQLValue functions to foreign server
Hi. Ashutosh Bapat wrote 2021-08-19 17:01: I spent some time looking at this patch. Generally it looks like a good idea. These stable functions will be evaluated at execution time and replaced with constants. I am not sure whether the nodes saved in the param_list may not get the same treatment. Have you verified that? I'm not sure I understand you. All parameters are treated in the same way. They are evaluated in process_query_params() - both real Params and the parameters corresponding to our SQLValue functions. If we look at the execution of something like

explain verbose select * from test1 t1 where i in (select i from test1 t2 where t2.t < now() and t1.i=t2.i);
 QUERY PLAN
---
 Foreign Scan on public.test1 t1 (cost=100.00..243310.11 rows=930 width=20)
   Output: t1.i, t1.t, t1.l
   Filter: (SubPlan 1)
   Remote SQL: SELECT i, t, l FROM data.test1
   SubPlan 1
     -> Foreign Scan on public.test1 t2 (cost=100.00..161.29 rows=5 width=4)
          Output: t2.i
          Remote SQL: SELECT i FROM data.test1 WHERE (($1::integer = i)) AND ((t < $2::timestamp with time zone))

we can see two parameters evaluated in process_query_params(): one of T_Param type (with the value of the current t1.i) and one of T_SQLValueFunction type (with the value of current_timestamp). Also the new node types being added to the param list are something other than Param. So it conflicts with the comment below in prepare_query_params()? /* * Prepare remote-parameter expressions for evaluation. (Note: in * practice, we expect that all these expressions will be just Params, so * we could possibly do something more efficient than using the full * expression-eval machinery for this. But probably there would be little * benefit, and it'd require postgres_fdw to know more than is desirable * about Param evaluation.) */ If we are already adding non-params to this list, then the comment is outdated? Fixed the comment in the new version of the patches. -- Best regards, Alexander Pyhalov, Postgres Professional
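The uniform treatment described above — process_query_params() evaluates every remote-parameter expression locally, whether it is a plain Param (t1.i) or a SQLValueFunction (current_timestamp), and binds the value to $n — can be sketched as follows. The node shapes and `evaluate_param` name are invented for illustration; this is not the postgres_fdw code.

```python
from datetime import datetime, timezone

def evaluate_param(expr, outer_values):
    # Both node kinds go through the same local-evaluation step.
    if expr["kind"] == "Param":              # e.g. t1.i from the outer scan
        return outer_values[expr["name"]]
    if expr["kind"] == "SQLValueFunction":   # e.g. CURRENT_TIMESTAMP
        return datetime.now(timezone.utc)
    raise NotImplementedError(expr["kind"])

remote_sql = ("SELECT i FROM data.test1 "
              "WHERE (($1::integer = i)) AND ((t < $2::timestamptz))")
param_exprs = [{"kind": "Param", "name": "i"},
               {"kind": "SQLValueFunction"}]
# One value per $n placeholder, evaluated locally before the remote call.
values = [evaluate_param(p, {"i": 42}) for p in param_exprs]
```

Because the SQLValueFunction value is computed once per (re)scan on the local side, the remote server never needs to agree with the local clock or timezone setting.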
Re: Push down time-related SQLValue functions to foreign server
Hi. Ranier Vilela писал 2021-08-19 14:01: Em qui., 19 de ago. de 2021 às 07:50, Zhihong Yu Hi, For 0001 patch: + if ((s->op != SVFOP_CURRENT_TIMESTAMP) && + (s->op != SVFOP_CURRENT_TIMESTAMP_N) && + (s->op != SVFOP_CURRENT_TIME) && ... The above check appears more than once. If extracted into a helper method, it would help reduce duplicate and make the code more readable. Perhaps in a MACRO? Changed this check to a macro, also fixed condition in is_foreign_param() and added test for it. Also fixed comment in prepare_query_params(). -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 2cfd3e42cad07ed552a1eb23b06040b0f74a7f2f Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 29 Jul 2021 11:45:28 +0300 Subject: [PATCH 1/2] SQLValue functions pushdown current_timestamp, localtimestamp and similar SQLValue functions can be computed locally and sent to remote side as parameters values. --- contrib/postgres_fdw/deparse.c| 95 +- .../postgres_fdw/expected/postgres_fdw.out| 121 ++ contrib/postgres_fdw/postgres_fdw.c | 9 +- contrib/postgres_fdw/postgres_fdw.h | 1 + contrib/postgres_fdw/shippable.c | 68 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 33 + 6 files changed, 320 insertions(+), 7 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..6c99acd0c82 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -109,6 +109,15 @@ typedef struct deparse_expr_cxt appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno)) #define SUBQUERY_REL_ALIAS_PREFIX "s" #define SUBQUERY_COL_ALIAS_PREFIX "c" +#define TIME_RELATED_SQLVALUE_FUNCTION(s) \ + (s->op == SVFOP_CURRENT_TIMESTAMP || \ + s->op == SVFOP_CURRENT_TIMESTAMP_N || \ + s->op == SVFOP_CURRENT_TIME || \ + s->op == SVFOP_CURRENT_TIME_N || \ + s->op == SVFOP_LOCALTIMESTAMP || \ + s->op == SVFOP_LOCALTIMESTAMP_N || \ + s->op == SVFOP_LOCALTIME || \ + s->op == SVFOP_LOCALTIME_N) /* * Functions to determine whether an expression 
can be evaluated safely on @@ -157,6 +166,7 @@ static void deparseDistinctExpr(DistinctExpr *node, deparse_expr_cxt *context); static void deparseScalarArrayOpExpr(ScalarArrayOpExpr *node, deparse_expr_cxt *context); static void deparseRelabelType(RelabelType *node, deparse_expr_cxt *context); +static void deparseSQLValueFunction(SQLValueFunction *node, deparse_expr_cxt *context); static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context); static void deparseNullTest(NullTest *node, deparse_expr_cxt *context); static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); @@ -273,7 +283,7 @@ is_foreign_expr(PlannerInfo *root, * be able to make this choice with more granularity. (We check this last * because it requires a lot of expensive catalog lookups.) */ - if (contain_mutable_functions((Node *) expr)) + if (contain_unsafe_functions((Node *) expr)) return false; /* OK to evaluate on the remote server */ @@ -618,6 +628,23 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_UNSAFE; } break; + case T_SQLValueFunction: + { +SQLValueFunction *s = (SQLValueFunction *) node; + +/* + * For now only time-related SQLValue functions are supported. + * We can push down localtimestamp and localtime as we + * compute them locally. 
+ */ +if (!TIME_RELATED_SQLVALUE_FUNCTION(s)) + return false; + +/* Timestamp or time are not collatable */ +collation = InvalidOid; +state = FDW_COLLATE_NONE; + } + break; case T_BoolExpr: { BoolExpr *b = (BoolExpr *) node; @@ -1031,6 +1058,14 @@ is_foreign_param(PlannerInfo *root, case T_Param: /* Params always have to be sent to the foreign server */ return true; + case T_SQLValueFunction: + { +SQLValueFunction *s = (SQLValueFunction *) expr; + +if (TIME_RELATED_SQLVALUE_FUNCTION(s)) + return true; +break; + } default: break; } @@ -2603,6 +2638,9 @@ deparseExpr(Expr *node, deparse_expr_cxt *context) case T_RelabelType: deparseRelabelType((RelabelType *) node, context); break; + case T_SQLValueFunction: + deparseSQLValueFunction((SQLValueFunction *) node, context); + break; case T_BoolExpr: deparseBoolExpr((BoolExpr *) node, context); break; @@ -3092,6 +3130,61 @@ deparseRelabelType(RelabelType *node, deparse_expr_cxt *context) node->resulttypmod)); } +/* + * Deparse a SQLValueFunction node + */ +static void +deparseSQLValueFunction(SQLValueFunction *node, deparse_expr_cxt *context) +{ + int32 typmod = -1; + + switch (node->op) + { + case SVFOP_LOCALTIME: + case SVFOP_CURRENT_TIME: + case SVFOP_LOCALTIMESTAMP: + cas
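The review comment earlier in the thread asked to factor the repeated SVFOP checks into a helper; the TIME_RELATED_SQLVALUE_FUNCTION macro in the patch above is that helper, and its behavior reduces to a membership test over eight SVFOP values. A sketch (op names as strings, outside C):

```python
# Counterpart of the patch's TIME_RELATED_SQLVALUE_FUNCTION(s) macro:
# true exactly for the eight time-related SQLValueFunction ops.
TIME_RELATED_SVF_OPS = frozenset({
    "SVFOP_CURRENT_TIMESTAMP", "SVFOP_CURRENT_TIMESTAMP_N",
    "SVFOP_CURRENT_TIME",      "SVFOP_CURRENT_TIME_N",
    "SVFOP_LOCALTIMESTAMP",    "SVFOP_LOCALTIMESTAMP_N",
    "SVFOP_LOCALTIME",         "SVFOP_LOCALTIME_N",
})

def time_related_sqlvalue_function(op):
    return op in TIME_RELATED_SVF_OPS
```

Ops outside this set (current_role, current_catalog, and so on) stay unshippable under the patch.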
Re: Push down time-related SQLValue functions to foreign server
Zhihong Yu wrote 2021-08-19 13:22: Hi, For the 0002 patch: + /* now() is stable, but we can ship it as it's replaced by parameter */ + return !(func_volatile(func_id) == PROVOLATILE_IMMUTABLE || func_id == F_NOW); Did you mean to say 'now() is unstable'? No, it's stable, not immutable, so we need an additional check. -- Best regards, Alexander Pyhalov, Postgres Professional
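The line under discussion can be read as a shippability rule: immutable functions may run remotely as-is, and now(), though only stable, is also safe because it is evaluated locally and shipped as a parameter value. A toy version follows; the function names and the mini-catalog dict are invented for illustration, standing in for pg_proc volatility lookups.

```python
IMMUTABLE, STABLE, VOLATILE = "immutable", "stable", "volatile"

# Invented mini-catalog standing in for func_volatile() lookups.
FUNC_VOLATILITY = {"int4pl": IMMUTABLE, "now": STABLE, "random": VOLATILE}

def contains_unsafe_function(func_name):
    # Mirrors: return !(func_volatile(func_id) == PROVOLATILE_IMMUTABLE
    #                   || func_id == F_NOW);
    # now() is special-cased: stable, but replaced by a locally
    # evaluated parameter before the query is shipped.
    return not (FUNC_VOLATILITY[func_name] == IMMUTABLE
                or func_name == "now")
```

So the reviewer's reading was inverted: the check passes now() through precisely because its stability is compensated by local evaluation.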
Push down time-related SQLValue functions to foreign server
Hi. The attached patches allow pushing down current_timestamp/localtimestamp/current_time/localtime and now() to a remote PostgreSQL server as locally computed parameters. The idea is based on oracle_fdw behavior. Examples:

\d test
Foreign table "public.test"
 Column | Type | Collation | Nullable | Default | FDW options
+--+---+--+-+---
 i | integer | | | | (column_name 'i')
 t | timestamp with time zone | | | | (column_name 't')
Server: loopback
FDW options: (schema_name 'data', table_name 'test')

Prior to the patch:

explain verbose select * from test where t=current_timestamp;
 QUERY PLAN
-
 Foreign Scan on public.test (cost=100.00..188.12 rows=11 width=12)
   Output: i, t
   Filter: (test.t = CURRENT_TIMESTAMP)
   Remote SQL: SELECT i, t FROM data.test

explain verbose update test set t=current_timestamp where t
 -> Foreign Scan on public.test (cost=100.00..154.47 rows=414 width=50)
      Output: CURRENT_TIMESTAMP, ctid, test.*
      Filter: (test.t < now())
      Remote SQL: SELECT i, t, ctid FROM data.test FOR UPDATE

After the patch:

explain verbose select * from test where t=current_timestamp;
 QUERY PLAN
-
 Foreign Scan on public.test (cost=100.00..144.35 rows=11 width=12)
   Output: i, t
   Remote SQL: SELECT i, t FROM data.test WHERE ((t = $1::timestamp with time zone))

explain verbose update test set t=current_timestamp where t
 -> Foreign Update on public.test (cost=100.00..137.93 rows=414 width=50)
      Remote SQL: UPDATE data.test SET t = $1::timestamp with time zone WHERE ((t < $1::timestamp with time zone))

-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 0846ba1d3a5f15bbea449b39741f08558fdb2d49 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 29 Jul 2021 11:45:28 +0300 Subject: [PATCH 1/2] SQLValue functions pushdown current_timestamp, localtimestamp and similar SQLValue functions can be computed locally and sent to remote side as parameters values.
--- contrib/postgres_fdw/deparse.c| 100 +++- .../postgres_fdw/expected/postgres_fdw.out| 108 ++ contrib/postgres_fdw/postgres_fdw.h | 1 + contrib/postgres_fdw/shippable.c | 68 +++ contrib/postgres_fdw/sql/postgres_fdw.sql | 29 + 5 files changed, 305 insertions(+), 1 deletion(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..01748835fc8 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -157,6 +157,7 @@ static void deparseDistinctExpr(DistinctExpr *node, deparse_expr_cxt *context); static void deparseScalarArrayOpExpr(ScalarArrayOpExpr *node, deparse_expr_cxt *context); static void deparseRelabelType(RelabelType *node, deparse_expr_cxt *context); +static void deparseSQLValueFunction(SQLValueFunction *node, deparse_expr_cxt *context); static void deparseBoolExpr(BoolExpr *node, deparse_expr_cxt *context); static void deparseNullTest(NullTest *node, deparse_expr_cxt *context); static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); @@ -273,7 +274,7 @@ is_foreign_expr(PlannerInfo *root, * be able to make this choice with more granularity. (We check this last * because it requires a lot of expensive catalog lookups.) */ - if (contain_mutable_functions((Node *) expr)) + if (contain_unsafe_functions((Node *) expr)) return false; /* OK to evaluate on the remote server */ @@ -618,6 +619,30 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_UNSAFE; } break; + case T_SQLValueFunction: + { +SQLValueFunction *s = (SQLValueFunction *) node; + +/* + * For now only time-related SQLValue functions are supported. + * We can push down localtimestamp and localtime as we + * compute them locally. 
+ */ +if ((s->op != SVFOP_CURRENT_TIMESTAMP) && + (s->op != SVFOP_CURRENT_TIMESTAMP_N) && + (s->op != SVFOP_CURRENT_TIME) && + (s->op != SVFOP_CURRENT_TIME_N) && + (s->op != SVFOP_LOCALTIMESTAMP) && + (s->op != SVFOP_LOCALTIMESTAMP_N) && + (s->op != SVFOP_LOCALTIME) && + (s->op != SVFOP_LOCALTIME_N)) + return false; + +/* Timestamp or time are not collatable */ +collation = InvalidOid; +state = FDW_COLLATE_NONE; + } + break; case T_BoolExpr: { BoolExpr *b = (BoolExpr *) node; @@ -1031,6 +1056,21 @@ is_foreign_param(PlannerInfo *root, case T_Param:
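To illustrate the idea in the mail above — evaluate the SQLValue function once on the local side and ship only the resulting value to the remote server as a query parameter — here is a toy Python sketch. The text-based substitution and the `parameterize`/`LOCAL_EXPRS` names are purely illustrative: the real patch walks PostgreSQL parse trees (T_SQLValueFunction nodes) in deparse.c, not SQL strings.

```python
import re
from datetime import datetime, timezone

# Locally computable expressions and how to evaluate them.  Illustrative
# only: a text substitution like this would also match inside identifiers,
# which a parse-tree-based implementation avoids.
LOCAL_EXPRS = {
    "current_timestamp": lambda: datetime.now(timezone.utc),
    "now()": lambda: datetime.now(timezone.utc),
}

def parameterize(sql: str):
    """Replace locally computable expressions with $n placeholders and
    return (remote_sql, params): one locally evaluated value is shipped
    instead of the volatile expression itself."""
    params = []

    def repl(match):
        params.append(LOCAL_EXPRS[match.group(0).lower()]())
        return f"${len(params)}"

    pattern = "|".join(re.escape(k) for k in LOCAL_EXPRS)
    return re.sub(pattern, repl, sql, flags=re.IGNORECASE), params

remote_sql, params = parameterize(
    "SELECT i, t FROM data.test WHERE t = current_timestamp")
print(remote_sql)  # SELECT i, t FROM data.test WHERE t = $1
```

Because the value is computed once locally, every remote reference to `$1` sees the same instant, matching the single-snapshot semantics of current_timestamp within a statement.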
Why are timestamptz_pl_interval and timestamptz_mi_interval not immutable?
Hi.

I'm currently looking at pushing down SQLValue expressions to foreign servers and was surprised that two timestamptz-related functions are not immutable. I see that this was changed in

commit 1ab415596d1de61561d0de8fe9da4aea207adca4
Author: Tom Lane
Date: Mon Oct 4 22:13:14 2004 +

    Correct the volatility labeling of ten timestamp-related functions, per
    discussion from Friday. initdb not forced in this commit but I intend to
    do that later.

I'm not sure why timestamptz_pl_interval and timestamptz_mi_interval are not immutable. Even if we change the timezone during a transaction, adding the same interval to the same timestamp with time zone gives the same result.

postgres=# begin;
BEGIN
postgres=*# select current_timestamp;
       current_timestamp
-------------------------------
 2021-08-16 13:26:59.366452+03
(1 row)

postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03';
          timestamptz
-------------------------------
 2021-08-16 13:26:59.366452+03
(1 row)

postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03' + '2 days'::interval;
           ?column?
-------------------------------
 2021-08-18 13:26:59.366452+03
(1 row)

postgres=*# set timezone to UTC;
SET
postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03' + '2 days'::interval;
           ?column?
-------------------------------
 2021-08-18 10:26:59.366452+00
(1 row)

postgres=*# select timestamptz '2021-08-18 13:26:59.366452+03' = timestamptz '2021-08-18 10:26:59.366452+00';
 ?column?
----------
 t
(1 row)

What do I miss?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
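One case where the result of adding an interval does depend on the TimeZone setting — and which may be what the mid-August example above misses, since no DST transition falls inside it (this is my reading, not stated in the thread) — is adding day or month components across a daylight-saving boundary: PostgreSQL adds those components in local wall-clock time. A rough Python analogue with zoneinfo, as an illustration of the behavior rather than the server code:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

# The same instant expressed in two timezones; Berlin switches to DST on
# 2021-03-28, the day inside the interval we are about to add.
utc = datetime(2021, 3, 27, 11, 0, tzinfo=ZoneInfo("UTC"))
berlin = utc.astimezone(ZoneInfo("Europe/Berlin"))  # 2021-03-27 12:00+01:00
assert utc == berlin  # identical point in time

# "One day later" is computed in wall-clock terms (Python's datetime
# arithmetic keeps the naive fields and recomputes the offset), so the
# resulting instants differ once a DST boundary is crossed.
later_utc = utc + timedelta(days=1)        # 2021-03-28 11:00:00+00:00
later_berlin = berlin + timedelta(days=1)  # 2021-03-28 12:00:00+02:00, i.e. 10:00 UTC
print(later_utc == later_berlin)  # False
```

Under this interpretation, timestamptz_pl_interval can be at most stable, not immutable, because its result for intervals with day/month components depends on the TimeZone GUC.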
Re: Case expression pushdown
Tom Lane wrote 2021-07-29 23:54:

Alexander Pyhalov writes: [ 0001-Allow-pushing-CASE-expression-to-foreign-server-v7.patch ]

I looked this over. It's better than before, but the collation handling is still not at all correct. We have to consider that a CASE's arg expression supplies the collation for a contained CaseTestExpr, otherwise we'll come to the wrong conclusions about whether "CASE foreignvar WHEN ..." is shippable, if the foreignvar is what's determining collation of the comparisons.

This means that the CaseExpr level of recursion has to pass data down to the CaseTestExpr level. In the attached, I did that by adding an additional argument to foreign_expr_walker(). That's a bit invasive, but it's not awful. I thought about instead adding fields to the foreign_loc_cxt struct. But that seemed considerably messier in the end, because we'd then have some fields that are information sourced at one recursion level and some that are info sourced at another level.

I also whacked the regression test cases around a lot. They seemed to spend a lot of time on irrelevant combinations, while failing to check the things that matter, namely whether collation-based pushdown decisions are made correctly.

regards, tom lane

Hi.

Overall looks good. The only thing I'm confused about is the T_CaseTestExpr case - how can it be that the CaseTestExpr collation doesn't match case_arg_cxt->collation? Do we need to inspect only case_arg_cxt->state? Can we assert that collation == case_arg_cxt->collation?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Case expression pushdown
Tom Lane писал 2021-07-26 18:18: Alexander Pyhalov writes: [ 0001-Allow-pushing-CASE-expression-to-foreign-server-v6.patch ] This doesn't compile cleanly: deparse.c: In function 'foreign_expr_walker.isra.4': deparse.c:920:8: warning: 'collation' may be used uninitialized in this function [-Wmaybe-uninitialized] if (collation != outer_cxt->collation) ^ deparse.c:914:3: warning: 'state' may be used uninitialized in this function [-Wmaybe-uninitialized] switch (state) ^~ These uninitialized variables very likely explain the fact that it fails regression tests, both for me and for the cfbot. Even if this weren't an outright bug, we don't tolerate code that produces warnings on common compilers. regards, tom lane Hi. Of course, this is a patch issue. Don't understand how I overlooked this. Rebased on master and fixed it. Tests are passing here (but they also passed for previous patch version). What exact tests are failing? -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 9c9fa2e37fc62ddcd8dc6176306d74b7e219fd26 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 22 Jul 2021 11:42:16 +0300 Subject: [PATCH] Allow pushing CASE expression to foreign server --- contrib/postgres_fdw/deparse.c| 150 ++ .../postgres_fdw/expected/postgres_fdw.out| 184 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 63 ++ 3 files changed, 397 insertions(+) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 31919fda8c6..df1aaf8e713 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -186,6 +186,8 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context); static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno, deparse_expr_cxt *context); +static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); + /* * Helper functions */ @@ -627,6 +629,105 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_NONE; } break; + case T_CaseExpr: + { +ListCell *lc; +foreign_loc_cxt 
tmp_cxt; +CaseExpr *ce = (CaseExpr *) node; + +/* + * case arg expression collation doesn't affect result + * collation + */ +tmp_cxt.collation = InvalidOid; +tmp_cxt.state = FDW_COLLATE_NONE; +if (ce->arg && !foreign_expr_walker((Node *) ce->arg, glob_cxt, _cxt)) + return false; + +/* Recurse to case clause subexpressions. */ +foreach(lc, ce->args) +{ + CaseWhen *cw = lfirst_node(CaseWhen, lc); + Node *whenExpr = (Node *) cw->expr; + + /* + * The parser should have produced WHEN clauses of the + * form "CaseTestExpr = RHS", possibly with an implicit + * coercion inserted above the CaseTestExpr. However in an + * expression that's been through the optimizer, the WHEN + * clause could be almost anything (since the equality + * operator could have been expanded into an inline + * function). In this case forbid pushdown. + */ + + if (ce->arg) + { + List *whenExprArgs; + + if (!IsA(whenExpr, OpExpr)) + return false; + + whenExprArgs = ((OpExpr *) whenExpr)->args; + + if ((list_length(whenExprArgs) != 2) || + !IsA(strip_implicit_coercions(linitial(whenExprArgs)), CaseTestExpr)) + return false; + } + + /* + * case when expression collation doesn't affect result + * collation + */ + tmp_cxt.collation = InvalidOid; + tmp_cxt.state = FDW_COLLATE_NONE; + /* Recurse to case clause expression. */ + if (!foreign_expr_walker((Node *) cw->expr, + glob_cxt, _cxt)) + return false; + + /* Recurse to result expression. */ + if (!foreign_expr_walker((Node *) cw->result, + glob_cxt, _cxt)) + return false; +} + +if (!foreign_expr_walker((Node *) ce->defresult, glob_cxt, _cxt)) + return false; + +/* + * Collation rule is same as for function nodes. 
+ */ +collation = ce->casecollid; +if (collation == InvalidOid) + state = FDW_COLLATE_NONE; +else if (inner_cxt.state == FDW_COLLATE_SAFE && + collation == inner_cxt.collation) + state = FDW_COLLATE_SAFE; +else if (collation == DEFAULT_COLLATION_OID) + state = FDW_COLLATE_NONE; +else + state = FDW_COLLATE_UNSAFE; + } + break; + case T_CaseTestExpr: + { +CaseTestExpr *c = (CaseTestExpr *) node; + +/* + * Collation rule is same as for function nodes. + */ +collation = c->collation; +if (collation == InvalidOid) + state = FDW_COLLATE_NONE; +else if (inner_cxt.state == FDW_COLLATE_SAFE && + collation == inner_cxt.collation) + state = FDW_COLLATE_SAFE; +else if (collation == DEFAULT_COLLATION_OID) + state = FDW_COLLATE_NONE; +else
Re: Case expression pushdown
Tom Lane wrote 2021-07-21 19:49:

Gilles Darold writes: I'm attaching the v5 patch again as it doesn't appear in the Latest attachment list in the commitfest.

So this has a few issues:

Hi.

1. In foreign_expr_walker, you're failing to recurse to either the "arg" or "defresult" subtrees of a CaseExpr, so that it would fail to notice unshippable constructs within those.

Fixed this.

2. You're also failing to guard against the hazard that a WHEN expression within a CASE-with-arg has been expanded into something that doesn't look like "CaseTestExpr = something". As written, this patch would likely dump core in that situation, and if it didn't it would send nonsense to the remote server. Take a look at the check for that situation in ruleutils.c (starting at line 8764 as of HEAD) and adapt it to this. Probably what you want is to just deem the CASE un-pushable if it's been modified away from that structure. This is enough of a corner case that optimizing it isn't worth a great deal of trouble ... but crashing is not ok.

I think I fixed this by copying the check from ruleutils.c.

3. A potentially uncomfortable issue for the CASE-with-arg syntax is that the specific equality operator being used appears nowhere in the decompiled expression, thus raising the question of whether the remote server will interpret it the same way we did. Given that we restrict the values-to-be-compared to be of shippable types, maybe this is safe in practice, but I have a bad feeling about it. I wonder if we'd be better off just refusing to ship CASE-with-arg at all, which would a-fortiori avoid point 2.

I'm not sure how 'case a when b ...' is different from 'case when a=b ...' in this case. If the type of a or b is not shippable, we will not push down this expression in any way. And if they are of builtin types, why do these expressions differ?

4. I'm not sure that I believe any part of the collation handling. There is the question of what collations will be used for the individual WHEN comparisons, which can probably be left for the recursive checks of the CaseWhen.expr subtrees to handle; and then there is the separate issue of whether the CASE's result collation (which arises from the CaseWhen.result exprs plus the CaseExpr.defresult expr) can be deemed to be safely derived from remote Vars. I haven't totally thought through how that should work, but I'm pretty certain that handling the CaseWhen's within separate recursive invocations of foreign_expr_walker cannot possibly get it right. However, you'll likely have to flatten those anyway (i.e., handle them within the loop in the CaseExpr case) while fixing point 2.

I've tried to account for the fact that we are interested only in caseWhen->result collations, but am still not sure that I'm right here.

5. This is a cosmetic point, but: the locations of the various additions in deparse.c seem to have been chosen with the aid of a dartboard. We do have a convention for this sort of thing, which is to lay out code concerned with different node types in the same order that the node types are declared in *nodes.h. I'm not sufficiently anal to want to fix the existing violations of that rule that I see in deparse.c; but the fact that somebody got this wrong before isn't license to make things worse.

regards, tom lane

Fixed this. Thanks for review.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom f323ce6e6e12004ee448bbd6721c396826bd9eeb Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 22 Jul 2021 11:42:16 +0300 Subject: [PATCH] Allow pushing CASE expression to foreign server --- contrib/postgres_fdw/deparse.c| 136 + .../postgres_fdw/expected/postgres_fdw.out| 184 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 63 ++ 3 files changed, 383 insertions(+) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 31919fda8c6..f0cbd958890 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -186,6 +186,8 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context); static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno, deparse_expr_cxt *context); +static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); + /* * Helper functions */ @@ -627,6 +629,91 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_NONE; } break; + case T_CaseExpr: + { +ListCell *lc; +foreign_loc_cxt tmp_cxt; +CaseExpr *ce = (CaseExpr *) node; + +/* + * case arg expression collation doesn't affect result + * collation + */ +tmp_cxt.collation = InvalidOid; +tmp_cxt.state = FDW_COLLATE_NONE; +if (ce->arg && !foreign_expr_walker((Node *) ce->arg, glob_cxt, _cxt)) + return false; + +/* Recurse to case clause subexpressions. */ +foreach(lc, ce-&
Re: Case expression pushdown
Hi.

Gilles Darold wrote 2021-07-07 15:02:

On 22/06/2021 at 15:39, Alexander Pyhalov wrote:

Seino Yuki wrote 2021-06-22 16:03:

On 2021-06-16 01:29, Alexander Pyhalov wrote:

Hi. Ashutosh Bapat wrote 2021-06-15 16:24: Looks quite useful to me. Can you please add this to the next commitfest?

Added to commitfest. Here is an updated patch version.

Thanks for posting the patch. I agree with this content.

+ Foreign Scan on public.ft2 (cost=156.58..165.45 rows=394 width=14)

It's not a big issue, but is there any intention behind the pattern of outputting costs in regression tests?

Hi. No, I don't think it makes much sense. Updated tests (also added a case with an empty else).

The patch doesn't apply to master anymore, so I attach an update of your patch. This is your patch rebased and untouched, minus a comment in the test, and renamed to v4. I could have missed something, but I don't think the additional struct elements case_args in the structs foreign_loc_cxt and deparse_expr_cxt are necessary. They look to be useless.

I thought we should compare the arg collation and the expression collation, and didn't suggest that we could take CaseTestExpr's collation directly, without deriving it from CaseExpr's arg. Your version of course looks saner.

The patch would also be clearer if the CaseWhen node were handled separately in foreign_expr_walker() instead of being handled in the T_CaseExpr case. This way the T_CaseExpr case just needs to call foreign_expr_walker() recursively. I also think that the code in T_CaseTestExpr should just check the collation; there is nothing more to do here, as you have commented in the function deparseCaseTestExpr(). This function can be removed, as it does nothing if the case_args elements are removed.
There is a problem in the regression test with nested CASE clauses:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT c1,c2,c3 FROM ft2
WHERE CASE CASE WHEN c2 > 0 THEN c2 END WHEN 100 THEN 601 WHEN c2 THEN c2 ELSE 0 END > 600
ORDER BY c1;

The original query uses "WHERE CASE CASE WHEN", but the remote query in the plan is not the same:

Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1" WHERE (((CASE WHEN ((CASE WHEN (c2 > 0) THEN c2 ELSE NULL::integer END) = 100) THEN 601 WHEN ((CASE WHEN (c2 > 0) THEN c2 ELSE NULL::integer END) = c2) THEN c2 ELSE 0 END) > 600)) ORDER BY "C 1" ASC NULLS LAST

Here this is "WHERE (((CASE WHEN ((CASE WHEN"; I expected it to stay unchanged as "WHERE (((CASE (CASE WHEN".

I'm not sure this is an issue (as we change CASE a WHEN b ... to CASE WHEN (a=b)), and the expressions should be free of side effects, but again your version looks better. Thanks for improving the patch; it looks saner now.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
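The rewrite discussed above — deparsing CASE-with-arg as a searched CASE, i.e. CASE a WHEN b THEN r becomes CASE WHEN (a = b) THEN r — can be sketched in a few lines. This is a toy model: the class and field names below are illustrative and do not match PostgreSQL's real CaseExpr/CaseWhen node layout.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class CaseWhen:
    test: Any    # comparison value (CASE-with-arg) or boolean condition
    result: Any

@dataclass
class CaseExpr:
    arg: Optional[Any]       # None for the searched-CASE form
    whens: List[CaseWhen]
    defresult: Any

def deparse(ce: CaseExpr) -> str:
    """Deparse CASE-with-arg as a searched CASE:
    CASE a WHEN b THEN r -> CASE WHEN (a = b) THEN r."""
    parts = ["CASE"]
    for w in ce.whens:
        # With an arg, each WHEN is an implicit equality comparison;
        # without one, the WHEN expression is already a boolean condition.
        cond = f"{ce.arg} = {w.test}" if ce.arg is not None else w.test
        parts.append(f"WHEN ({cond}) THEN {w.result}")
    parts.append(f"ELSE {ce.defresult} END")
    return " ".join(parts)

print(deparse(CaseExpr("c2", [CaseWhen(100, 601)], 0)))
# CASE WHEN (c2 = 100) THEN 601 ELSE 0 END
```

As the thread notes, the two spellings are semantically equivalent as long as the arg expression has no side effects, although the searched form repeats the arg expression once per WHEN clause.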
PostgreSQL 14 backend crash on incorrect trigger
(plan=0x563f05fc8100, sourceText=0x563f05ad74a0 "DELETE FROM test WHERE i=1;", params=0x0, queryEnv=0x0, dest=0x563f05fc81f0, qc=0x7ffcab5c6cf0) at pquery.c:190 #29 0x563f037478fd in PortalRunMulti (portal=0x563f05b3aa50, isTopLevel=true, setHoldSnapshot=false, dest=0x563f05fc81f0, altdest=0x563f05fc81f0, qc=0x7ffcab5c6cf0) at pquery.c:1266 #30 0x563f03746e24 in PortalRun (portal=0x563f05b3aa50, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x563f05fc81f0, altdest=0x563f05fc81f0, qc=0x7ffcab5c6cf0) at pquery.c:786 #31 0x563f03740084 in exec_simple_query (query_string=0x563f05ad74a0 "DELETE FROM test WHERE i=1;") at postgres.c:1214 #32 0x563f03744c41 in PostgresMain (argc=1, argv=0x7ffcab5c6f10, dbname=0x563f05b02938 "contrib_regression", username=0x563f05b02918 "leoric") at postgres.c:4486 #33 0x563f03670f3a in BackendRun (port=0x563f05af8f00) at postmaster.c:4507 #34 0x563f036707f3 in BackendStartup (port=0x563f05af8f00) at postmaster.c:4229 #35 0x563f0366c97c in ServerLoop () at postmaster.c:1745 #36 0x563f0366c115 in PostmasterMain (argc=8, argv=0x563f05ad1820) at postmaster.c:1417 #37 0x563f0356193d in main (argc=8, argv=0x563f05ad1820) at main.c:209 (gdb) print *subplan $2 = {type = T_Result, startup_cost = 0, total_cost = 0, plan_rows = 0, plan_width = 0, parallel_aware = false, parallel_safe = false, async_capable = false, plan_node_id = 0, targetlist = 0x563f06020d40, qual = 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, extParam = 0x0, allParam = 0x0} -- Best regards, Alexander Pyhalov, Postgres Professionaldiff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 911f171d812..7d5051406a2 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -3355,3 +3355,31 @@ CREATE FOREIGN TABLE inv_fsz (c1 int ) -- Invalid batch_size option CREATE FOREIGN TABLE inv_bsz (c1 int ) SERVER loopback OPTIONS (batch_size '100$%$#$#'); + +CREATE TABLE 
test (i int, j int); +CREATE TABLE test_base (i int, j int); +INSERT INTO test VALUES (1,2),(3,4); +INSERT INTO test_base VALUES (1,2),(3,4); +CREATE FOREIGN TABLE test_remote (i int, j int) + SERVER loopback OPTIONS (table_name 'test_base'); +CREATE FUNCTION update_test() +RETURNS trigger + LANGUAGE plpgsql +AS $function$ +DECLARE + rec record; + tname text; +BEGIN + IF (TG_OP = 'DELETE') THEN + -- reference to new instead of old +DELETE FROM test_remote WHERE row(i,j)=row(new.i,new.j); + ELSIF (TG_OP = 'INSERT') THEN +INSERT INTO test_remote VALUES(new.i, new.j); + ELSIF (TG_OP = 'UPDATE') THEN +UPDATE test_remote SET i=new.i, j=new.i WHERE i=old.i AND j=old.j; + END IF; + RETURN NULL; + END; +$function$; +CREATE TRIGGER tg_upd AFTER INSERT OR DELETE OR UPDATE ON public.test FOR EACH ROW EXECUTE FUNCTION update_test(); +DELETE FROM test WHERE i=1;
Re: Asymmetric partition-wise JOIN
Andrey Lepikhov wrote 2021-07-06 12:28:

On 5/7/21 23:15, Zhihong Yu wrote: On Mon, Jul 5, 2021 at 2:57 AM Andrey Lepikhov wrote:

+ * Can't imagine situation when join relation already exists. But in
+ * the 'partition_join' regression test it happens.
+ * It may be an indicator of possible problems.

Should a log be added in the above case?

I made additional analysis of this branch of code. This situation can happen in the case of one child, or if we join two plain tables with a partitioned one. Both situations are legal, and I think we don't need to add any log message here. Other mistakes were fixed.

Hi. Small typo in a comment in src/backend/optimizer/plan/setrefs.c:

281
282  /*
283   * Adjust RT indexes of AppendRelInfos and add to final appendrels list.
284   * The AppendRelInfos are copied, because as a part of a subplan its could
285   * be visited many times in the case of asymmetric join.
286   */
287  foreach(lc, root->append_rel_list)
288  {

its -> it (or they)?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Partitioned index can be not dumped
Álvaro Herrera wrote 2021-06-30 21:54:

On 2021-Jun-30, Alexander Pyhalov wrote: I've seen the following effect on the PostgreSQL 14 stable branch. An index created on a partitioned table disappears from pg_dump or psql \d output. This seems to begin after ANALYZE. The partitioned relation's relhasindex pg_class field suddenly becomes false.

Yeah, that seems correct. I didn't verify your test case, but after looking at the code I thought there was a bit too much churn and the new conditions looked quite messy and unexplained. It seems simpler to be explicit at the start about what we're doing, and keep nindexes=0 for partitioned tables; with that, the code works unchanged because the "for" loops do nothing without having to check for anything. My proposal is attached. I did run the tests and they do pass, but I didn't look very closely at what the tests are actually doing.

I noticed that part of that comment seems to be a leftover from ... I don't know when: "We do not analyze index columns if there was an explicit column list in the ANALYZE command, however." I suppose this is about some code that was removed, but I didn't dig into it.

Looks good. It seems this comment refers to line 455:

445     if (nindexes > 0)
446     {
447         indexdata = (AnlIndexData *) palloc0(nindexes * sizeof(AnlIndexData));
448         for (ind = 0; ind < nindexes; ind++)
449         {
450             AnlIndexData *thisdata = &indexdata[ind];
451             IndexInfo  *indexInfo;
452
453             thisdata->indexInfo = indexInfo = BuildIndexInfo(Irel[ind]);
454             thisdata->tupleFract = 1.0; /* fix later if partial */
455             if (indexInfo->ii_Expressions != NIL && va_cols == NIL)
456             {
457                 ListCell   *indexpr_item = list_head(indexInfo->ii_Expressions);
458
459                 thisdata->vacattrstats = (VacAttrStats **)
460                     palloc(indexInfo->ii_NumIndexAttrs * sizeof(VacAttrStats *));

Also, I had added an unnecessary new line in the test. Restored the comment and removed the new line.
-- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 0677fca372349ab2a1f5bc2e69a91c72ae7dfaa3 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Wed, 30 Jun 2021 22:24:56 +0300 Subject: [PATCH] Set relhasindex for partitioned tables correctly. The issue appeared after 0e69f705cc1a3df273b38c9883fb5765991e04fe: in this commit we unconditionally set nindexes to 0 for partitioned relations. --- src/backend/commands/analyze.c | 24 +--- src/test/regress/expected/vacuum.out | 17 + src/test/regress/sql/vacuum.sql | 11 +++ 3 files changed, 45 insertions(+), 7 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 426c1e67109..c21cb7da3cf 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -422,18 +422,28 @@ do_analyze_rel(Relation onerel, VacuumParams *params, * columns in the indexes. We do not analyze index columns if there was * an explicit column list in the ANALYZE command, however. If we are * doing a recursive scan, we don't want to touch the parent's indexes at - * all. + * all. If we're processing a partitioned table, we need to know if there + * are any indexes, but we don't want to process them. 
*/ - if (!inh) + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { vac_open_indexes(onerel, AccessShareLock, , ); + hasindex = nindexes > 0; + if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { + vac_close_indexes(nindexes, Irel, NoLock); + nindexes = 0; + Irel = NULL; + } + } else { Irel = NULL; nindexes = 0; + hasindex = false; } - hasindex = (nindexes > 0); indexdata = NULL; - if (hasindex) + if (nindexes > 0) { indexdata = (AnlIndexData *) palloc0(nindexes * sizeof(AnlIndexData)); for (ind = 0; ind < nindexes; ind++) @@ -572,7 +582,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, MemoryContextResetAndDeleteChildren(col_context); } - if (hasindex) + if (nindexes > 0) compute_index_stats(onerel, totalrows, indexdata, nindexes, rows, numrows, @@ -660,10 +670,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params, /* * Partitioned tables don't have storage, so we don't set any fields * in their pg_class entries except for reltuples, which is necessary - * for auto-analyze to work properly. + * for auto-analyze to work properly, and relhasindex. */ vac_update_relstats(onerel, -1, totalrows, - 0, false, InvalidTransactionId, + 0, hasindex, InvalidTransactionId, InvalidMultiXactId, in_outer_xact); } diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out index e5771462d57..e3d462b66fa 100644 --- a/src/test/regress/expected/vacuum.out
Re: Partitioned index can be not dumped
Alexander Pyhalov писал 2021-06-30 17:26: Hi. Sorry, test had an issue. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 2aabf5e8e86d222e6a73b25ccc652fe645e12fc4 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Wed, 30 Jun 2021 17:22:37 +0300 Subject: [PATCH] Set relhasindex for partitioned tables correctly. The issue appeared after 0e69f705cc1a3df273b38c9883fb5765991e04fe: in this commit we unconditionally set nindexes to 0 for partitioned relations. --- src/backend/commands/analyze.c | 53 src/test/regress/expected/vacuum.out | 17 + src/test/regress/sql/vacuum.sql | 12 +++ 3 files changed, 59 insertions(+), 23 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 426c1e67109..9c8913a1619 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -424,8 +424,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params, * doing a recursive scan, we don't want to touch the parent's indexes at * all. */ - if (!inh) + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { vac_open_indexes(onerel, AccessShareLock, , ); + } else { Irel = NULL; @@ -433,7 +435,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, } hasindex = (nindexes > 0); indexdata = NULL; - if (hasindex) + if (hasindex && !inh) { indexdata = (AnlIndexData *) palloc0(nindexes * sizeof(AnlIndexData)); for (ind = 0; ind < nindexes; ind++) @@ -487,14 +489,17 @@ do_analyze_rel(Relation onerel, VacuumParams *params, if (targrows < vacattrstats[i]->minrows) targrows = vacattrstats[i]->minrows; } - for (ind = 0; ind < nindexes; ind++) + if (!inh) { - AnlIndexData *thisdata = [ind]; - - for (i = 0; i < thisdata->attr_cnt; i++) + for (ind = 0; ind < nindexes; ind++) { - if (targrows < thisdata->vacattrstats[i]->minrows) -targrows = thisdata->vacattrstats[i]->minrows; + AnlIndexData *thisdata = [ind]; + + for (i = 0; i < thisdata->attr_cnt; i++) + { +if (targrows < thisdata->vacattrstats[i]->minrows) + targrows 
= thisdata->vacattrstats[i]->minrows; + } } } @@ -572,7 +577,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, MemoryContextResetAndDeleteChildren(col_context); } - if (hasindex) + if (hasindex & !inh) compute_index_stats(onerel, totalrows, indexdata, nindexes, rows, numrows, @@ -589,23 +594,25 @@ do_analyze_rel(Relation onerel, VacuumParams *params, update_attstats(RelationGetRelid(onerel), inh, attr_cnt, vacattrstats); - for (ind = 0; ind < nindexes; ind++) + if (!inh) { - AnlIndexData *thisdata = [ind]; + for (ind = 0; ind < nindexes; ind++) + { +AnlIndexData *thisdata = [ind]; - update_attstats(RelationGetRelid(Irel[ind]), false, - thisdata->attr_cnt, thisdata->vacattrstats); - } +update_attstats(RelationGetRelid(Irel[ind]), false, +thisdata->attr_cnt, thisdata->vacattrstats); + } - /* - * Build extended statistics (if there are any). - * - * For now we only build extended statistics on individual relations, - * not for relations representing inheritance trees. - */ - if (!inh) + /* + * Build extended statistics (if there are any). + * + * For now we only build extended statistics on individual + * relations, not for relations representing inheritance trees. + */ BuildRelationExtStatistics(onerel, totalrows, numrows, rows, attr_cnt, vacattrstats); + } } pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, @@ -663,7 +670,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, * for auto-analyze to work properly. */ vac_update_relstats(onerel, -1, totalrows, - 0, false, InvalidTransactionId, + 0, hasindex, InvalidTransactionId, InvalidMultiXactId, in_outer_xact); } @@ -704,7 +711,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, * amvacuumcleanup() when called in ANALYZE-only mode. The only exception * among core index AMs is GIN/ginvacuumcleanup(). 
*/ - if (!(params->options & VACOPT_VACUUM)) + if (!(params->options & VACOPT_VACUUM) && !inh) { for (ind = 0; ind < nindexes; ind++) { diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out index e5771462d57..e3d462b66fa 100644 --- a/src/test/regress/expected/vacuum.out +++ b/src/test/regress/expected/vacuum.out @@ -199,6 +199,23 @@ VACUUM ANALYZE vacparted(a,b,a); ERROR: column "a" of relation "vacparted" appears more than once ANALYZE vacparted(a,b,b); ERROR: column "b" of relation "vacparted" appears more than once +-- partitioned table with index +CREATE TABLE vacparted_i (a int primary key, b varchar(100)) PARTITION BY HAS
Partitioned index can be not dumped
Hi.

I've seen the following effect on the PostgreSQL 14 stable branch. An index created on a partitioned table disappears from pg_dump or psql \d output. This seems to begin after ANALYZE: the partitioned relation's relhasindex pg_class field suddenly becomes false.

The issue happens after

commit 0e69f705cc1a3df273b38c9883fb5765991e04fe (HEAD, refs/bisect/bad)
Author: Alvaro Herrera
Date: Fri Apr 9 11:29:08 2021 -0400

    Set pg_class.reltuples for partitioned tables

    When commit 0827e8af70f4 added auto-analyze support for partitioned
    tables, it included code to obtain reltuples for the partitioned table
    as a number of catalog accesses to read pg_class.reltuples for each
    partition. That's not only very inefficient, but also problematic
    because autovacuum doesn't hold any locks on any of those tables -- and
    doesn't want to. Replace that code with a read of pg_class.reltuples
    for the partitioned table, and make sure ANALYZE and TRUNCATE properly
    maintain that value.

    I found no code that would be affected by the change of relpages from
    zero to non-zero for partitioned tables, and no other code that should
    be maintaining it, but if there is, hopefully it'll be an easy fix.
    Per buildfarm.

    Author: Álvaro Herrera
    Reviewed-by: Zhihong Yu

It seems that in this commit we unconditionally overwrite this data with 0. I've tried to fix it by getting this information when inh is true and ignoring nindexes when inh is not true.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 0e66025a32c6e848b2b77355631b06b4b8d4dd08 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Wed, 30 Jun 2021 17:22:37 +0300
Subject: [PATCH] Set relhasindex for partitioned tables correctly.

The issue appeared after 0e69f705cc1a3df273b38c9883fb5765991e04fe: in this
commit we unconditionally set nindexes to 0 for partitioned relations.
---
 src/backend/commands/analyze.c       | 53 ++++++++++++++-----------
 src/test/regress/expected/vacuum.out | 18 ++
 src/test/regress/sql/vacuum.sql      | 12 +++
 3 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 426c1e67109..9c8913a1619 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -424,8 +424,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	 * doing a recursive scan, we don't want to touch the parent's indexes at
 	 * all.
 	 */
-	if (!inh)
+	if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
 		vac_open_indexes(onerel, AccessShareLock, &nindexes, &Irel);
+	}
 	else
 	{
 		Irel = NULL;
@@ -433,7 +435,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	}
 	hasindex = (nindexes > 0);
 	indexdata = NULL;
-	if (hasindex)
+	if (hasindex && !inh)
 	{
 		indexdata = (AnlIndexData *) palloc0(nindexes * sizeof(AnlIndexData));
 		for (ind = 0; ind < nindexes; ind++)
@@ -487,14 +489,17 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		if (targrows < vacattrstats[i]->minrows)
 			targrows = vacattrstats[i]->minrows;
 	}
-	for (ind = 0; ind < nindexes; ind++)
+	if (!inh)
 	{
-		AnlIndexData *thisdata = &indexdata[ind];
-
-		for (i = 0; i < thisdata->attr_cnt; i++)
+		for (ind = 0; ind < nindexes; ind++)
 		{
-			if (targrows < thisdata->vacattrstats[i]->minrows)
-				targrows = thisdata->vacattrstats[i]->minrows;
+			AnlIndexData *thisdata = &indexdata[ind];
+
+			for (i = 0; i < thisdata->attr_cnt; i++)
+			{
+				if (targrows < thisdata->vacattrstats[i]->minrows)
+					targrows = thisdata->vacattrstats[i]->minrows;
+			}
 		}
 	}
@@ -572,7 +577,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		MemoryContextResetAndDeleteChildren(col_context);
 	}
-	if (hasindex)
+	if (hasindex && !inh)
 		compute_index_stats(onerel, totalrows,
 							indexdata, nindexes,
 							rows, numrows,
@@ -589,23 +594,25 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		update_attstats(RelationGetRelid(onerel), inh,
 						attr_cnt, vacattrstats);
-		for (ind = 0; ind < nindexes; ind++)
+		if (!inh)
 		{
-			AnlIndexData *thisdata = &indexdata[ind];
+			for (ind = 0; ind < nindexes; ind++)
+			{
+				AnlIndexData *thisdata = &indexdata[ind];
 
-			update_attstats(RelationGetRelid(Irel[ind]), false,
-							thisdata->attr_cnt, thisdata->vacattrstats);
-		}
+				update_attstats(RelationGetRelid(Irel[ind]), false,
+								thisdata->attr_cnt, thisdata->vacattrstats);
+			}
 
-	/*
-	 * Build extended statistics (if there are any).
-	 *
-	 * For now we only build extended statistics on individual relations,
-	 * not for relations representing inheritance trees.
-	 */
-	if (!inh)
+			/*
+			 * Build extended statistics (if there are any).
+			 *
+			 * For now we only build extended statistics on individual
+			 * relations, not for relations representing inheritance trees.
+			 */
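The gating this patch introduces can be summarized outside of PostgreSQL with a minimal Python model (function and relkind names here are illustrative, not the actual analyze.c API): indexes are opened for the plain-relation pass and for partitioned tables, but per-index statistics work is only done when not scanning an inheritance tree.

```python
# Illustrative model of the do_analyze_rel() gating in the patch above.
# "inh" means ANALYZE is doing a recursive inheritance/partitioned-tree
# scan; index statistics are only computed on the plain-relation pass.

def plan_analyze(relkind, inh, has_indexes):
    # vac_open_indexes() is now called for partitioned tables too.
    open_indexes = (not inh) or relkind == "partitioned_table"
    # compute_index_stats()/update_attstats() on indexes are skipped
    # for the recursive inheritance pass (hasindex && !inh).
    compute_index_stats = open_indexes and has_indexes and not inh
    return {
        "open_indexes": open_indexes,
        "compute_index_stats": compute_index_stats,
    }

# Plain table with indexes: index stats are computed.
print(plan_analyze("plain_table", inh=False, has_indexes=True))
# Partitioned table: indexes are opened, but no index stats are built.
print(plan_analyze("partitioned_table", inh=True, has_indexes=True))
```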
Re: Case expression pushdown
Seino Yuki wrote 2021-06-22 16:03:
On 2021-06-16 01:29, Alexander Pyhalov wrote:
Hi.
Ashutosh Bapat wrote 2021-06-15 16:24:
Looks quite useful to me. Can you please add this to the next commitfest?
Added to commitfest. Here is an updated patch version.

Thanks for posting the patch. I agree with its content.

+ Foreign Scan on public.ft2 (cost=156.58..165.45 rows=394 width=14)

It's not a big issue, but is there any intention behind the pattern of outputting costs in regression tests?

Hi.
No, I don't think it makes much sense. Updated the tests (also added a case with an empty ELSE).
--
Best regards,
Alexander Pyhalov,
Postgres Professional

From f6b57a3b84d1be0325321d4a0971f7b05b13a80a Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Tue, 30 Mar 2021 13:24:14 +0300
Subject: [PATCH] Allow pushing CASE expression to foreign server

---
 contrib/postgres_fdw/deparse.c             | 118 ++++++++++++
 .../postgres_fdw/expected/postgres_fdw.out |  64 ++++++
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  28 +++
 3 files changed, 210 insertions(+)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c6..3621fed4b54 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -87,6 +87,7 @@ typedef struct foreign_loc_cxt
 {
 	Oid			collation;		/* OID of current collation, if any */
 	FDWCollateState state;		/* state of current collation choice */
+	Expr	   *case_arg;		/* the last case arg to inspect */
 } foreign_loc_cxt;

 /*
@@ -101,6 +102,7 @@ typedef struct deparse_expr_cxt
 	 * a base relation.
 	 */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	List	   *case_args;		/* list of args to deparse CaseTestExpr */
 } deparse_expr_cxt;

 #define REL_ALIAS_PREFIX	"r"
@@ -186,6 +188,9 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);

+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+static void deparseCaseTestExpr(CaseTestExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -254,6 +259,7 @@ is_foreign_expr(PlannerInfo *root,
 	glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
+	loc_cxt.case_arg = NULL;
 	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
 		return false;
@@ -312,6 +318,7 @@ foreign_expr_walker(Node *node,
 	/* Set up inner_cxt for possible recursion to child nodes */
 	inner_cxt.collation = InvalidOid;
 	inner_cxt.state = FDW_COLLATE_NONE;
+	inner_cxt.case_arg = outer_cxt->case_arg;
 	switch (nodeTag(node))
 	{
@@ -509,6 +516,62 @@ foreign_expr_walker(Node *node,
 					state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *ce = (CaseExpr *) node;
+				ListCell   *arg;
+
+				if (ce->arg)
+					inner_cxt.case_arg = ce->arg;
+
+				foreach(arg, ce->args)
+				{
+					CaseWhen   *w = lfirst_node(CaseWhen, arg);
+
+					if (!foreign_expr_walker((Node *) w->expr,
+											 glob_cxt, &inner_cxt))
+						return false;
+
+					if (!foreign_expr_walker((Node *) w->result,
+											 glob_cxt, &inner_cxt))
+						return false;
+				}
+
+				if (!foreign_expr_walker((Node *) ce->defresult,
+										 glob_cxt, &inner_cxt))
+					return false;
+
+				collation = ce->casecollid;
+				if (collation == InvalidOid)
+					state = FDW_COLLATE_NONE;
+				else if (inner_cxt.state == FDW_COLLATE_SAFE &&
+						 collation == inner_cxt.collation)
+					state = FDW_COLLATE_SAFE;
+				else if (collation == DEFAULT_COLLATION_OID)
+					state = FDW_COLLATE_NONE;
+				else
+					state = FDW_COLLATE_UNSAFE;
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				Expr	   *arg;
+
+				Assert(outer_cxt->case_arg != NULL);
+				arg = outer_cxt->case_arg;
+
+				if (!foreign_expr_walker((Node *) arg,
+										 glob_cxt, &inner_cxt))
+					return false;
+
+				/*
+				 * Collation and state just bubble up from the previously
+				 * saved case argument
+				 */
+				collation = inner_cxt.collation;
+				state = inner_cxt.state;
+			}
+			break;
 		case T_OpExpr:
 		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
 			{
@@ -1019,6 +1082,7 @@ deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
 	context.foreignrel = rel;
 	context.scanrel = IS_UPPER_REL(rel) ? fpinfo->outerrel : rel;
 	context.params_list = params_list;
+	context.case_args = NIL;

 	/* Construct SELECT clause */
 	deparseSelectSql(tlist, is_subquery, retrieved_attrs, &context);
@@ -1598,6 +1662,7 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
 		context.scanrel = foreignrel;
 		context.root = root;
 		context.params_list = params_list;
+		context.case_args = NIL;

 		appendStringInfoChar(buf, '(');
 		appen
Re: Asymmetric partition-wise JOIN
n, symmetric or asymmetric
1701  * partition-wise join. It is not correct right now, however, a hook
1702  * on add_path() to give additional decision for path removal allows
1703  * to retain this kind of AppendPath, regardless of its cost.
1704  */
1705  if (IsA(append_path, AppendPath))

What hook do you refer to?

src/backend/optimizer/plan/setrefs.c:

 282     /*
 283      * Adjust RT indexes of AppendRelInfos and add to final appendrels list.
 284      * We assume the AppendRelInfos were built during planning and don't need
 285      * to be copied.
 286      */
 287     foreach(lc, root->append_rel_list)
 288     {
 289         AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
 290         AppendRelInfo *newappinfo;
 291
 292         /* flat copy is enough since all valuable fields are scalars */
 293         newappinfo = (AppendRelInfo *) palloc(sizeof(AppendRelInfo));
 294         memcpy(newappinfo, appinfo, sizeof(AppendRelInfo));

You've changed the function to copy appinfo, so the comment above is now incorrect.

src/backend/optimizer/util/appendinfo.c:

 588     /* Construct relids set for the immediate parent of the given child. */
 589     normal_relids = bms_copy(child_relids);
 590     for (cnt = 0; cnt < nappinfos; cnt++)
 591     {
 592         AppendRelInfo *appinfo = appinfos[cnt];
 593
 594         parent_relids = bms_add_member(parent_relids, appinfo->parent_relid);
 595         normal_relids = bms_del_member(normal_relids, appinfo->child_relid);
 596     }
 597     parent_relids = bms_union(parent_relids, normal_relids);

Do I understand correctly that now parent_relids also contains relids of relations from the 'global' inner relation, which we join to the children?
--
Best regards,
Alexander Pyhalov,
Postgres Professional
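To make the question concrete, here is a small standalone Python model (plain sets standing in for Bitmapsets; names are illustrative, not the PostgreSQL API) of what the quoted appendinfo.c fragment computes: child relids with an AppendRelInfo are translated to their parents, while relids without one — such as a 'global' inner relation joined to every child — pass through unchanged into parent_relids.

```python
# Model of the quoted loop: translate child relids to parent relids,
# letting relids that have no AppendRelInfo pass through untouched.

def adjust_child_relids(child_relids, appinfos):
    """appinfos is a list of (parent_relid, child_relid) pairs."""
    parent_relids = set()
    normal_relids = set(child_relids)          # bms_copy()
    for parent_relid, child_relid in appinfos:
        parent_relids.add(parent_relid)        # bms_add_member()
        normal_relids.discard(child_relid)     # bms_del_member()
    return parent_relids | normal_relids       # bms_union()

# Child join {child=10, inner=3} with one appinfo (parent=1, child=10):
# relid 3 has no appinfo, so it ends up in the result unchanged.
print(adjust_child_relids({10, 3}, [(1, 10)]))  # {1, 3}
```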
Re: postgres_fdw batching vs. (re)creating the tuple slots
Tomas Vondra wrote 2021-06-12 00:01:
On 6/9/21 1:08 PM, Tomas Vondra wrote:
On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra wrote:
Hi, Here's a v2 fixing a silly bug with reusing the same variable in two nested loops (worked for simple postgres_fdw cases, but "make check" failed).

I applied these patches and ran make check in the postgres_fdw contrib module, and I saw a server crash. Is it the same failure you were seeing above?

Nope, that was causing an infinite loop. This is just a silly mistake on my side - I forgot to replace the i/j variable inside the loop. Here's v3.
regards

FWIW I've pushed this, after improving the comments a little bit.
regards

Hi.
It seems this commit

commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
Author: Tomas Vondra
Date:   Fri Jun 11 20:19:48 2021 +0200

    Optimize creation of slots for FDW bulk inserts

has broken batch insert for partitions with unique indexes. Earlier this case worked as expected, inserting 1000 tuples. Now it exits with

ERROR:  duplicate key value violates unique constraint "p0_pkey"
DETAIL:  Key (x)=(1) already exists.
CONTEXT: remote SQL command: INSERT INTO public.batch_table_p0(x, field1, field2) VALUES ($1, $2, $3), ($4, $5, $6), ($7, $8, $9), ($10, $11, $12), ($13, $14, $15), ($16, $17, $18), ($19, $20, $21), ($22, $23, $24), ($25, $26, $27), ($28, $29, $30), ($31, $32, $33), ($34, $35, $36), ($37, $38, $39), ($40, $41, $42), ($43, $44, $45), ($46, $47, $48), ($49, $50, $51), ($52, $53, $54), ($55, $56, $57), ($58, $59, $60), ($61, $62, $63), ($64, $65, $66), ($67, $68, $69), ($70, $71, $72), ($73, $74, $75), ($76, $77, $78), ($79, $80, $81), ($82, $83, $84), ($85, $86, $87), ($88, $89, $90), ($91, $92, $93), ($94, $95, $96), ($97, $98, $99), ($100, $101, $102), ($103, $104, $105), ($106, $107, $108), ($109, $110, $111), ($112, $113, $114), ($115, $116, $117), ($118, $119, $120), ($121, $122, $123), ($124, $125, $126), ($127, $128, $129), ($130, $131, $132), ($133, $134, $135), ($136, $137, $138), ($139, $140, $141), ($142, $143, $144), ($145, $146, $147), ($148, $149, $150), ($151, $152, $153), ($154, $155, $156), ($157, $158, $159), ($160, $161, $162), ($163, $164, $165), ($166, $167, $168), ($169, $170, $171), ($172, $173, $174), ($175, $176, $177), ($178, $179, $180), ($181, $182, $183), ($184, $185, $186), ($187, $188, $189), ($190, $191, $192), ($193, $194, $195), ($196, $197, $198), ($199, $200, $201), ($202, $203, $204), ($205, $206, $207), ($208, $209, $210), ($211, $212, $213), ($214, $215, $216), ($217, $218, $219), ($220, $221, $222), ($223, $224, $225), ($226, $227, $228), ($229, $230, $231), ($232, $233, $234), ($235, $236, $237), ($238, $239, $240), ($241, $242, $243), ($244, $245, $246), ($247, $248, $249), ($250, $251, $252), ($253, $254, $255), ($256, $257, $258), ($259, $260, $261), ($262, $263, $264), ($265, $266, $267), ($268, $269, $270), ($271, $272, $273), ($274, $275, $276), ($277, $278, $279), ($280, $281, $282), ($283, $284, $285), ($286, $287, $288), ($289, $290, $291), ($292, $293, $294), ($295, $296, $297), ($298, $299, $300) -- Best regards, 
Alexander Pyhalov,
Postgres Professional

diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 8cb2148f1f6..4c280f1e777 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3083,7 +3083,46 @@ UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a
 SELECT tableoid::regclass, * FROM batch_cp_upd_test;

 -- Clean up
-DROP TABLE batch_table, batch_cp_upd_test CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+
+-- Use partitioning
+ALTER SERVER loopback OPTIONS (ADD batch_size '100');
+
+CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+ALTER TABLE batch_table_p0 ADD CONSTRAINT p0_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+ALTER TABLE batch_table_p1 ADD CONSTRAINT p1_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 2);
+ALTER TABLE batch_table_p2 ADD CONSTRAINT p2_pkey PRIMARY KEY (x);
+
+CREATE TABLE batch_table_p3 (LIKE batch_table);
+ALTER TABLE batch_table_p3 ADD CONSTRAINT p3_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p3f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 3)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p3');
+
+INSERT INTO batch_t
Re: Case expression pushdown
Hi.
Ashutosh Bapat wrote 2021-06-15 16:24:
Looks quite useful to me. Can you please add this to the next commitfest?

Added to commitfest. Here is an updated patch version.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 80d60eb9b1630ee55d1825964e0e976ae6c289a1 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Tue, 30 Mar 2021 13:24:14 +0300
Subject: [PATCH] Allow pushing CASE expression to foreign server

---
 contrib/postgres_fdw/deparse.c             | 118 ++++++++++++
 .../postgres_fdw/expected/postgres_fdw.out |  47 +++
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  22 ++
 3 files changed, 187 insertions(+)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c6..3621fed4b54 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -87,6 +87,7 @@ typedef struct foreign_loc_cxt
 {
 	Oid			collation;		/* OID of current collation, if any */
 	FDWCollateState state;		/* state of current collation choice */
+	Expr	   *case_arg;		/* the last case arg to inspect */
 } foreign_loc_cxt;

 /*
@@ -101,6 +102,7 @@ typedef struct deparse_expr_cxt
 	 * a base relation.
 	 */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	List	   *case_args;		/* list of args to deparse CaseTestExpr */
 } deparse_expr_cxt;

 #define REL_ALIAS_PREFIX	"r"
@@ -186,6 +188,9 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);

+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+static void deparseCaseTestExpr(CaseTestExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -254,6 +259,7 @@ is_foreign_expr(PlannerInfo *root,
 	glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
+	loc_cxt.case_arg = NULL;
 	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
 		return false;
@@ -312,6 +318,7 @@ foreign_expr_walker(Node *node,
 	/* Set up inner_cxt for possible recursion to child nodes */
 	inner_cxt.collation = InvalidOid;
 	inner_cxt.state = FDW_COLLATE_NONE;
+	inner_cxt.case_arg = outer_cxt->case_arg;
 	switch (nodeTag(node))
 	{
@@ -509,6 +516,62 @@ foreign_expr_walker(Node *node,
 					state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *ce = (CaseExpr *) node;
+				ListCell   *arg;
+
+				if (ce->arg)
+					inner_cxt.case_arg = ce->arg;
+
+				foreach(arg, ce->args)
+				{
+					CaseWhen   *w = lfirst_node(CaseWhen, arg);
+
+					if (!foreign_expr_walker((Node *) w->expr,
+											 glob_cxt, &inner_cxt))
+						return false;
+
+					if (!foreign_expr_walker((Node *) w->result,
+											 glob_cxt, &inner_cxt))
+						return false;
+				}
+
+				if (!foreign_expr_walker((Node *) ce->defresult,
+										 glob_cxt, &inner_cxt))
+					return false;
+
+				collation = ce->casecollid;
+				if (collation == InvalidOid)
+					state = FDW_COLLATE_NONE;
+				else if (inner_cxt.state == FDW_COLLATE_SAFE &&
+						 collation == inner_cxt.collation)
+					state = FDW_COLLATE_SAFE;
+				else if (collation == DEFAULT_COLLATION_OID)
+					state = FDW_COLLATE_NONE;
+				else
+					state = FDW_COLLATE_UNSAFE;
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				Expr	   *arg;
+
+				Assert(outer_cxt->case_arg != NULL);
+				arg = outer_cxt->case_arg;
+
+				if (!foreign_expr_walker((Node *) arg,
+										 glob_cxt, &inner_cxt))
+					return false;
+
+				/*
+				 * Collation and state just bubble up from the previously
+				 * saved case argument
+				 */
+				collation = inner_cxt.collation;
+				state = inner_cxt.state;
+			}
+			break;
 		case T_OpExpr:
 		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
 			{
@@ -1019,6 +1082,7 @@ deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
 	context.foreignrel = rel;
 	context.scanrel = IS_UPPER_REL(rel) ? fpinfo->outerrel : rel;
 	context.params_list = params_list;
+	context.case_args = NIL;

 	/* Construct SELECT clause */
 	deparseSelectSql(tlist, is_subquery, retrieved_attrs, &context);
@@ -1598,6 +1662,7 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
 		context.scanrel = foreignrel;
 		context.root = root;
 		context.params_list = params_list;
+		context.case_args = NIL;

 		appendStringInfoChar(buf, '(');
 		appendConditions(fpinfo->joinclauses, &context);
@@ -1901,6 +1966,7 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
 	context.scanrel = foreignrel;
 	context.buf = buf;
 	context.params_list = params_list;
+	context.case_args = NIL;

 	appendStringInfoString(buf, "UPDATE ");
 	deparseRelation(buf, rel);
@@ -2008,6 +2074,7 @@ deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
 	context.scanrel = f
Case expression pushdown
Hi.
This patch allows pushing CASE expressions to foreign servers, so that more types of updates can be executed directly.

For example, without the patch:

EXPLAIN (VERBOSE, COSTS OFF) UPDATE ft2 d SET c2 = CASE WHEN c2 > 0 THEN c2 ELSE 0 END WHERE c1 > 1000;
                                                   QUERY PLAN
---
 Update on public.ft2 d
   Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2 WHERE ctid = $1
   ->  Foreign Scan on public.ft2 d
         Output: CASE WHEN (c2 > 0) THEN c2 ELSE 0 END, ctid, d.*
         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1000)) FOR UPDATE

With the patch:

EXPLAIN (VERBOSE, COSTS OFF) UPDATE ft2 d SET c2 = CASE WHEN c2 > 0 THEN c2 ELSE 0 END WHERE c1 > 1000;
                                                   QUERY PLAN

 Update on public.ft2 d
   ->  Foreign Update on public.ft2 d
         Remote SQL: UPDATE "S 1"."T 1" SET c2 = (CASE WHEN (c2 > 0) THEN c2 ELSE 0 END) WHERE (("C 1" > 1000))
--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 19202bfa5ba8a7eadf1a3b0ce869106e967e0dd2 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov
Date: Tue, 30 Mar 2021 13:24:14 +0300
Subject: [PATCH] Allow pushing CASE expression to foreign server

---
 contrib/postgres_fdw/deparse.c             | 124 ++++++++++++
 .../postgres_fdw/expected/postgres_fdw.out |  28 ++++
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  16 +++
 3 files changed, 168 insertions(+)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c6..4e8162c045c 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -87,6 +87,7 @@ typedef struct foreign_loc_cxt
 {
 	Oid			collation;		/* OID of current collation, if any */
 	FDWCollateState state;		/* state of current collation choice */
+	List	   *case_args;		/* list of case args to inspect */
 } foreign_loc_cxt;

 /*
@@ -101,6 +102,7 @@ typedef struct deparse_expr_cxt
 	 * a base relation.
 	 */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	List	   *case_args;		/* list of args to deparse CaseTestExpr */
 } deparse_expr_cxt;

 #define REL_ALIAS_PREFIX	"r"
@@ -186,6 +188,9 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 									deparse_expr_cxt *context);

+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+static void deparseCaseTestExpr(CaseTestExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -254,6 +259,7 @@ is_foreign_expr(PlannerInfo *root,
 	glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
+	loc_cxt.case_args = NIL;
 	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
 		return false;
@@ -312,6 +318,7 @@ foreign_expr_walker(Node *node,
 	/* Set up inner_cxt for possible recursion to child nodes */
 	inner_cxt.collation = InvalidOid;
 	inner_cxt.state = FDW_COLLATE_NONE;
+	inner_cxt.case_args = outer_cxt->case_args;
 	switch (nodeTag(node))
 	{
@@ -509,6 +516,69 @@ foreign_expr_walker(Node *node,
 					state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseExpr:
+			{
+				CaseExpr   *ce = (CaseExpr *) node;
+				ListCell   *arg;
+
+				if (ce->arg)
+				{
+					inner_cxt.case_args = lappend(inner_cxt.case_args, ce->arg);
+				}
+
+				foreach(arg, ce->args)
+				{
+					CaseWhen   *w = lfirst_node(CaseWhen, arg);
+
+					if (!foreign_expr_walker((Node *) w->expr,
+											 glob_cxt, &inner_cxt))
+						return false;
+
+					if (!foreign_expr_walker((Node *) w->result,
+											 glob_cxt, &inner_cxt))
+						return false;
+				}
+
+				if (!foreign_expr_walker((Node *) ce->defresult,
+										 glob_cxt, &inner_cxt))
+					return false;
+
+				if (ce->arg)
+				{
+					inner_cxt.case_args = list_delete_last(inner_cxt.case_args);
+					outer_cxt->case_args = inner_cxt.case_args;
+				}
+
+				collation = ce->casecollid;
+				if (collation == InvalidOid)
+					state = FDW_COLLATE_NONE;
+				else if (inner_cxt.state == FDW_COLLATE_SAFE &&
+						 collation == inner_cxt.collation)
+					state = FDW_COLLATE_SAFE;
+				else if (collation == DEFAULT_COLLATION_OID)
+					state = FDW_COLLATE_NONE;
+				else
+					state = FDW_COLLATE_UNSAFE;
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+				Expr	   *arg;
+
+				Assert(outer_cxt->case_args != NIL);
+				arg = llast(outer_cxt->case_args);
+
+				if (!foreign_expr_walker((Node *) arg,
+										 glob_cxt,
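The bookkeeping the walker needs for nested CASE expressions can be shown with a standalone sketch (Python, with illustrative names and a toy expression encoding): each `CASE arg WHEN ...` pushes its implicit comparison argument, a `CaseTestExpr` refers to the innermost one, and the argument is popped when the CASE ends — the pattern the lappend()/llast()/list_delete_last() calls in the patch implement.

```python
# Standalone model of tracking CaseTestExpr arguments with a stack,
# mirroring the lappend()/llast()/list_delete_last() usage in the patch.

def resolve_case_test_exprs(expr, case_args=None):
    """Replace 'CASE_TEST' placeholders with the innermost CASE argument.
    An expression is a tuple ('CASE', arg, body), the string 'CASE_TEST',
    or any other opaque leaf value."""
    if case_args is None:
        case_args = []
    if expr == "CASE_TEST":
        assert case_args, "CaseTestExpr outside of CASE WHEN"
        return case_args[-1]                  # llast()
    if isinstance(expr, tuple) and expr[0] == "CASE":
        _, arg, body = expr
        case_args.append(arg)                 # lappend()
        result = resolve_case_test_exprs(body, case_args)
        case_args.pop()                       # list_delete_last()
        return result
    return expr

# Nested CASEs: the inner CASE_TEST must see the inner argument.
print(resolve_case_test_exprs(("CASE", "x", ("CASE", "y", "CASE_TEST"))))  # y
```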
Re: join pushdown and issue with foreign update
Tom Lane wrote 2021-06-02 00:32:
I wrote: I think a preferable fix involves making sure that the correct record-type typmod is propagated to record_in in this context. Alternatively, maybe we could insert the foreign table's rowtype during execution of the input operation, without touching the plan as such.

Here's a draft-quality patch based on that idea. It resolves the offered test case, but I haven't beat on it beyond that.

regards, tom lane

Hi.
The patch seems to work fine for the mentioned case. For now I'm working on function pushdown. When a record-returning function (like unnest()) is pushed down, at this stage we've already lost any type information, so we get the issue again. So far I'm not sure how to fix it; perhaps just avoid pushing down a foreign join if we have a record corresponding to a function RTE var in joinrel->reltarget?
--
Best regards,
Alexander Pyhalov,
Postgres Professional