Re: Transactions involving multiple postgres foreign servers, take 2

2021-10-10 Thread Etsuro Fujita
Fujii-san,

On Thu, Oct 7, 2021 at 11:37 PM Fujii Masao  wrote:
> On 2021/10/07 19:47, Etsuro Fujita wrote:
> > On Thu, Oct 7, 2021 at 1:29 PM k.jami...@fujitsu.com
> >  wrote:
> >> and prove that commit performance can be improved,
> >> e.g. if we can commit not serially but in parallel.
> >
> > If it's ok with you, I'd like to work on the performance issue.  What
> > I have in mind is to commit all remote transactions in parallel instead
> > of sequentially in the postgres_fdw transaction callback, as mentioned
> > above, but I think that would improve the performance even for the
> > one-phase commit that we already have.
>
> +100

I’ve started working on this.  Once I have a (POC) patch, I’ll post it
in a new thread, as I think it can be discussed separately.

Thanks!

Best regards,
Etsuro Fujita




Re: Transactions involving multiple postgres foreign servers, take 2

2021-10-07 Thread Fujii Masao




On 2021/10/07 19:47, Etsuro Fujita wrote:

Hi,

On Thu, Oct 7, 2021 at 1:29 PM k.jami...@fujitsu.com
 wrote:

That said, if we're going to initially support it on postgres_fdw, which is simpler
than the latest patches, we need to ensure that abnormalities and errors
are properly handled


Yes. One idea for this is to include the information required to resolve
outstanding prepared transactions in the transaction identifier that
the PREPARE TRANSACTION command uses. For example, we can use the XID of
the local transaction and the cluster ID of the local server (e.g., a
cluster_name that users set to a unique value can be used for that) as
that information.
If the cluster_name of the local server is "server1" and its XID is now ,
postgres_fdw issues "PREPARE TRANSACTION 'server1_'" and
"COMMIT PREPARED 'server1_'" to the foreign servers, to end those
foreign transactions in a two-phase way.
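
For illustration, a minimal sketch of that naming scheme, assuming a
hypothetical local XID of 1234 (the message above elides the actual XID):

```sql
-- First phase, issued by postgres_fdw to each foreign server:
PREPARE TRANSACTION 'server1_1234';
-- Second phase, issued once the local transaction's fate is decided:
COMMIT PREPARED 'server1_1234';
```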

If some trouble happens, the prepared transaction with "server1_"
may remain unexpectedly in one foreign server. In this case we can
determine whether to commit or roll back that outstanding transaction
by checking whether the past transaction with XID  was committed
or rolled back in the server "server1". If it was committed, the prepared
transaction should also be committed, so we should execute
"COMMIT PREPARED 'server1_'". If it was rolled back, the prepared
transaction should also be rolled back. If it's in progress, we should
do nothing for that transaction.

pg_xact_status() can be used to check whether the transaction with
the specified XID was committed or rolled back. But pg_xact_status()
can return an invalid result if the CLOG data for the specified XID has
been truncated by VACUUM FREEZE. To handle this case, we might need a
special table that tracks transaction status.
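
As a sketch of that check, again with a hypothetical XID of 1234
(pg_xact_status() takes an xid8 and reports 'committed', 'aborted', or
'in progress'):

```sql
-- Run on "server1" to learn how local XID 1234 (hypothetical) ended:
SELECT pg_xact_status('1234'::xid8);
-- 'committed'   => issue COMMIT PREPARED 'server1_1234' on the foreign server
-- 'aborted'     => issue ROLLBACK PREPARED 'server1_1234'
-- 'in progress' => do nothing yet
```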

A DBA can use the above procedure to manually resolve outstanding
prepared transactions in foreign servers. We could probably also implement
a function that performs this procedure. If so, it might be a good idea to
have a background worker or a cron job execute the function periodically.



and prove that commit performance can be improved,
e.g. if we can commit not serially but in parallel.


If it's ok with you, I'd like to work on the performance issue.  What
I have in mind is to commit all remote transactions in parallel instead
of sequentially in the postgres_fdw transaction callback, as mentioned
above, but I think that would improve the performance even for the
one-phase commit that we already have.


+100

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Transactions involving multiple postgres foreign servers, take 2

2021-10-07 Thread Etsuro Fujita
Hi,

On Thu, Oct 7, 2021 at 1:29 PM k.jami...@fujitsu.com
 wrote:
> That said, if we're going to initially support it on postgres_fdw, which is simpler
> than the latest patches, we need to ensure that abnormalities and errors
> are properly handled and prove that commit performance can be improved,
> e.g. if we can commit not serially but in parallel.

If it's ok with you, I'd like to work on the performance issue.  What
I have in mind is to commit all remote transactions in parallel instead
of sequentially in the postgres_fdw transaction callback, as mentioned
above, but I think that would improve the performance even for the
one-phase commit that we already have.  Maybe I'm missing something,
though.
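
For illustration only, a minimal libpq-level sketch of that idea, with a
hypothetical helper and connection array standing in for postgres_fdw's
connection cache (this is not the POC patch's actual code): send COMMIT on
every connection first, then collect the results, so the remote commits
overlap instead of running one after another.

```c
#include "postgres.h"
#include "libpq-fe.h"

/* Hypothetical helper; "conns" stands in for postgres_fdw's
 * per-server connections. Error handling is simplified for brevity. */
static void
commit_remote_xacts_in_parallel(PGconn **conns, int nconns)
{
	/* Phase 1: fire off COMMIT on every connection without waiting. */
	for (int i = 0; i < nconns; i++)
	{
		if (!PQsendQuery(conns[i], "COMMIT TRANSACTION"))
			elog(ERROR, "could not send COMMIT: %s",
				 PQerrorMessage(conns[i]));
	}

	/* Phase 2: collect results; PQgetResult() returns NULL once a
	 * connection's command is fully done. */
	for (int i = 0; i < nconns; i++)
	{
		PGresult   *res;

		while ((res = PQgetResult(conns[i])) != NULL)
		{
			if (PQresultStatus(res) != PGRES_COMMAND_OK)
				elog(ERROR, "COMMIT failed: %s",
					 PQerrorMessage(conns[i]));
			PQclear(res);
		}
	}
}
```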

Best regards,
Etsuro Fujita




RE: Transactions involving multiple postgres foreign servers, take 2

2021-10-06 Thread k.jami...@fujitsu.com
Hi Fujii-san and Sawada-san,

Thank you very much for your replies.

> >> I noticed that this thread and its set of patches have been marked with
> >> "Returned with Feedback" by yourself.
> >> I find the feature (atomic commit for foreign transactions) very
> >> useful and it will pave the road for having distributed transaction
> >> management in Postgres.
> >> Although we have not arrived at a consensus on which approach is best,
> >> there were significant reviews and major patch changes in the past 2 years.
> >> By any chance, do you have any plans to continue this from where you left
> >> off?
> >
> > As I could not reply to the review comments from Fujii-san for almost
> > three months, I don't have enough time to move this project forward at
> > least for now. That's why I marked this patch as RWF. I’d like to
> > continue working on this project in my spare time but I know this is
> > not a project that can be completed by using only my spare time. If
> > someone wants to work on this project, I’d appreciate it and am happy
> > to help.
> 
> Probably it's time to rethink the approach. The patch introduces a foreign
> transaction manager into PostgreSQL core, but as far as I've reviewed the
> patch, its changes look like overkill and too complicated.
> This seems to be one of the reasons why we have not yet been able to commit
> the feature even after several years.
> 
> Another concern about the approach of the patch is that it needs to change a
> backend so that it additionally waits for replication during the commit phase
> before executing PREPARE TRANSACTION on foreign servers, which would further
> decrease performance during the commit phase.
> 
> So I wonder if it's worth revisiting the original approach, i.e., adding the
> atomic commit into postgres_fdw. One disadvantage of this is that it supports
> atomic commit only between foreign PostgreSQL servers, not other data
> resources like MySQL.
> But I'm not sure if we really want to do atomic commit between various FDWs.
> Maybe supporting only postgres_fdw is enough for most users. Thoughts?

The intention of Sawada-san's patch is grand, and it would be very helpful
because it accommodates possible future support of atomic commit for
various types of FDWs. However, it's difficult to get agreement altogether,
as other reviewers have also pointed out the commit performance. Another
point is how it should work when we also implement atomic visibility
(which is another topic for distributed transactions but worth considering).
That said, if we're going to initially support it on postgres_fdw, which is
simpler than the latest patches, we need to ensure that abnormalities and
errors are properly handled and prove that commit performance can be
improved, e.g. if we can commit not serially but in parallel.
And if possible, although not necessary during the first step, it may put
the other reviewers at ease if we can also sketch how to implement atomic
visibility on postgres_fdw.
Thoughts?

Regards,
Kirk Jamison


Re: Transactions involving multiple postgres foreign servers, take 2

2021-10-05 Thread Fujii Masao




On 2021/10/05 10:38, Masahiko Sawada wrote:

Hi,

On Tue, Oct 5, 2021 at 9:56 AM k.jami...@fujitsu.com
 wrote:


Hi Sawada-san,

I noticed that this thread and its set of patches have been marked with
"Returned with Feedback" by yourself.
I find the feature (atomic commit for foreign transactions) very useful
and it will pave the road for having distributed transaction management in
Postgres.
Although we have not arrived at a consensus on which approach is best,
there were significant reviews and major patch changes in the past 2 years.
By any chance, do you have any plans to continue this from where you left off?


As I could not reply to the review comments from Fujii-san for almost
three months, I don't have enough time to move this project forward at
least for now. That's why I marked this patch as RWF. I’d like to
continue working on this project in my spare time but I know this is
not a project that can be completed by using only my spare time. If
someone wants to work on this project, I’d appreciate it and am happy
to help.


Probably it's time to rethink the approach. The patch introduces
a foreign transaction manager into PostgreSQL core, but as far as
I've reviewed the patch, its changes look like overkill and too complicated.
This seems to be one of the reasons why we have not yet been able to
commit the feature even after several years.

Another concern about the approach of the patch is that it needs
to change a backend so that it additionally waits for replication
during the commit phase before executing PREPARE TRANSACTION
on foreign servers, which would further decrease performance
during the commit phase.

So I wonder if it's worth revisiting the original approach, i.e.,
adding the atomic commit into postgres_fdw. One disadvantage of
this is that it supports atomic commit only between foreign
PostgreSQL servers, not other data resources like MySQL.
But I'm not sure if we really want to do atomic commit between
various FDWs. Maybe supporting only postgres_fdw is enough
for most users. Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Transactions involving multiple postgres foreign servers, take 2

2021-10-04 Thread Masahiko Sawada
Hi,

On Tue, Oct 5, 2021 at 9:56 AM k.jami...@fujitsu.com
 wrote:
>
> Hi Sawada-san,
>
> I noticed that this thread and its set of patches have been marked with
> "Returned with Feedback" by yourself.
> I find the feature (atomic commit for foreign transactions) very useful
> and it will pave the road for having distributed transaction management in
> Postgres.
> Although we have not arrived at a consensus on which approach is best,
> there were significant reviews and major patch changes in the past 2 years.
> By any chance, do you have any plans to continue this from where you left off?

As I could not reply to the review comments from Fujii-san for almost
three months, I don't have enough time to move this project forward at
least for now. That's why I marked this patch as RWF. I’d like to
continue working on this project in my spare time but I know this is
not a project that can be completed by using only my spare time. If
someone wants to work on this project, I’d appreciate it and am happy
to help.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Transactions involving multiple postgres foreign servers, take 2

2021-10-04 Thread k.jami...@fujitsu.com
Hi Sawada-san,

I noticed that this thread and its set of patches have been marked with
"Returned with Feedback" by yourself.
I find the feature (atomic commit for foreign transactions) very useful
and it will pave the road for having distributed transaction management in
Postgres.
Although we have not arrived at a consensus on which approach is best,
there were significant reviews and major patch changes in the past 2 years.
By any chance, do you have any plans to continue this from where you left off?

Regards,
Kirk Jamison



Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-15 Thread Fujii Masao




On 2021/07/09 22:44, Masahiko Sawada wrote:

On Fri, Jul 9, 2021 at 3:26 PM Fujii Masao  wrote:

As far as I read the code, keeping the old API for foreign subtransactions
doesn't cause any actual bug. But it's just strange and half-baked to manage
top and sub transactions in different layers and to use the old and new APIs
for them.


That's a valid concern. I'm really not sure what we should do here, but
I guess that even if we want to support subtransactions we would have
another API dedicated to subtransaction commit and rollback.

OK, so if possible I will write a POC patch for the new API for foreign
subtransactions and consider whether it's simple enough that we can commit
it into core or not.


+#define FDWXACT_FLAG_PARALLEL_WORKER   0x02/* is parallel worker? */

This implies that parallel workers may execute PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED against the foreign server for atomic commit?
If so, what happens if the PREPARE TRANSACTION that one of the
parallel workers issues fails? In this case, not only that parallel worker
but also the other parallel workers and the leader should all roll back the
transaction. That is, they should issue ROLLBACK PREPARED to the foreign
servers. Was this issue already handled and addressed in the patches?

This seems not to be an actual issue if only postgres_fdw is used, because
postgres_fdw doesn't have the IsForeignScanParallelSafe API. Right? But what
about other FDWs?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: Transactions involving multiple postgres foreign servers, take 2

2021-07-15 Thread r.takahash...@fujitsu.com
Hi Sawada-san,


Thank you for your reply.

> BTW did you test locally? That is, are the foreign servers
> located on the same machine?

Yes, I tested locally, since I cannot prepare a good network environment now.


> I guess it would be better to start a new thread for this improvement.

Thank you for your advice.
I started a new thread [1].


> What if we successfully committed 'prep_1' but an error happened
> during committing another one for some reason (e.g., corrupted 2PC
> state file, OOM, etc.)? We might return an error to the client but have
> already committed 'prep_1'.

Sorry, I don't have a good idea now.
I imagined the command would return the list of the transaction IDs that
ended with an error.


[1]
https://www.postgresql.org/message-id/OS0PR01MB56828019B25CD5190AB6093282129%40OS0PR01MB5682.jpnprd01.prod.outlook.com


Regards,
Ryohei Takahashi




Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-14 Thread Masahiko Sawada
On Tue, Jul 13, 2021 at 1:14 PM r.takahash...@fujitsu.com
 wrote:
>
> Hi Sawada-san,
>
>
> Thank you for your reply.
>
> > Not sure, but it might be possible to keep holding an xlogreader for
> > reading PREPARE WAL records even after the transaction commit. But I
> > wonder how much the open() of the WAL segment file accounts for the total
> > execution time of 2PC. 2PC requires 2 network round trips for each
> > participant. For example, if it took 500ms in total, we would not get
> > much benefit from the point of view of 2PC performance even if we
> > improved it from 14ms to 1ms.
>
> I made the patch based on your advice and re-run the test on the new machine.
> (The attached patch is just for test purpose.)

Thank you for testing!

>
>
> * foreign_twophase_commit = disabled
> 2686tps
>
> * foreign_twophase_commit = required (It is necessary to set -R ${RATE} as 
> Ikeda-san said)
> 311tps
>
> * foreign_twophase_commit = required with attached patch (It is not necessary 
> to set -R ${RATE})
> 2057tps

Nice improvement!

BTW did you test locally? That is, are the foreign servers
located on the same machine?

>
>
> This indicates that if we can reduce the number of open() calls on the WAL
> segment file during "COMMIT PREPARED", the performance can be improved.
>
> This patch can skip closing the WAL segment file, but I don't know when we
> should close it.
> One idea is to close it when the WAL segment file is recycled, but it seems
> difficult for the backend process to do so.

I guess it would be better to start a new thread for this improvement.
This idea helps not only the 2PC case but also improves
COMMIT/ROLLBACK PREPARED performance itself. Rather than tying it
to this patch, I think it's good if we can discuss it separately
so that it can get committed on its own.

> BTW, in a previous discussion, "Send COMMIT PREPARED to remote servers in
> bulk" was proposed.
> I imagined a new SQL interface like "COMMIT PREPARED 'prep_1', 'prep_2',
> ... 'prep_n'".
> If we can keep the WAL segment file open during a bulk COMMIT PREPARED, we
> can reduce not only the number of round trips but also the number of open()
> calls on WAL segment files.

What if we successfully committed 'prep_1' but an error happened
during committing another one for some reason (e.g., corrupted 2PC
state file, OOM, etc.)? We might return an error to the client but have
already committed 'prep_1'.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Transactions involving multiple postgres foreign servers, take 2

2021-07-13 Thread r.takahash...@fujitsu.com
Hi,


> Wouldn't it be better to explicitly initialize the pointer with NULL?

Thank you for your advice.
You are correct.

Anyway, I fixed it and re-ran the performance test; of course, it does not
affect the TPS.

Regards,
Ryohei Takahashi


Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-13 Thread Ranier Vilela
On Tue, Jul 13, 2021 at 01:14, r.takahash...@fujitsu.com <
r.takahash...@fujitsu.com> wrote:

> Hi Sawada-san,
>
>
> Thank you for your reply.
>
> > Not sure, but it might be possible to keep holding an xlogreader for
> > reading PREPARE WAL records even after the transaction commit. But I
> > wonder how much the open() of the WAL segment file accounts for the total
> > execution time of 2PC. 2PC requires 2 network round trips for each
> > participant. For example, if it took 500ms in total, we would not get
> > much benefit from the point of view of 2PC performance even if we
> > improved it from 14ms to 1ms.
>
> I made the patch based on your advice and re-run the test on the new
> machine.
> (The attached patch is just for test purpose.)
>
Wouldn't it be better to explicitly initialize the pointer with NULL?
I think it's common in Postgres.

static XLogReaderState *xlogreader = NULL;


>
> * foreign_twophase_commit = disabled
> 2686tps
>
> * foreign_twophase_commit = required (It is necessary to set -R ${RATE} as
> Ikeda-san said)
> 311tps
>
> * foreign_twophase_commit = required with attached patch (It is not
> necessary to set -R ${RATE})
> 2057tps
>
Nice results.

regards,
Ranier Vilela


RE: Transactions involving multiple postgres foreign servers, take 2

2021-07-12 Thread r.takahash...@fujitsu.com
Hi Sawada-san,


Thank you for your reply.

> Not sure, but it might be possible to keep holding an xlogreader for
> reading PREPARE WAL records even after the transaction commit. But I
> wonder how much the open() of the WAL segment file accounts for the total
> execution time of 2PC. 2PC requires 2 network round trips for each
> participant. For example, if it took 500ms in total, we would not get
> much benefit from the point of view of 2PC performance even if we
> improved it from 14ms to 1ms.

I made the patch based on your advice and re-run the test on the new machine.
(The attached patch is just for test purpose.)


* foreign_twophase_commit = disabled
2686tps

* foreign_twophase_commit = required (It is necessary to set -R ${RATE} as 
Ikeda-san said)
311tps

* foreign_twophase_commit = required with attached patch (It is not necessary 
to set -R ${RATE})
2057tps


This indicates that if we can reduce the number of open() calls on the WAL
segment file during "COMMIT PREPARED", the performance can be improved.

This patch can skip closing the WAL segment file, but I don't know when we
should close it.
One idea is to close it when the WAL segment file is recycled, but it seems
difficult for the backend process to do so.
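
For what it's worth, a minimal sketch of the direction the attached test
patch takes, under the assumption that caching the xlogreader keeps the
underlying WAL segment file descriptor open across COMMIT PREPARED calls
(the names and allocation details below are hypothetical, not the attached
patch's actual code):

```c
#include "postgres.h"
#include "access/xlogreader.h"
#include "access/xlogutils.h"

/* Hypothetical: one reader kept for the backend's lifetime, so the WAL
 * segment opened for a previous COMMIT PREPARED stays open. */
static XLogReaderState *cached_reader = NULL;

static XLogReaderState *
get_cached_xlogreader(void)
{
	if (cached_reader == NULL)
		cached_reader = XLogReaderAllocate(wal_segment_size, NULL,
										   XL_ROUTINE(.page_read = &read_local_xlog_page,
													  .segment_open = &wal_segment_open,
													  .segment_close = &wal_segment_close),
										   NULL);

	/* Deliberately never freed: when to close the cached segment
	 * (e.g., on WAL recycling) is exactly the open question above. */
	return cached_reader;
}
```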

BTW, in a previous discussion, "Send COMMIT PREPARED to remote servers in
bulk" was proposed.
I imagined a new SQL interface like "COMMIT PREPARED 'prep_1', 'prep_2', ...
'prep_n'".
If we can keep the WAL segment file open during a bulk COMMIT PREPARED, we
can reduce not only the number of round trips but also the number of open()
calls on WAL segment files.


Regards,
Ryohei Takahashi


Hold_xlogreader.patch
Description: Hold_xlogreader.patch


Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-09 Thread Masahiko Sawada
On Fri, Jul 9, 2021 at 3:26 PM Fujii Masao  wrote:
>
>
>
> On 2021/06/30 10:05, Masahiko Sawada wrote:
> > I've attached the new version patch that incorporates the comments
> > from Fujii-san and Ikeda-san I got so far.
>
> Thanks for updating the patches!
>
> I'm now reading 0001 and 0002 patches and wondering if we can commit them
> at first because they just provide independent basic mechanism for
> foreign transaction management.
>
> One question regarding them is: why did we add the new API only for "top"
> foreign transactions? Even with those patches, the old API
> (CallSubXactCallbacks) is still being used for foreign subtransactions and
> xact_depth is still being managed in the postgres_fdw layer (not PostgreSQL
> core). Is this intentional?

Yes, it's not needed for 2PC support, and I was also concerned about adding
complexity to the core by adding a new API for subtransactions that is not
strictly necessary for 2PC.

> As far as I read the code, keeping the old API for foreign subtransactions
> doesn't cause any actual bug. But it's just strange and half-baked to manage
> top and sub transactions in different layers and to use the old and new APIs
> for them.

That's a valid concern. I'm really not sure what we should do here, but
I guess that even if we want to support subtransactions we would have
another API dedicated to subtransaction commit and rollback.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-09 Thread Masahiko Sawada
Sorry for the late reply.

On Tue, Jul 6, 2021 at 3:15 PM r.takahash...@fujitsu.com
 wrote:
>
> Hi,
>
>
> I'm interested in this patch and I also ran the same test with Ikeda-san's
> fxact_update.pgbench.

Thank you for testing!

> In my environment (a poor-spec VM), the results are the following.
>
> * foreign_twophase_commit = disabled
> 363tps
>
> * foreign_twophase_commit = required (It is necessary to set -R ${RATE} as 
> Ikeda-san said)
> 13tps
>
>
> I analyzed the bottleneck using pstack and strace.
> I noticed that the open() during the "COMMIT PREPARED" command is very slow.
>
> In my environment the latency of "COMMIT PREPARED" is 16ms.
> (On the other hand, the latency of "COMMIT" and "PREPARE TRANSACTION" is 1ms.)
> In the "COMMIT PREPARED" command, the open() of the WAL segment file takes 14ms.
> Therefore, open() is the bottleneck of "COMMIT PREPARED".
> Furthermore, I noticed that the backend process almost always opens the same
> WAL segment file.
>
> In the current patch, the backend process on the foreign server which is
> associated with the connection from the resolver process always runs the
> "COMMIT PREPARED" command.
> Therefore, the WAL segment file of the current "COMMIT PREPARED" command is
> probably the same as that of the previous "COMMIT PREPARED" command.
>
> In order to improve the performance of the resolver process, I think it is
> useful to skip closing the WAL segment file during "COMMIT PREPARED" and
> reuse the file descriptor.
> Is it possible?

Not sure, but it might be possible to keep holding an xlogreader for
reading PREPARE WAL records even after the transaction commit. But I
wonder how much the open() of the WAL segment file accounts for the total
execution time of 2PC. 2PC requires 2 network round trips for each
participant. For example, if it took 500ms in total, we would not get
much benefit from the point of view of 2PC performance even if we
improved it from 14ms to 1ms.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-09 Thread Masahiko Sawada
Sorry for the late reply.

On Mon, Jul 5, 2021 at 3:29 PM Masahiro Ikeda  wrote:
>
>
>
> On 2021/06/30 10:05, Masahiko Sawada wrote:
> > On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda  
> > wrote:
> >>
> >> Hi Jamison-san, sawada-san,
> >>
> >> Thanks for testing!
> >>
> >> FWIW, I tested using pgbench with the "--rate=" option to check whether
> >> the server can execute transactions with stable throughput. As Sawada-san
> >> said, the latest patch resolves the second phase of 2PC asynchronously, so
> >> it's difficult to control for stable throughput without the "--rate=" option.
> >>
> >> I also worried about what I should do when the error happens, because
> >> increasing "max_prepared_foreign_transactions" doesn't work. Since
> >> overloading may also cause the error, is it better to add that case to the
> >> HINT message?
> >>
> >> BTW, if Sawada-san has already developed a way to run the resolver
> >> processes in parallel, why don't you measure the performance improvement?
> >> Although Robert-san, Tsunakawa-san and others are discussing what
> >> architecture is best, one discussion point is that there is a performance
> >> risk in adopting an asynchronous approach. If we have promising solutions,
> >> I think we can move the discussion forward.
> >
> > Yeah, if we can asynchronously resolve the distributed transactions
> > without worrying about the max_prepared_foreign_transactions error, it
> > would be good. But we will need synchronous resolution at some point.
> > I think we at least need to discuss it at this point.
> >
> > I've attached the new version patch that incorporates the comments
> > from Fujii-san and Ikeda-san I got so far. We launch a resolver
> > process per foreign server, committing prepared foreign transactions
> > on foreign servers in parallel. To get better performance based on
> > the current architecture, we can have multiple resolver processes per
> > foreign server, but it seems not easy to tune in practice. Perhaps
> > it is better if we simply have a pool of resolver processes and
> > assign a resolver process to the resolution of one distributed
> > transaction at a time? That way, we need to launch as many resolver
> > processes as there are concurrent backends using 2PC.
>
> Thanks for updating the patches.
>
> I have tested on my local laptop and the summary is the following.

Thank you for testing!

>
> (1) The latest patch (v37) can improve throughput by 1.5 times compared to v36.
>
> Although I expected it to improve by 2.0 times because the workload is such
> that one transaction accesses two remote servers... I think the reason is
> that the disk is the bottleneck and I couldn't prepare disks for each
> PostgreSQL server. If I could, I think the performance could be improved by
> 2.0 times.
>
>
> (2) With the latest patch (v37), the throughput with foreign_twophase_commit =
> required is about 36% of the case where foreign_twophase_commit = disabled.
>
> Although the throughput is improved, the absolute performance is not good. It
> may be the fate of 2PC. I think the reason is that the number of WAL writes
> increases a lot, and the disk writes on my laptop are the bottleneck. I want
> to know the results of testing in richer environments if someone can do so.
>
>
> (3) The latest patch (v37) has no overhead if foreign_twophase_commit =
> disabled. On the contrary, the performance improved by 3%. It may be within
> the margin of error.
>
>
>
> The test details are the following.
>
> # condition
>
> * 1 coordinator and 3 foreign servers
>
> * 4 instances shared one SSD disk.
>
> * one transaction queries two different foreign servers.
>
> ``` fxact_update.pgbench
> \set id random(1, 100)
>
> \set partnum 3
> \set p1 random(1, :partnum)
> \set p2 ((:p1 + 1) % :partnum) + 1
>
> BEGIN;
> UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> COMMIT;
> ```
>
> * pgbench generates load. I increased ${RATE} little by little until "maximum
> number of foreign transactions reached" error happens.
>
> ```
> pgbench -f fxact_update.pgbench -R ${RATE} -c 8 -j 8 -T 180
> ```
>
> * parameters
> max_prepared_transactions = 100
> max_prepared_foreign_transactions = 200
> max_foreign_transaction_resolvers = 4
>
>
> # test source code patterns
>
> 1. 2pc patches(v36) based on 6d0eb385 (foreign_twophase_commit = required).
> 2. 2pc patches(v37) based on 2595e039 (foreign_twophase_commit = required).
> 3. 2pc patches(v37) based on 2595e039 (foreign_twophase_commit = disabled).
> 4. 2595e039 without 2pc patches(v37).
>
>
> # results
>
> 1. tps = 241.8000
> latency average = 10.413ms
>
> 2. tps = 359.017519 (1.5x compared to 1.; 36% of 3.)
> latency average = 15.427ms
>
> 3. tps = 987.372220 (1.03x compared to 4.)
> latency average = 8.102ms
>
> 4. tps = 955.984574
> latency average = 8.368ms
>
The disk is the bottleneck in my environment because disk util is almost 100%
in every pattern. If disks for each instance can be prepared, I think we can
expect more performance improvements.
Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-09 Thread Fujii Masao




On 2021/06/30 10:05, Masahiko Sawada wrote:

I've attached the new version patch that incorporates the comments
from Fujii-san and Ikeda-san I got so far.


Thanks for updating the patches!

I'm now reading 0001 and 0002 patches and wondering if we can commit them
at first because they just provide independent basic mechanism for
foreign transaction management.

One question regarding them is: why did we add the new API only for "top"
foreign transactions? Even with those patches, the old API
(CallSubXactCallbacks) is still being used for foreign subtransactions and
xact_depth is still being managed in the postgres_fdw layer (not PostgreSQL
core). Is this intentional?
Sorry if this was already discussed before.

As far as I read the code, keeping the old API for foreign subtransactions
doesn't cause any actual bug. But it's just strange and half-baked to manage
top and sub transactions in different layers and to use the old and new APIs
for them.

OTOH, I'm afraid that adding a new (non-essential) API for foreign
subtransactions might increase the code complexity unnecessarily.

Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: Transactions involving multiple postgres foreign servers, take 2

2021-07-07 Thread k.jami...@fujitsu.com
On Wed, June 30, 2021 10:06 (GMT+9), Masahiko Sawada wrote:
> I've attached the new version patch that incorporates the comments from
> Fujii-san and Ikeda-san I got so far. We launch a resolver process per foreign
> server, committing prepared foreign transactions on foreign servers in 
> parallel.

Hi Sawada-san,
Thank you for the latest set of patches.
I've noticed from cfbot that the regression test failed, and I also could not 
compile it.

== running regression test queries==
test test_fdwxact ... FAILED   21 ms
== shutting down postmaster   ==
==
 1 of 1 tests failed. 
==

> To get better performance based on the current architecture, we can have
> multiple resolver processes per foreign server, but it seems not easy to tune
> in practice. Perhaps it is better if we simply have a pool of resolver
> processes and assign a resolver process to the resolution of one distributed
> transaction at a time? That way, we need to launch as many resolver processes
> as there are concurrent backends using 2PC.

Yes, finding the right values to tune for max_prepared_foreign_transactions and
max_prepared_transactions seems difficult. If we set the number of resolver
processes to the number of concurrent backends using 2PC, how do we determine
the value of max_foreign_transaction_resolvers? It might be good to gather some
statistics to judge the value; then we can compare the performance against the
V37 version.

-
Also, this is a bit of a side topic. I know we've been discussing how to
improve/fix the resolver process bottlenecks, and Takahashi-san provided
the details in the thread above where V37 has problems. (I am joining the
testing too.)

I am not sure if this has been brought up before, given the years of
discussion in this thread. But I think we need to consider preventing the
resolver process from entering an infinite wait loop while resolving a
prepared foreign transaction. Currently, when a crashed foreign server is
recovered during resolution retries, the information is recovered from WAL
and files, and the resolver process resumes the foreign transaction
resolution. However, what if we cannot (or intentionally do not want to)
recover the crashed server for a long time?

An idea is to make the resolver process stop automatically after some
maximum number of retries.
We can call the parameter foreign_transaction_resolution_max_retry_count.
There may be a better name, but I followed the naming pattern from your patch.

The server downtime can be estimated considering the proposed parameter
foreign_transaction_resolution_retry_interval (default 10s) from the
patch set.
In addition, according to docs, "a foreign server using the postgres_fdw
foreign data wrapper can have the same options that libpq accepts in
connection strings", so the connect_timeout set during CREATE SERVER can
also affect it.

Example:
CREATE SERVER's connect_timeout setting = 5s
foreign_transaction_resolution_retry_interval = 10s
foreign_transaction_resolution_max_retry_count = 3

Estimated total time before the resolver stops:
= connect_timeout * (max_retry_count + 1) + retry_interval * max_retry_count
= (5s * 4) + (10s * 3) = 50s

00s: 1st connect start
05s: 1st connect timeout
(retry interval)
15s: 2nd connect start (1st retry)
20s: 2nd connect timeout
(retry interval)
30s: 3rd connect start (2nd retry)
35s: 3rd connect timeout
(retry interval)
45s: 4th connect start (3rd retry)
50s: 4th connect timeout
(resolver process stops)

Then the resolver process will not wait indefinitely and will stop after
some time, depending on the settings of the above parameters.
This could be an automatic counterpart of pg_stop_foreign_xact_resolver.
Assuming the resolver has stopped and the crashed server is later decided
to be restored, the user can then execute pg_resolve_foreign_xact().
Do you think the idea is feasible and we can add it as part of the patch sets?

Regards,
Kirk Jamison


RE: Transactions involving multiple postgres foreign servers, take 2

2021-07-06 Thread r.takahash...@fujitsu.com
Hi,


I'm interested in this patch and I also ran the same test with Ikeda-san's
fxact_update.pgbench.
In my environment (a poor-spec VM), the results are the following.

* foreign_twophase_commit = disabled
363tps

* foreign_twophase_commit = required (It is necessary to set -R ${RATE} as 
Ikeda-san said)
13tps


I analyzed the bottleneck using pstack and strace.
I noticed that the open() during the "COMMIT PREPARED" command is very slow.

In my environment the latency of "COMMIT PREPARED" is 16ms.
(On the other hand, the latency of "COMMIT" and "PREPARE TRANSACTION" is 1ms.)
In the "COMMIT PREPARED" command, the open() of the WAL segment file takes 14ms.
Therefore, open() is the bottleneck of "COMMIT PREPARED".
Furthermore, I noticed that the backend process almost always opens the same
WAL segment file.

In the current patch, the backend process on the foreign server which is
associated with the connection from the resolver process always runs the
"COMMIT PREPARED" command.
Therefore, the WAL segment file of the current "COMMIT PREPARED" command is
probably the same as that of the previous "COMMIT PREPARED" command.

In order to improve the performance of the resolver process, I think it is
useful to skip closing the WAL segment file during "COMMIT PREPARED" and
reuse the file descriptor.
Is it possible?


Regards,
Ryohei Takahashi




Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-05 Thread Masahiro Ikeda



On 2021/06/30 10:05, Masahiko Sawada wrote:
> On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda  
> wrote:
>>
>> Hi Jamison-san, sawada-san,
>>
>> Thanks for testing!
>>
>> FWIW, I tested using pgbench with the "--rate=" option to check whether
>> the server can execute transactions with stable throughput. As Sawada-san
>> said, the latest patch resolves the second phase of 2PC asynchronously, so
>> it's difficult to control for stable throughput without the "--rate=" option.
>>
>> I also worried about what I should do when the error happens, because
>> increasing "max_prepared_foreign_transactions" doesn't work. Since
>> overloading may also cause the error, is it better to add that case to the
>> HINT message?
>>
>> BTW, if Sawada-san has already developed a way to run the resolver processes
>> in parallel, why don't you measure the performance improvement? Although
>> Robert-san, Tsunakawa-san and others are discussing what architecture is
>> best, one discussion point is that there is a performance risk in adopting
>> an asynchronous approach. If we have promising solutions, I think we can
>> move the discussion forward.
> 
> Yeah, if we can asynchronously resolve the distributed transactions
> without worrying about the max_prepared_foreign_transactions error, it
> would be good. But we will need synchronous resolution at some point.
> I think we at least need to discuss it at this point.
> 
> I've attached the new version patch that incorporates the comments
> from Fujii-san and Ikeda-san I got so far. We launch a resolver
> process per foreign server, committing prepared foreign transactions
> on foreign servers in parallel. To get better performance based on
> the current architecture, we can have multiple resolver processes per
> foreign server, but it seems not easy to tune in practice. Perhaps
> it is better if we simply have a pool of resolver processes and
> assign a resolver process to the resolution of one distributed
> transaction at a time? That way, we need to launch as many resolver
> processes as there are concurrent backends using 2PC.

Thanks for updating the patches.

I have tested on my local laptop and the summary is the following.

(1) The latest patch (v37) can improve throughput by 1.5 times compared to v36.

Although I expected it to improve by 2.0 times because the workload is such
that one transaction accesses two remote servers... I think the reason is
that the disk is the bottleneck and I couldn't prepare disks for each
PostgreSQL server. If I could, I think the performance could be improved by
2.0 times.


(2) With the latest patch (v37), the throughput with foreign_twophase_commit =
required is about 36% of the case where foreign_twophase_commit = disabled.

Although the throughput is improved, the absolute performance is not good. It
may be the fate of 2PC. I think the reason is that the number of WAL writes
increases a lot, and the disk writes on my laptop are the bottleneck. I want
to know the results of testing in richer environments if someone can do so.


(3) The latest patch (v37) has no overhead if foreign_twophase_commit =
disabled. On the contrary, the performance improved by 3%. It may be within
the margin of error.



The test details are the following.

# condition

* 1 coordinator and 3 foreign servers

* 4 instances shared one SSD disk.

* one transaction queries two different foreign servers.

``` fxact_update.pgbench
\set id random(1, 100)

\set partnum 3
\set p1 random(1, :partnum)
\set p2 ((:p1 + 1) % :partnum) + 1

BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```

* pgbench generates load. I increased ${RATE} little by little until "maximum
number of foreign transactions reached" error happens.

```
pgbench -f fxact_update.pgbench -R ${RATE} -c 8 -j 8 -T 180
```

* parameters
max_prepared_transactions = 100
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 4


# test source code patterns

1. 2pc patches(v36) based on 6d0eb385 (foreign_twophase_commit = required).
2. 2pc patches(v37) based on 2595e039 (foreign_twophase_commit = required).
3. 2pc patches(v37) based on 2595e039 (foreign_twophase_commit = disabled).
4. 2595e039 without 2pc patches(v37).


# results

1. tps = 241.8000
   latency average = 10.413ms

2. tps = 359.017519 (1.5x compared to 1.; 36% of 3.)
   latency average = 15.427ms

3. tps = 987.372220 (1.03x compared to 4.)
   latency average = 8.102ms

4. tps = 955.984574
   latency average = 8.368ms

The disk is the bottleneck in my environment because disk util is almost 100%
in every pattern. If disks for each instance can be prepared, I think we can
expect more performance improvements.


>> In my understanding, there are three improvement ideas. The first is to make
>> the resolver processes run in parallel. The second is to send "COMMIT/ABORT
>> PREPARED" to remote servers in bulk. The third is to stop syncing the WAL in
>> remove_fdwxact() after resolving is done,

Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-24 Thread Masahiro Ikeda



On 2021/06/24 22:11, Masahiko Sawada wrote:
> On Sat, Jun 12, 2021 at 1:25 AM Fujii Masao  
> wrote:
>> On 2021/05/11 13:37, Masahiko Sawada wrote:
>> So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for
>> XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there
>> for WAL record to be replicated to the standby if sync replication is 
>> enabled?
>> Otherwise, when the failover happens, new primary (past-standby)
>> might not have enough XLOG_FDWXACT_INSERT WAL records and
>> might fail to find some in-doubt foreign transactions.
> 
> But even if we wait for the record to be replicated, this problem
> isn't completely resolved, right? If the server crashes before the
> standy receives the record and the failover happens then the new
> master doesn't have the record. I wonder if we need to have another
> FDW API in order to get the list of prepared transactions from the
> foreign server (FDW). For example in postgres_fdw case, it gets the
> list of prepared transactions on the foreign server by executing a
> query. It seems to me that this corresponds to xa_recover in the XA
> specification.

FWIW, Citus implemented it as Sawada-san said above [1].

Since each WAL record for PREPARE is flushed in the latest patch, the latency
becomes too high, especially under synchronous replication. For example, a
transaction involving three foreign servers must wait to sync "three" WAL
records for PREPARE and "one" WAL record for the local commit on the remote
servers, one by one sequentially. So, I think that Sawada-san's idea is good
for improving the latency, although the FDW developers' work increases.

[1]
SIGMOD 2021 525 Citus: Distributed PostgreSQL for Data Intensive Applications
From 12:27 it explains how to solve unresolved prepared xacts.
https://www.youtube.com/watch?v=AlF4C60FdlQ&list=PL3xUNnH4TdbsfndCMn02BqAAgGB0z7cwq

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION




Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-24 Thread Masahiro Ikeda



On 2021/06/24 22:27, Masahiko Sawada wrote:
> On Thu, Jun 24, 2021 at 9:46 PM k.jami...@fujitsu.com
>  wrote:
>>
>> Hi Sawada-san,
>>
>> I also tried to play a bit with the latest patches similar to Ikeda-san,
>> and with foreign 2PC parameter enabled/required.
> 
> Thank you for testing the patch!
> 
>>
>> b. about performance bottleneck (just share my simple benchmark
>> results)
>>
>> The resolver process can easily become a performance bottleneck, although
>> I think some users want this feature even if the performance is not so
>> good.
>>
>> I tested with a very simple workload on my laptop.
>>
>> The test condition is
>> * two remote foreign partitions and one transaction inserts an
>> entry in each partition.
>> * local connection only. If NW latency became higher, the
>> performance became worse.
>> * pgbench with 8 clients.
>>
>> The test results are the following. The performance with 2PC is only
>> 10% of the performance without 2PC.
>>
>> * with foreign_twophase_commit = required
>> -> If loaded with more than 10 TPS, the number of unresolved foreign
>> transactions keeps increasing, and it stops with the warning "Increase
>> max_prepared_foreign_transactions".
>
> What was the value of max_prepared_foreign_transactions?

 Now, I tested with 200.

 If each resolution is finished very soon, I thought it would be enough
 because 8 clients x 2 partitions = 16, though... But it's difficult to
 know the stable values.
>>>
>>> During resolving one distributed transaction, the resolver needs both one
>>> round trip and an fsync of a WAL record for each foreign transaction.
>>> Since the client doesn’t wait for the distributed transaction to be
>>> resolved, the resolver process can easily become the bottleneck given
>>> there are 8 clients.
>>>
>>> If foreign transactions were resolved synchronously, 16 would suffice.
>>
>>
>> I tested the V36 patches on my 16-core machines.
>> I set up two foreign servers (F1, F2).
>> F1 has addressbook table.
>> F2 has pgbench tables (scale factor = 1).
>> There is also 1 coordinator (coor) server where I created user mapping to 
>> access the foreign servers.
>> I executed the benchmark measurement on coordinator.
>> My custom scripts are setup in a way that queries from coordinator
>> would have to access the two foreign servers.
>>
>> Coordinator:
>> max_prepared_foreign_transactions = 200
>> max_foreign_transaction_resolvers = 1
>> foreign_twophase_commit = required
>>
>> Other external servers 1 & 2 (F1 & F2):
>> max_prepared_transactions = 100
>>
>>
>> [select.sql]
>> \set int random(1, 10)
>> BEGIN;
>> SELECT ad.name, ad.age, ac.abalance
>> FROM addressbook ad, pgbench_accounts ac
>> WHERE ad.id = :int AND ad.id = ac.aid;
>> COMMIT;
>>
>> I then executed:
>> pgbench -r -c 2 -j 2 -T 60 -f select.sql coor
>>
>> While there were no problems with 1-2 clients, I started having problems
>> when running the benchmark with more than 3 clients.
>>
>> pgbench -r -c 4 -j 4 -T 60 -f select.sql coor
>>
>> I got the following error on coordinator:
>>
>> [95396] ERROR:  could not prepare transaction on server F2 with ID 
>> fx_151455979_1216200_16422
>> [95396] STATEMENT:  COMMIT;
>> WARNING:  there is no transaction in progress
>> pgbench: error: client 1 script 0 aborted in command 3 query 0: ERROR:  
>> could not prepare transaction on server F2 with ID fx_151455979_1216200_16422
>>
>> Here's the log on foreign server 2  matching the above error:
>>  LOG:  statement: PREPARE TRANSACTION 'fx_151455979_1216200_16422'
>>  ERROR:  maximum number of prepared transactions reached
>>  HINT:  Increase max_prepared_transactions (currently 100).
>>  STATEMENT:  PREPARE TRANSACTION 'fx_151455979_1216200_16422'
>>
>> So I increased the max_prepared_transactions of F1 and F2 from 100 to
>> 200.
>> Then I got the error:
>>
>> [146926] ERROR:  maximum number of foreign transactions reached
>> [146926] HINT:  Increase max_prepared_foreign_transactions: "200".
>>
>> So I increased the max_prepared_foreign_transactions to "300",
>> and got the same error about needing to increase the max_prepared_transactions
>> of the foreign servers.
>>
>> I just can't find the right tuning values for this.
>> It seems that we always run out of memory in FdwXactState insert_fdwxact
>> with multiple concurrent connections during PREPARE TRANSACTION.
>> I only encountered this for the SELECT benchmark;
>> I had no problems with multiple connections for my custom scripts for
>> the UPDATE and INSERT benchmarks when I tested up to 30 clients.
>>
>> Would the following possibly solve this bottleneck problem?
> 
> With the following idea, the performance will get better, but the problem
> will not be completely solved. That is because those results shared by you
> and Ikeda-san come from the fact that with the patch we asynchronously
> commit the foreign prepared transactions (i.e., asynchronously
> 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-24 Thread Masahiko Sawada
On Thu, Jun 24, 2021 at 10:11 PM Masahiko Sawada  wrote:
>
> On Sat, Jun 12, 2021 at 1:25 AM Fujii Masao  
> wrote:
> >
> >
> >
> > (5)
> > +2. Pre-Commit phase (1st phase of two-phase commit)
> > +we record the corresponding WAL indicating that the foreign server is 
> > involved
> > +with the current transaction before doing PREPARE all foreign transactions.
> > +Thus, in case we loose connectivity to the foreign server or crash 
> > ourselves,
> > +we will remember that we might have prepared tranascation on the foreign
> > +server, and try to resolve it when connectivity is restored or after crash
> > +recovery.
> >
> > So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for
> > XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there
> > for WAL record to be replicated to the standby if sync replication is 
> > enabled?
> > Otherwise, when the failover happens, new primary (past-standby)
> > might not have enough XLOG_FDWXACT_INSERT WAL records and
> > might fail to find some in-doubt foreign transactions.
>
> But even if we wait for the record to be replicated, this problem
> isn't completely resolved, right?

Ah, I misunderstood the order of writing WAL records and preparing
foreign transactions. You're right. Combining your suggestion below,
perhaps we need to write all WAL records, call XLogFlush(), wait for
those records to be replicated, and then prepare all foreign transactions.
Even in cases where the server crashes while preparing a foreign
transaction and the failover happens, the new master has all the foreign
transaction entries. Some of them might not actually be prepared on
the foreign servers, but that should not be a problem.

> > (6)
> > XLogFlush() is called for each foreign transaction. So if there are many
> > foreign transactions, XLogFlush() is called too frequently. Which might
> > cause unnecessary performance overhead? Instead, for example,
> > we should call XLogFlush() only at once in 
> > FdwXactPrepareForeignTransactions()
> > after inserting all WAL records for all foreign transactions?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-24 Thread Masahiko Sawada
On Thu, Jun 24, 2021 at 9:46 PM k.jami...@fujitsu.com
 wrote:
>
> Hi Sawada-san,
>
> I also tried to play a bit with the latest patches similar to Ikeda-san,
> and with foreign 2PC parameter enabled/required.

Thank you for testing the patch!

>
> > > >> b. about performance bottleneck (just share my simple benchmark
> > > >> results)
> > > >>
> > > >> The resolver process can easily become a performance bottleneck, although
> > > >> I think some users want this feature even if the performance is not so
> > > >> good.
> > > >>
> > > >> I tested with a very simple workload on my laptop.
> > > >>
> > > >> The test condition is
> > > >> * two remote foreign partitions and one transaction inserts an
> > > >> entry in each partition.
> > > >> * local connection only. If NW latency became higher, the
> > > >> performance became worse.
> > > >> * pgbench with 8 clients.
> > > >>
> > > >> The test results are the following. The performance with 2PC is only
> > > >> 10% of the performance without 2PC.
> > > >>
> > > >> * with foreign_twophase_commit = required
> > > >> -> If loaded with more than 10 TPS, the number of unresolved foreign
> > > >> transactions keeps increasing, and it stops with the warning "Increase
> > > >> max_prepared_foreign_transactions".
> > > >
> > > > What was the value of max_prepared_foreign_transactions?
> > >
> > > Now, I tested with 200.
> > >
> > > If each resolution is finished very soon, I thought it would be enough
> > > because 8 clients x 2 partitions = 16, though... But it's difficult to
> > > know the stable values.
> >
> > During resolving one distributed transaction, the resolver needs both one
> > round trip and an fsync of a WAL record for each foreign transaction.
> > Since the client doesn’t wait for the distributed transaction to be
> > resolved, the resolver process can easily become the bottleneck given
> > there are 8 clients.
> >
> > If foreign transactions were resolved synchronously, 16 would suffice.
>
>
> I tested the V36 patches on my 16-core machines.
> I set up two foreign servers (F1, F2).
> F1 has addressbook table.
> F2 has pgbench tables (scale factor = 1).
> There is also 1 coordinator (coor) server where I created user mapping to 
> access the foreign servers.
> I executed the benchmark measurement on coordinator.
> My custom scripts are setup in a way that queries from coordinator
> would have to access the two foreign servers.
>
> Coordinator:
> max_prepared_foreign_transactions = 200
> max_foreign_transaction_resolvers = 1
> foreign_twophase_commit = required
>
> Other external servers 1 & 2 (F1 & F2):
> max_prepared_transactions = 100
>
>
> [select.sql]
> \set int random(1, 10)
> BEGIN;
> SELECT ad.name, ad.age, ac.abalance
> FROM addressbook ad, pgbench_accounts ac
> WHERE ad.id = :int AND ad.id = ac.aid;
> COMMIT;
>
> I then executed:
> pgbench -r -c 2 -j 2 -T 60 -f select.sql coor
>
> While there were no problems with 1-2 clients, I started having problems
> when running the benchmark with more than 3 clients.
>
> pgbench -r -c 4 -j 4 -T 60 -f select.sql coor
>
> I got the following error on coordinator:
>
> [95396] ERROR:  could not prepare transaction on server F2 with ID 
> fx_151455979_1216200_16422
> [95396] STATEMENT:  COMMIT;
> WARNING:  there is no transaction in progress
> pgbench: error: client 1 script 0 aborted in command 3 query 0: ERROR:  could 
> not prepare transaction on server F2 with ID fx_151455979_1216200_16422
>
> Here's the log on foreign server 2  matching the above error:
>  LOG:  statement: PREPARE TRANSACTION 'fx_151455979_1216200_16422'
>  ERROR:  maximum number of prepared transactions reached
>  HINT:  Increase max_prepared_transactions (currently 100).
>  STATEMENT:  PREPARE TRANSACTION 'fx_151455979_1216200_16422'
>
> So I increased the max_prepared_transactions of F1 and F2 from 100 to 200.
> Then I got the error:
>
> [146926] ERROR:  maximum number of foreign transactions reached
> [146926] HINT:  Increase max_prepared_foreign_transactions: "200".
>
> So I increased the max_prepared_foreign_transactions to "300",
> and got the same error about needing to increase the max_prepared_transactions
> of the foreign servers.
>
> I just can't find the right tuning values for this.
> It seems that we always run out of memory in FdwXactState insert_fdwxact
> with multiple concurrent connections during PREPARE TRANSACTION.
> I only encountered this for the SELECT benchmark;
> I had no problems with multiple connections for my custom scripts for
> the UPDATE and INSERT benchmarks when I tested up to 30 clients.
>
> Would the following possibly solve this bottleneck problem?

With the following idea, the performance will get better, but the problem
will not be completely solved. That is because those results shared by you
and Ikeda-san come from the fact that with the patch we asynchronously
commit the foreign prepared transactions (i.e., asynchronously
performing the second phase of 2PC), not from the architecture. As I
mentioned before, I intentionally

Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-24 Thread Masahiko Sawada
On Sat, Jun 12, 2021 at 1:25 AM Fujii Masao  wrote:
>
>
>
> On 2021/05/11 13:37, Masahiko Sawada wrote:
> > I've attached the updated patches that incorporated comments from
> > Zhihong and Ikeda-san.
>
> Thanks for updating the patches!
>
> I'm still reading these patches, but I'd like to share some review comments
> that I found so far.

Thank you for the comments!

>
> (1)
> +/* Remove the foreign transaction from FdwXactParticipants */
> +void
> +FdwXactUnregisterXact(UserMapping *usermapping)
> +{
> +   Assert(IsTransactionState());
> +   RemoveFdwXactEntry(usermapping->umid);
> +}
>
> Currently there is no user of FdwXactUnregisterXact().
> This function should be removed?

I think that this function can be used by other FDW implementations
to unregister a foreign transaction entry, although there is no use case
in postgres_fdw. This function corresponds to xa_unreg in the XA
specification.
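
To make that concrete, a minimal sketch of how an FDW other than
postgres_fdw might use it; the callback name and its shape here are
hypothetical, while FdwXactUnregisterXact() is the patch's API as quoted
above:

```c
#include "postgres.h"
#include "foreign/foreign.h"

/* Hypothetical FDW callback: the FDW ends the remote transaction through
 * its own protocol and then drops its entry, mirroring xa_unreg in the
 * XA specification. */
static void
myfdw_end_remote_xact(UserMapping *usermapping, bool commit)
{
	/* ... commit or abort on the remote side via the FDW's own API ... */

	FdwXactUnregisterXact(usermapping);	/* removes the FdwXactParticipants entry */
}
```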

>
>
> (2)
> When I ran the regression test, I got the following failure.
>
> = Contents of ./src/test/modules/test_fdwxact/regression.diffs
> diff -U3 
> /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out
>  
> /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out
> --- 
> /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out
>  2021-06-10 02:19:43.808622747 +
> +++ 
> /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out
>   2021-06-10 02:29:53.452410462 +
> @@ -174,7 +174,7 @@
>   SELECT count(*) FROM pg_foreign_xacts;
>count
>   ---
> - 1
> + 4
>   (1 row)

Will fix.

>
>
> (3)
> +errmsg("could not read foreign transaction state from xlog at %X/%X",
> +   (uint32) (lsn >> 32),
> +   (uint32) lsn)));
>
> LSN_FORMAT_ARGS() should be used?

Agreed.
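
For reference, a minimal sketch of the corrected call (LSN_FORMAT_ARGS is the
core macro that expands to the two uint32 halves of an LSN):

errmsg("could not read foreign transaction state from xlog at %X/%X",
       LSN_FORMAT_ARGS(lsn))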

>
>
> (4)
> +extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content,
> +   int len);
>
> Since RecreateFdwXactFile() is used only in fdwxact.c,
> the above "extern" is not necessary?

Right.

>
>
> (5)
> +2. Pre-Commit phase (1st phase of two-phase commit)
> +we record the corresponding WAL indicating that the foreign server is involved
> +with the current transaction before doing PREPARE on all foreign transactions.
> +Thus, in case we lose connectivity to the foreign server or crash ourselves,
> +we will remember that we might have prepared transactions on the foreign
> +server, and try to resolve it when connectivity is restored or after crash
> +recovery.
>
> So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for
> XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there
> for WAL record to be replicated to the standby if sync replication is enabled?
> Otherwise, when the failover happens, new primary (past-standby)
> might not have enough XLOG_FDWXACT_INSERT WAL records and
> might fail to find some in-doubt foreign transactions.

But even if we wait for the record to be replicated, this problem
isn't completely resolved, right? If the server crashes before the
standby receives the record and the failover happens, then the new
master doesn't have the record. I wonder if we need to have another
FDW API in order to get the list of prepared transactions from the
foreign server (FDW). For example in postgres_fdw case, it gets the
list of prepared transactions on the foreign server by executing a
query. It seems to me that this corresponds to xa_recover in the XA
specification.
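
As a rough sketch, such a callback could look like the following (the names
are illustrative assumptions, not part of the posted patch):

/*
 * Hypothetical FDW callback corresponding to xa_recover: return the list
 * of prepared transaction identifiers that remain on the foreign server
 * reachable through the given user mapping.
 */
typedef List *(*GetPreparedForeignXacts_function) (UserMapping *user);

postgres_fdw could implement it by running a query on the foreign server,
e.g. SELECT gid FROM pg_prepared_xacts WHERE gid LIKE 'fx_%', and comparing
the result against the local FdwXact entries.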

> (6)
> XLogFlush() is called for each foreign transaction. So if there are many
> foreign transactions, XLogFlush() is called too frequently. Which might
> cause unnecessary performance overhead? Instead, for example,
> we should call XLogFlush() only at once in FdwXactPrepareForeignTransactions()
> after inserting all WAL records for all foreign transactions?

Agreed.
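
A minimal sketch of the batched flush (FdwXactParticipants and the WAL
record constants are assumed from the patch; the record construction is
elided):

XLogRecPtr	lsn = InvalidXLogRecPtr;
ListCell   *cell;

foreach(cell, FdwXactParticipants)
{
	/* ... assemble the XLOG_FDWXACT_INSERT record for this participant ... */
	lsn = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_INSERT);
}

/* flush once up to the last record instead of once per record */
XLogFlush(lsn);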

>
>
> (7)
> /* Open connection; report that we'll create a prepared statement. */
> fmstate->conn = GetConnection(user, true, &fmstate->conn_state);
> +   MarkConnectionModified(user);
>
> MarkConnectionModified() should be called also when TRUNCATE on
> a foreign table is executed?

Good catch. Will fix.
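
A sketch of where the call could go (postgresExecForeignTruncate is the
PG 14 postgres_fdw entry point; MarkConnectionModified is the patch's API;
the lookup and remote-statement details are elided):

static void
postgresExecForeignTruncate(List *rels, DropBehavior behavior,
							bool restart_seqs)
{
	UserMapping *user;

	/* ... resolve the server and user mapping as the real code does ... */

	/* record that this transaction has modified data on the remote side */
	MarkConnectionModified(user);

	/* ... construct and send the remote TRUNCATE statement ... */
}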

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-24 Thread k.jami...@fujitsu.com
Hi Sawada-san,

I also tried to play a bit with the latest patches similar to Ikeda-san,
and with foreign 2PC parameter enabled/required.

> > >> b. about performance bottleneck (just share my simple benchmark
> > >> results)
> > >>
> > >> The resolver process can easily become a performance bottleneck, although
> > >> I think some users want this feature even if the performance is not so
> > >> good.
> > >>
> > >> I tested with a very simple workload on my laptop.
> > >>
> > >> The test conditions are:
> > >> * two remote foreign partitions, and one transaction inserts an
> > >> entry in each partition.
> > >> * local connection only. If NW latency became higher, the
> > >> performance became worse.
> > >> * pgbench with 8 clients.
> > >>
> > >> The test results are the following: the performance with 2PC is only
> > >> about 10% of the performance without 2PC.
> > >>
> > >> * with foreign_twophase_commit = required
> > >> -> If loaded with more than 10 TPS, the number of unresolved foreign
> > >> transactions keeps increasing, and the run stops with the warning
> > >> "Increase max_prepared_foreign_transactions".
> > >
> > > What was the value of max_prepared_foreign_transactions?
> >
> > Now, I tested with 200.
> >
> > If each resolution finished very quickly, I thought that would be enough,
> > because 8 clients x 2 partitions = 16, though... But it's difficult to
> > know stable values.
> 
> While resolving one distributed transaction, the resolver needs both one
> round trip and an fsync of a WAL record for each foreign transaction.
> Since the client doesn’t wait for the distributed transaction to be resolved,
> the resolver process can easily become the bottleneck given there are 8 clients.
> 
> If foreign transactions were resolved synchronously, 16 would suffice.


I tested the V36 patches on my 16-core machines.
I set up two foreign servers (F1, F2).
F1 has addressbook table.
F2 has pgbench tables (scale factor = 1).
There is also one coordinator (coor) server where I created user mappings to
access the foreign servers.
I executed the benchmark measurement on coordinator.
My custom scripts are setup in a way that queries from coordinator
would have to access the two foreign servers.

Coordinator:
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 1
foreign_twophase_commit = required

Other external servers 1 & 2 (F1 & F2):
max_prepared_transactions = 100


[select.sql]
\set int random(1, 10)
BEGIN;
SELECT ad.name, ad.age, ac.abalance
FROM addressbook ad, pgbench_accounts ac
WHERE ad.id = :int AND ad.id = ac.aid;
COMMIT;

I then executed:
pgbench -r -c 2 -j 2 -T 60 -f select.sql coor

While there were no problems with 1-2 clients, I started having problems
when running the benchmark with more than 3 clients.

pgbench -r -c 4 -j 4 -T 60 -f select.sql coor

I got the following error on coordinator:

[95396] ERROR:  could not prepare transaction on server F2 with ID 
fx_151455979_1216200_16422
[95396] STATEMENT:  COMMIT;
WARNING:  there is no transaction in progress
pgbench: error: client 1 script 0 aborted in command 3 query 0: ERROR:  could 
not prepare transaction on server F2 with ID fx_151455979_1216200_16422

Here's the log on foreign server 2  matching the above error:
 LOG:  statement: PREPARE TRANSACTION 'fx_151455979_1216200_16422'
 ERROR:  maximum number of prepared transactions reached
 HINT:  Increase max_prepared_transactions (currently 100).
 STATEMENT:  PREPARE TRANSACTION 'fx_151455979_1216200_16422'

So I increased the max_prepared_transactions of F1 and F2 from 100 to 200.
Then I got the error:

[146926] ERROR:  maximum number of foreign transactions reached
[146926] HINT:  Increase max_prepared_foreign_transactions: "200".

So I increased the max_prepared_foreign_transactions to "300",
and got the same error saying to increase the max_prepared_transactions of the
foreign servers.

I just can't find the right tuning values for this.
It seems that we always run out of memory in FdwXactState insert_fdwxact 
with multiple concurrent connections during PREPARE TRANSACTION.
I encountered this only for the SELECT benchmark, although I had no problems
with multiple connections for my custom UPDATE and INSERT benchmark scripts
when I tested up to 30 clients.
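
(A back-of-the-envelope sizing, for what it's worth: each in-flight
distributed transaction holds one FdwXact entry per remote participant, so
with C clients and P participating servers at least C x P entries are needed
while resolution keeps pace, e.g. 4 x 2 = 8 here. But because the client does
not wait for resolution, the backlog can grow without bound once the commit
rate exceeds the single resolver's throughput, so no static
max_prepared_foreign_transactions value can be sufficient.)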

Would the following possibly solve this bottleneck problem?

> > > To speed up the foreign transaction resolution, some ideas have been
> > > discussed. As another idea, how about launching resolvers for each
> > > foreign server? That way, we resolve foreign transactions on each
> > > foreign server in parallel. If foreign transactions are concentrated
> > > on the particular server, we can have multiple resolvers for the one
> > > foreign server. It doesn’t change the fact that all foreign
> > > transaction resolutions are processed by resolver processes.
> >
> > Awesome! There seems to be another pro: even if a foreign server
> > is temporarily busy or stopped due to failover, other foreign
> > servers' transactions can 

RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-17 Thread tsunakawa.ta...@fujitsu.com
From: Robert Haas 
> On Tue, Jun 15, 2021 at 5:51 AM tsunakawa.ta...@fujitsu.com
>  wrote:
> > Postgres can do that, but other implementations can not necessarily do it,
> > I'm afraid.  But before that, the FDW interface documentation doesn't
> > describe anything about how to handle interrupts.  Actually, odbc_fdw and
> > possibly other FDWs don't respond to interrupts.
> 
> Well, I'd consider that a bug.

I kind of hesitate to call it a bug...  Unlike libpq, JDBC (for jdbc_fdw) 
doesn't have an asynchronous interface, and the Oracle and PostgreSQL ODBC 
drivers don't support an asynchronous interface.  Even with libpq, COMMIT (and 
other SQL commands) is not always cancellable, e.g., when the (NFS) storage 
server hangs while writing WAL.


> > What we're talking about here is mainly whether commit should return success
> > or failure when some participants failed to commit in the second phase of
> > 2PC.  That's new to Postgres, isn't it?  Anyway, we should respect existing
> > relevant specifications and (well-known) implementations before we conclude
> > that we have to devise our own behavior.
> 
> Sure ... but we can only decide to do things that the implementation
> can support, and running code that might fail after we've committed
> locally isn't one of them.

Yes, I understand that Postgres may not be able to conform to specifications or 
well-known implementations in all aspects.  I'm just suggesting that we take the 
stance "We carefully considered established industry specifications that we can 
base on, did our best to design the desirable behavior learned from them, but 
couldn't implement a few parts", rather than "We did what we like and can do."


> I think your comparison here is quite unfair. We work hard to add
> overhead in hot paths where it might cost, but the FDW case involves a
> network round-trip anyway, so the cost of an if-statement would surely
> be insignificant. I feel like you want to assume without any evidence
> that a local resolver can never be quick enough, even though the cost
> of IPC between local processes shouldn't be that high compared to a
> network round trip. But you also want to suppose that we can run code
> that might fail late in the commit process even though there is lots
> of evidence that this will cause problems, starting with the code
> comments that clearly say so.

There may be better examples.  What I wanted to say is just that I believe it's 
not PG developers' standard to allow serial prepare and commit.  Let's make 
clear what makes it difficult to do the 2PC from each client session in normal 
operation without going through the resolver.


Regards
Takayuki Tsunakawa



Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-16 Thread Robert Haas
On Tue, Jun 15, 2021 at 5:51 AM tsunakawa.ta...@fujitsu.com
 wrote:
> Postgres can do that, but other implementations can not necessarily do it, I'm 
> afraid.  But before that, the FDW interface documentation doesn't describe 
> anything about how to handle interrupts.  Actually, odbc_fdw and possibly 
> other FDWs don't respond to interrupts.

Well, I'd consider that a bug.

> What we're talking about here is mainly whether commit should return success or 
> failure when some participants failed to commit in the second phase of 2PC.  
> That's new to Postgres, isn't it?  Anyway, we should respect existing 
> relevant specifications and (well-known) implementations before we conclude 
> that we have to devise our own behavior.

Sure ... but we can only decide to do things that the implementation
can support, and running code that might fail after we've committed
locally isn't one of them.

> We talked about that, and unfortunately, I haven't seen a good and feasible 
> idea to enhance the current approach that involves the resolver from the 
> beginning of 2PC processing.  Honestly, I don't understand why such a "one 
> prepare, one commit in turn" serialization approach can be allowed in 
> PostgreSQL where developers pursue best performance and even try to refrain 
> from adding an if statement in a hot path.  As I showed and Ikeda-san said, 
> other implementations have each client session send prepare and commit 
> requests.  That's a natural way to achieve reasonable concurrency and 
> performance.

I think your comparison here is quite unfair. We work hard to add
overhead in hot paths where it might cost, but the FDW case involves a
network round-trip anyway, so the cost of an if-statement would surely
be insignificant. I feel like you want to assume without any evidence
that a local resolver can never be quick enough, even though the cost
of IPC between local processes shouldn't be that high compared to a
network round trip. But you also want to suppose that we can run code
that might fail late in the commit process even though there is lots
of evidence that this will cause problems, starting with the code
comments that clearly say so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-15 Thread tsunakawa.ta...@fujitsu.com
From: Robert Haas 
> Well, we're talking about running this commit routine from within
> CommitTransaction(), right? So I think it is in fact running in the
> server. And if that's so, then you have to worry about how to make it
> respond to interrupts. You can't just call some functions
> DBMSX_xa_commit() and wait for infinite time for it to return. Look at
> pgfdw_get_result() for an example of what real code that does this looks
> like.

Postgres can do that, but other implementations can not necessarily do it, I'm 
afraid.  But before that, the FDW interface documentation doesn't describe 
anything about how to handle interrupts.  Actually, odbc_fdw and possibly other 
FDWs don't respond to interrupts.


> Honestly, I am not quite sure what any specification has to say about
> this. We're talking about what happens when a user does something with
> a foreign table and then type COMMIT. That's all about providing a set
> of behaviors that are consistent with how PostgreSQL works in other
> situations. You can't negotiate away the requirement to handle errors
> in a way that works with PostgreSQL's infrastructure, or the
> requirement that any length operation handle interrupts properly, by
> appealing to a specification.

What we're talking about here is mainly whether commit should return success or 
failure when some participants failed to commit in the second phase of 2PC.  
That's new to Postgres, isn't it?  Anyway, we should respect existing relevant 
specifications and (well-known) implementations before we conclude that we have 
to devise our own behavior.


> That sounds more like a limitation of the present implementation than
> a fundamental problem. We shouldn't reject the idea of having a
> resolver process handle this just because the initial implementation
> might be slow. If there's no fundamental problem with the idea,
> parallelism and concurrency can be improved in separate patches at a
> later time. It's much more important at this stage to reject ideas
> that are not theoretically sound.

We talked about that, and unfortunately, I haven't seen a good and feasible 
idea to enhance the current approach that involves the resolver from the 
beginning of 2PC processing.  Honestly, I don't understand why such a "one 
prepare, one commit in turn" serialization approach can be allowed in 
PostgreSQL where developers pursue best performance and even try to refrain 
from adding an if statement in a hot path.  As I showed and Ikeda-san said, 
other implementations have each client session send prepare and commit 
requests.  That's a natural way to achieve reasonable concurrency and 
performance.


Regards
Takayuki Tsunakawa



Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-14 Thread Robert Haas
On Sun, Jun 13, 2021 at 10:04 PM tsunakawa.ta...@fujitsu.com
 wrote:
> I know sending a commit request may get an error from various underlying 
> functions, but we're talking about the client side, not the Postgres server 
> side that could unexpectedly ereport(ERROR) somewhere.  So, the new FDW 
> commit routine won't lose control and can return an error code as its return 
> value.  For instance, the FDW commit routine for DBMS-X would typically be:
>
> int
> DBMSXCommit(...)
> {
>     int         ret;
>
>     /* extract info from the argument to pass to xa_commit() */
>
>     ret = DBMSX_xa_commit(...);
>     /* This is the actual commit function which is exposed to the app
>      * server (e.g. Tuxedo) through the xa_commit() interface */
>
>     /* map xa_commit() return values to the corresponding return values
>      * of the FDW commit routine */
>     switch (ret)
>     {
>         case XA_RMERR:
>             ret = ...;
>             break;
>         ...
>     }
>
>     return ret;
> }

Well, we're talking about running this commit routine from within
CommitTransaction(), right? So I think it is in fact running in the
server. And if that's so, then you have to worry about how to make it
respond to interrupts. You can't just call some functions
DBMSX_xa_commit() and wait for infinite time for it to return. Look at
pgfdw_get_result() for an example of what real code that does this looks
like.
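
For readers following along, the rough shape of that pattern (simplified
from postgres_fdw; error handling is elided and the query string is a
placeholder):

PGresult   *res;

if (!PQsendQuery(conn, "COMMIT PREPARED 'fx_...'"))
	elog(ERROR, "could not send commit request");	/* placeholder handling */

while (PQisBusy(conn))
{
	int			rc;

	rc = WaitLatchOrSocket(MyLatch,
						   WL_LATCH_SET | WL_SOCKET_READABLE |
						   WL_EXIT_ON_PM_DEATH,
						   PQsocket(conn), -1L, PG_WAIT_EXTENSION);

	if (rc & WL_LATCH_SET)
	{
		ResetLatch(MyLatch);
		CHECK_FOR_INTERRUPTS();		/* the point: stay cancellable */
	}

	if ((rc & WL_SOCKET_READABLE) && !PQconsumeInput(conn))
		break;					/* connection trouble; treat as an error */
}

res = PQgetResult(conn);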

> So, we need to design how commit behaves from the user's perspective.  That's 
> the functional design.  We should figure out what's the desirable response of 
> commit first, and then see if we can implement it or have to compromise in 
> some way.  I think we can reference the X/Open TX standard and/or JTS (Java 
> Transaction Service) specification (I haven't had a chance to read them yet, 
> though.)  Just in case we can't find the requested commit behavior in the 
> volcano case from those specifications, ... (I'm hesitant to say this because 
> it may be hard,) it's desirable to follow representative products such as 
> Tuxedo and GlassFish (the reference implementation of Java EE specs.)

Honestly, I am not quite sure what any specification has to say about
this. We're talking about what happens when a user does something with
a foreign table and then type COMMIT. That's all about providing a set
of behaviors that are consistent with how PostgreSQL works in other
situations. You can't negotiate away the requirement to handle errors
in a way that works with PostgreSQL's infrastructure, or the
requirement that any length operation handle interrupts properly, by
appealing to a specification.

> Concurrent transactions are serialized at the resolver.  I heard that the 
> current patch handles 2PC like this: the TM (transaction manager in Postgres 
> core) requests prepare to the resolver, the resolver sends prepare to the 
> remote server and wait for reply, the TM gets back control from the resolver, 
> TM requests commit to the resolver, the resolver sends commit to the remote 
> server and wait for reply, and TM gets back control.  The resolver handles 
> one transaction at a time.

That sounds more like a limitation of the present implementation than
a fundamental problem. We shouldn't reject the idea of having a
resolver process handle this just because the initial implementation
might be slow. If there's no fundamental problem with the idea,
parallelism and concurrency can be improved in separate patches at a
later time. It's much more important at this stage to reject ideas
that are not theoretically sound.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-13 Thread tsunakawa.ta...@fujitsu.com
From: Robert Haas 
> On Thu, Jun 10, 2021 at 9:58 PM tsunakawa.ta...@fujitsu.com
>  wrote:
> > The question I have been asking is how.  With that said, we should only have
> > two options; one is the return value of the FDW commit routine, and the other
> > is via ereport(ERROR).  I suggested the possibility of the former, because if
> > the FDW does ereport(ERROR), Postgres core (transaction manager) may have
> > difficulty in handling the rest of the participants.
> 
> I don't think that is going to work. It is very difficult to write
> code that doesn't ever ERROR in PostgreSQL. It is not impossible if
> the operation is trivial enough, but I think you're greatly
> underestimating the complexity of committing the remote transaction.
> If somebody had designed PostgreSQL so that every function returns a
> return code and every time you call some other function you check that
> return code and pass any error up to your own caller, then there would
> be no problem here. But in fact the design was that at the first sign
> of trouble you throw an ERROR. It's not easy to depart from that
> programming model in just one place.

> > I'm not completely sure about this.  I thought (and said) that the only thing
> > the FDW does would be to send a commit request through an existing
> > connection.  So, I think it's not a severe restriction to require FDWs not to
> > ereport(ERROR) during commits (of the second phase of 2PC.)
> 
> To send a commit request through an existing connection, you have to
> send some bytes over the network using a send() or write() system
> call. That can fail. Then you have to read the response back over the
> network using recv() or read(). That can also fail. You also need to
> parse the result that you get from the remote side, which can also
> fail, because you could get back garbage for some reason. And
> depending on the details, you might first need to construct the
> message you're going to send, which might be able to fail too. Also,
> the data might be encrypted using SSL, so you might have to decrypt
> it, which can also fail, and you might need to encrypt data before
> sending it, which can fail. In fact, if you're using the OpenSSL,
> trying to call SSL_read() or SSL_write() can both read and write data
> from the socket, even multiple times, so you have extra opportunities
> to fail.

I know sending a commit request may get an error from various underlying 
functions, but we're talking about the client side, not the Postgres server 
side that could unexpectedly ereport(ERROR) somewhere.  So, the new FDW commit 
routine won't lose control and can return an error code as its return value.  
For instance, the FDW commit routine for DBMS-X would typically be:

int
DBMSXCommit(...)
{
    int         ret;

    /* extract info from the argument to pass to xa_commit() */

    ret = DBMSX_xa_commit(...);
    /* This is the actual commit function which is exposed to the app
     * server (e.g. Tuxedo) through the xa_commit() interface */

    /* map xa_commit() return values to the corresponding return values
     * of the FDW commit routine */
    switch (ret)
    {
        case XA_RMERR:
            ret = ...;
            break;
        ...
    }

    return ret;
}


> I think that's a valid concern, but we also have to have a plan that
> is realistic. Some things are indeed not possible in PostgreSQL's
> design. Also, some of these problems are things everyone has to
> somehow confront. There's no database doing 2PC that can't have a
> situation where one of the machines disappears unexpectedly due to
> some natural disaster or administrator interference. It might be the
> case that our inability to do certain things safely during transaction
> commit puts us out of compliance with the spec, but it can't be the
> case that some other system has no possible failures during
> transaction commit. The problem of the network potentially being
> disconnected between one packet and the next exists in every system.

So, we need to design how commit behaves from the user's perspective.  That's 
the functional design.  We should figure out what's the desirable response of 
commit first, and then see if we can implement it or have to compromise in some 
way.  I think we can reference the X/Open TX standard and/or JTS (Java 
Transaction Service) specification (I haven't had a chance to read them yet, 
though.)  Just in case we can't find the requested commit behavior in the 
volcano case from those specifications, ... (I'm hesitant to say this because 
it may be hard,) it's desirable to follow representative products such as 
Tuxedo and GlassFish (the reference implementation of Java EE specs.)


> > I don't think the resolver-based approach would bring us far enough.  It's
> fundamentally a bottleneck.  Such a background process should only handle
> commits whose requests failed to be sent due to server down.
> 
> Why is it fundamentally a 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-11 Thread Fujii Masao




On 2021/05/11 13:37, Masahiko Sawada wrote:

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.


Thanks for updating the patches!

I'm still reading these patches, but I'd like to share some review comments
that I found so far.

(1)
+/* Remove the foreign transaction from FdwXactParticipants */
+void
+FdwXactUnregisterXact(UserMapping *usermapping)
+{
+   Assert(IsTransactionState());
+   RemoveFdwXactEntry(usermapping->umid);
+}

Currently there is no user of FdwXactUnregisterXact().
This function should be removed?


(2)
When I ran the regression test, I got the following failure.

= Contents of ./src/test/modules/test_fdwxact/regression.diffs
diff -U3 
/home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out
 
/home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out
--- 
/home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out
 2021-06-10 02:19:43.808622747 +
+++ 
/home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out
  2021-06-10 02:29:53.452410462 +
@@ -174,7 +174,7 @@
 SELECT count(*) FROM pg_foreign_xacts;
  count
 ---
- 1
+ 4
 (1 row)


(3)
+errmsg("could not read foreign transaction state from xlog at %X/%X",
+   (uint32) (lsn >> 32),
+   (uint32) lsn)));

LSN_FORMAT_ARGS() should be used?


(4)
+extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content,
+   int len);

Since RecreateFdwXactFile() is used only in fdwxact.c,
the above "extern" is not necessary?


(5)
+2. Pre-Commit phase (1st phase of two-phase commit)
+we record the corresponding WAL indicating that the foreign server is involved
+with the current transaction before doing PREPARE on all foreign transactions.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have prepared transactions on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.

So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for
XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there
for WAL record to be replicated to the standby if sync replication is enabled?
Otherwise, when the failover happens, new primary (past-standby)
might not have enough XLOG_FDWXACT_INSERT WAL records and
might fail to find some in-doubt foreign transactions.
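
For illustration, a minimal sketch of that additional wait, assuming the
record's LSN is at hand (SyncRepWaitForLSN is the same core routine that
RecordTransactionCommit() uses):

XLogFlush(lsn);

/*
 * Also wait until the record is replicated to the synchronous standby,
 * so that a failover cannot lose knowledge of the prepared foreign
 * transaction.
 */
SyncRepWaitForLSN(lsn, true);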


(6)
XLogFlush() is called for each foreign transaction. So if there are many
foreign transactions, XLogFlush() is called too frequently. Which might
cause unnecessary performance overhead? Instead, for example,
we should call XLogFlush() only at once in FdwXactPrepareForeignTransactions()
after inserting all WAL records for all foreign transactions?


(7)
/* Open connection; report that we'll create a prepared statement. */
fmstate->conn = GetConnection(user, true, &fmstate->conn_state);
+   MarkConnectionModified(user);

MarkConnectionModified() should be called also when TRUNCATE on
a foreign table is executed?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-11 Thread Robert Haas
On Thu, Jun 10, 2021 at 9:58 PM tsunakawa.ta...@fujitsu.com
 wrote:
> I understand that.  As I cited yesterday and possibly before, that's why 
> xa_commit() returns various return codes.  So, I have never suggested that 
> FDWs should not report an error and always report success for the commit 
> request.  They should be allowed to report an error.

In the text to which I was responding it seemed like you were saying
the opposite. Perhaps I misunderstood.

> The question I have been asking is how.  With that said, we should only have 
> two options; one is the return value of the FDW commit routine, and the other 
> is via ereport(ERROR).  I suggested the possibility of the former, because if 
> the FDW does ereport(ERROR), Postgres core (transaction manager) may have 
> difficulty in handling the rest of the participants.

I don't think that is going to work. It is very difficult to write
code that doesn't ever ERROR in PostgreSQL. It is not impossible if
the operation is trivial enough, but I think you're greatly
underestimating the complexity of committing the remote transaction.
If somebody had designed PostgreSQL so that every function returns a
return code and every time you call some other function you check that
return code and pass any error up to your own caller, then there would
be no problem here. But in fact the design was that at the first sign
of trouble you throw an ERROR. It's not easy to depart from that
programming model in just one place.

> > Also, leaving aside theoretical arguments, I think it's not
> > realistically possible for an FDW author to write code to commit a
> > prepared transaction that will be safe in the context of running late
> > in PrepareTransaction(), after we've already done
> > RecordTransactionCommit(). Such code can't avoid throwing errors
> > because it can't avoid performing operations and allocating memory.
>
> I'm not completely sure about this.  I thought (and said) that the only thing 
> the FDW does would be to send a commit request through an existing 
> connection.  So, I think it's not a severe restriction to require FDWs not to 
> ereport(ERROR) during commits (of the second phase of 2PC.)

To send a commit request through an existing connection, you have to
send some bytes over the network using a send() or write() system
call. That can fail. Then you have to read the response back over the
network using recv() or read(). That can also fail. You also need to
parse the result that you get from the remote side, which can also
fail, because you could get back garbage for some reason. And
depending on the details, you might first need to construct the
message you're going to send, which might be able to fail too. Also,
the data might be encrypted using SSL, so you might have to decrypt
it, which can also fail, and you might need to encrypt data before
sending it, which can fail. In fact, if you're using the OpenSSL,
trying to call SSL_read() or SSL_write() can both read and write data
from the socket, even multiple times, so you have extra opportunities
to fail.

> (I took "abort" as the same as "rollback" here.)  Once we've sent commit 
> requests to some participants, we can't abort the transaction.  If one FDW 
> returned an error halfway, we need to send commit requests to the rest of 
> participants.

I understand that it's not possible to abort the local transaction
after it's been committed, but that doesn't mean that we're
going to be able to send the commit requests to the rest of the
participants. We want to be able to do that, certainly, but there's no
guarantee that it's actually possible. Again, the remote servers may
be dropped into a volcano, or less seriously, we may not be able to
access them. Also, someone may kill off our session.

> It's a design question, as I repeatedly said, whether and how we should 
> report the error of some participants to the client.  For instance, how 
> should we report the errors of multiple participants?  Concatenate those 
> error messages?

Sure, I agree that there are some questions about how to report errors.

> Anyway, we should design the interface first, giving much thought and 
> respecting the ideas of predecessors (TX/XA, MS DTC, JTA/JTS).  Otherwise, we 
> may end up like "We implemented like this, so the interface is like this and 
> it can only behave like this, although you may find it strange..."  That 
> might be a situation similar to what your comment "the design of PostgreSQL, 
> in all circumstances, the way you recover from an error is to abort the 
> transaction" suggests -- Postgres doesn't have statement-level rollback.

I think that's a valid concern, but we also have to have a plan that
is realistic. Some things are indeed not possible in PostgreSQL's
design. Also, some of these problems are things everyone has to
somehow confront. There's no database doing 2PC that can't have a
situation where one of the machines disappears unexpectedly due to
some natural disaster or 

RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-10 Thread tsunakawa.ta...@fujitsu.com
From: Robert Haas 
> That is completely unrealistic. As Sawada-san has pointed out
> repeatedly, there are tons of things that can go wrong even after the
> remote side has prepared the transaction. Preparing a transaction only
> promises that the remote side will let you commit the transaction upon
> request. It doesn't guarantee that you'll be able to make the request.
> Like Sawada-san says, network problems, out of memory issues, or many
> other things could stop that from happening. Someone could come along
> in another session and run "ROLLBACK PREPARED" on the remote side, and
> now the "COMMIT PREPARED" will never succeed no matter how many times
> you try it. At least, not unless someone goes and creates a new
> prepared transaction with the same 2PC identifier, but then you won't
> be committing the correct transaction anyway. Or someone could take
> the remote server and drop it in a volcano. How do you propose that we
> avoid giving the user an error after the remote server has been
> dropped into a volcano, even though the local node is still alive?

I understand that.  As I cited yesterday and possibly before, that's why 
xa_commit() returns various return codes.  So, I have never suggested that FDWs 
should not report an error and always report success for the commit request.  
They should be allowed to report an error.

The question I have been asking is how.  With that said, we should only have 
two options; one is the return value of the FDW commit routine, and the other 
is via ereport(ERROR).  I suggested the possibility of the former, because if 
the FDW does ereport(ERROR), Postgres core (transaction manager) may have 
difficulty in handling the rest of the participants.


> Also, leaving aside theoretical arguments, I think it's not
> realistically possible for an FDW author to write code to commit a
> prepared transaction that will be safe in the context of running late
> in PrepareTransaction(), after we've already done
> RecordTransactionCommit(). Such code can't avoid throwing errors
> because it can't avoid performing operations and allocating memory.

I'm not completely sure about this.  I thought (and said) that the only thing 
the FDW does would be to send a commit request through an existing connection.  
So, I think it's not a severe restriction to require FDWs not to ereport(ERROR) 
during commits (of the second phase of 2PC.)


> It's already been mentioned that, if an ERROR is thrown, it would be
> reported to the user in place of the COMMIT acknowledgement that they
> are expecting. Now, it has also been suggested that we could downgrade
> the ERROR to a WARNING and still report the COMMIT. That doesn't sound
> easy to do, because when the ERROR happens, control is going to jump
> to AbortTransaction(). But even if you could hack it so it works like
> that, it doesn't really solve the problem. What about all of the other
> servers where the prepared transaction also needs to be committed? In
> the design of PostgreSQL, in all circumstances, the way you recover
> from an error is to abort the transaction. That is what brings the
> system back to a clean state. You can't simply ignore the requirement
> to abort the transaction and keep doing more work. It will never be
> reliable, and Tom will instantaneously demand that any code works like
> that be reverted -- and for good reason.

(I took "abort" as the same as "rollback" here.)  Once we've sent commit 
requests to some participants, we can't abort the transaction.  If one FDW 
returned an error halfway, we need to send commit requests to the rest of 
participants.

It's a design question, as I repeatedly said, whether and how we should report 
the error of some participants to the client.  For instance, how should we 
report the errors of multiple participants?  Concatenate those error messages?

Anyway, we should design the interface first, giving much thought and 
respecting the ideas of predecessors (TX/XA, MS DTC, JTA/JTS).  Otherwise, we 
may end up like "We implemented like this, so the interface is like this and it 
can only behave like this, although you may find it strange..."  That might be 
a situation similar to what your comment "the design of PostgreSQL, in all 
circumstances, the way you recover from an error is to abort the transaction" 
suggests -- Postgres doesn't have statement-level rollback.


> I am not sure that it's 100% impossible to find a way to solve this
> problem without just having the resolver do all the work, but I think
> it's going to be extremely difficult. We tried to figure out some
> vaguely similar things while working on undo, and it really didn't go
> very well. The later stages of CommitTransaction() and
> AbortTransaction() are places where very few kinds of code are safe to
> execute, and finding a way to patch around that problem is not simple
> either. If the resolver performance is poor, perhaps we could try to
> find a way to improve it. I don't know. But I don't think 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-10 Thread Robert Haas
On Fri, Jun 4, 2021 at 4:04 AM tsunakawa.ta...@fujitsu.com
 wrote:
> Why does the user have to get an error?  Once the local transaction has been 
> prepared, which means all remote ones also have been prepared, the whole 
> transaction is determined to commit.  So, the user doesn't have to receive an 
> error as long as the local node is alive.

That is completely unrealistic. As Sawada-san has pointed out
repeatedly, there are tons of things that can go wrong even after the
remote side has prepared the transaction. Preparing a transaction only
promises that the remote side will let you commit the transaction upon
request. It doesn't guarantee that you'll be able to make the request.
Like Sawada-san says, network problems, out of memory issues, or many
other things could stop that from happening. Someone could come along
in another session and run "ROLLBACK PREPARED" on the remote side, and
now the "COMMIT PREPARED" will never succeed no matter how many times
you try it. At least, not unless someone goes and creates a new
prepared transaction with the same 2PC identifier, but then you won't
be committing the correct transaction anyway. Or someone could take
the remote server and drop it in a volcano. How do you propose that we
avoid giving the user an error after the remote server has been
dropped into a volcano, even though the local node is still alive?

Also, leaving aside theoretical arguments, I think it's not
realistically possible for an FDW author to write code to commit a
prepared transaction that will be safe in the context of running late
in PrepareTransaction(), after we've already done
RecordTransactionCommit(). Such code can't avoid throwing errors
because it can't avoid performing operations and allocating memory.
It's already been mentioned that, if an ERROR is thrown, it would be
reported to the user in place of the COMMIT acknowledgement that they
are expecting. Now, it has also been suggested that we could downgrade
the ERROR to a WARNING and still report the COMMIT. That doesn't sound
easy to do, because when the ERROR happens, control is going to jump
to AbortTransaction(). But even if you could hack it so it works like
that, it doesn't really solve the problem. What about all of the other
servers where the prepared transaction also needs to be committed? In
the design of PostgreSQL, in all circumstances, the way you recover
from an error is to abort the transaction. That is what brings the
system back to a clean state. You can't simply ignore the requirement
to abort the transaction and keep doing more work. It will never be
reliable, and Tom will instantaneously demand that any code works like
that be reverted -- and for good reason.

I am not sure that it's 100% impossible to find a way to solve this
problem without just having the resolver do all the work, but I think
it's going to be extremely difficult. We tried to figure out some
vaguely similar things while working on undo, and it really didn't go
very well. The later stages of CommitTransaction() and
AbortTransaction() are places where very few kinds of code are safe to
execute, and finding a way to patch around that problem is not simple
either. If the resolver performance is poor, perhaps we could try to
find a way to improve it. I don't know. But I don't think it does any
good to say, well, no errors can occur after the remote transaction is
prepared. That's clearly incorrect.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-10 Thread tsunakawa.ta...@fujitsu.com
From: Kyotaro Horiguchi 
> If we accept that each elementary commit (via an FDW connection) can fail,
> then there's no way the parent (root) 2pc-commit can be guaranteed to
> succeed.  How can we ignore the fdw-error in that case?

No, we don't ignore the error during FDW commit.  As mentioned at the end of 
this mail, the question is how the FDW reports the error to the caller 
(transaction manager in Postgres core), and how we should handle it.

As below, Glassfish catches the resource manager's error during commit, retries 
the commit if the error is transient or a communication failure, and hands off 
the processing of the failed commit to the recovery manager.  (I used all of my 
energy today; I'd be grateful if someone could figure out whether Glassfish 
reports the error to the application.)


[XATerminatorImpl.java]
public void commit(Xid xid, boolean onePhase) throws XAException {
...
} else {
coord.commit();
}


[TopCoordinator.java]
// Commit all participants.  If a fatal error occurs during
// this method, then the process must be ended with a fatal error.
...
try {
participants.distributeCommit();
} catch (Throwable exc) {


[RegisteredResources.java]
void distributeCommit() throws HeuristicMixed, HeuristicHazard, NotPrepared 
{
...
// Browse through the participants, committing them. The following is
// intended to be done asynchronously as a group of operations.
...
// Tell the resource to commit.
// Catch any exceptions here; keep going until
// no exception is left.
...
// If the exception is neither TRANSIENT or
// COMM_FAILURE, it is unexpected, so display a
// message and give up with this Resource.
...
// For TRANSIENT or COMM_FAILURE, wait
// for a while, then retry the commit.
...
// If the retry limit has been exceeded,
// end the process with a fatal error.
...
if (!transactionCompleted) {
if (coord != null)
RecoveryManager.addToIncompleTx(coord, true);


> > No.  Taking the description literally and considering the relevant XA
> > specification, it's not about the remote commit failure.  The remote server
> > is not allowed to fail the commit once it has reported successful prepare,
> > which is the contract of 2PC.  HeuristicMixedException is about the manual
> > resolution, typically by the DBA, using the DBMS-specific tool or the
> > standard commit()/rollback() API.
> 
> Mmm. The above seems as if saying that 2pc-commit does not interact
> with remotes.  The interface contract does not cover everything that
> happens in the real world. If remote-commit fails, that is just an
> issue outside of the 2pc world.  In reality remote-commit may fail for
> all sorts of reasons.

The following part of XA specification is relevant.  We're considering to model 
the FDW 2PC interface based on XA, because it seems like the only standard 
interface and thus other FDWS would naturally take advantage of, aren't we?  
Then, we need to take care of such things as this.  The interface design is not 
easy.  So, proper design and its review should come first, before going deeper 
into the huge code patch.

2.3.3 Heuristic Branch Completion 
--
Some RMs may employ heuristic decision-making: an RM that has prepared to 
commit a transaction branch may decide to commit or roll back its work 
independently 
of the TM. It could then unlock shared resources. This may leave them in an 
inconsistent state. When the TM ultimately directs an RM to complete the 
branch, the 
RM may respond that it has already done so. The RM reports whether it committed 
the branch, rolled it back, or completed it with mixed results (committed some 
work 
and rolled back other work). 

An RM that reports heuristic completion to the TM must not discard its 
knowledge of 
the transaction branch. The TM calls the RM once more to authorise it to forget 
the 
branch. This requirement means that the RM must notify the TM of all heuristic 
decisions, even those that match the decision the TM requested. The referenced 
OSI DTP specifications (model) and (service) define heuristics more precisely. 
--


> https://www.ibm.com/docs/ja/db2-for-zos/11?topic=support-example-distributed-transaction-that-uses-jta-methods
> This suggests that both XAResource.prepare() and commit() can throw an
> exception.

Yes, XAResource.commit() can throw an exception:

void commit(Xid xid, boolean onePhase) throws XAException 

Throws: XAException 
An error has occurred. Possible XAExceptions are XA_HEURHAZ, XA_HEURCOM, 
XA_HEURRB, XA_HEURMIX, XAER_RMERR, XAER_RMFAIL, XAER_NOTA, 
XAER_INVAL, or XAER_PROTO. 


Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-09 Thread Kyotaro Horiguchi
At Tue, 8 Jun 2021 08:45:24 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Kyotaro Horiguchi 
> > I think the discussion is based on the behavior that any process that is
> > responsible for finishing the 2pc-commit continues retrying remote
> > commits until all of the remote-commits succeed.
> 
> Thank you for coming back.  We're talking about the first attempt to prepare 
> and commit in each transaction, not the retry case.

If we accept that each elementary commit (via an FDW connection) can fail,
then there's no way the parent (root) 2pc-commit can be guaranteed to
succeed.  How can we ignore the fdw-error in that case?

> > > Throws: HeuristicMixedException
> > > Thrown to indicate that a heuristic decision was made and that some
> > > relevant updates have been
> > > committed while others have been rolled back.
> 
> > I'm not sure about how JTA works in detail, but doesn't
> > UserTransaction.commit() return HeuristicMixedException when some of the
> > relevant updates have been committed but others not? Isn't it the same
> > state as the case where some of the remote servers failed on
> > remote-commit while others succeeded?
> 
> No.  Taking the description literally and considering the relevant XA 
> specification, it's not about the remote commit failure.  The remote server 
> is not allowed to fail the commit once it has reported successful prepare, 
> which is the contract of 2PC.  HeuristicMixedException is about the manual 
> resolution, typically by the DBA, using the DBMS-specific tool or the 
> standard commit()/rollback() API.

Mmm. The above seems as if saying that 2pc-commit does not interact
with remotes.  The interface contract does not cover everything that
happens in the real world. If remote-commit fails, that is just an
issue outside of the 2pc world.  In reality remote-commit may fail for
all sorts of reasons.

https://www.ibm.com/docs/ja/db2-for-zos/11?topic=support-example-distributed-transaction-that-uses-jta-methods

>  }  catch (javax.transaction.xa.XAException xae)
>  { // Distributed transaction failed, so roll it back.
>  // Report XAException on prepare/commit.

This suggests that both XAResource.prepare() and commit() can throw an
exception.

> > (I guess that
> > UserTransaction.commit() would throw RollbackException if
> > remote-prepare has failed for any of the remotes.)
> 
> Correct.

So UserTransaction.commit() does not throw the same exception if
remote-commit fails.  Isn't HeuristicMixedException the exception
thrown in that case?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-09 Thread tsunakawa.ta...@fujitsu.com
From: Masahiko Sawada 
> Maybe it's better to start a new thread to discuss this topic. If your
> idea is good, we can lower all errors that happen after writing the
> commit record to warnings, reducing the cases where the client gets
> confused by receiving an error after the commit.

No.  It's an important part because it determines the 2PC behavior and 
performance.  This discussion had started from the concern about performance 
before Ikeda-san reported pathological results.  Don't rush forward, hoping 
someone will commit the current patch.  I'm afraid you just don't want to 
change your design and code.  Let's face the real issue.

As I said before, and as Ikeda-san's performance benchmark results show, I have 
to say the design isn't done sufficiently.  I talked with Fujii-san the other 
day about this patch.  The patch is already huge and it's difficult to decode 
how the patch works, e.g., what kind of new WALs it emits, how many disk writes 
it adds, how the error is handled, whether/how it's different from the textbook 
or other existing designs, etc.  What happened to my request to add such design 
description to the following page, so that reviewers can consider the design 
before spending much time on looking at the code?  What's the situation of the 
new FDW API that should naturally accommodate other FDW implementations?

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Design should come first.  I don't think it's a sincere attitude to require 
reviewers to spend a long time reading the design from huge code.


Regards
Takayuki Tsunakawa


Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-09 Thread Masahiko Sawada
On Wed, Jun 9, 2021 at 4:10 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Masahiko Sawada 
> > On Tue, Jun 8, 2021 at 5:28 PM tsunakawa.ta...@fujitsu.com
> >  wrote:
> > > Then, in what kind of scenario are we talking about the difficulty, and
> > > how is it difficult to handle, when we adopt either the method 1 or 2?
> > > (I'd just like to have the same clear picture.)
> >
> > IMO, even though FDW's commit/rollback transaction code could be
> > simple in some cases, I think we need to think that any kind of errors
> > (or even FATAL or PANIC) could be thrown from the FDW code. It could
> > be an error due to a temporary network problem, remote server down,
> > driver’s unexpected error, or out of memory etc. Errors that happened
> > after the local transaction commit don't affect the global
> > transaction decision, as you mentioned. But the process or system
> > could be in a bad state. Also, users might expect the process to exit
> > on error by setting exit_on_error = on. Your idea sounds like we
> > have to ignore any errors happening after the local commit if they
> > don’t affect the transaction outcome. It’s too scary to me and I think
> > that it's a bad idea to blindly ignore all possible errors under such
> > conditions. That could make things worse and will likely be a
> > foot-gun. It would be good if we can prove that it’s safe to ignore
> > those errors but not sure how we can at least for me.
> >
> > This situation is true even today; an error could happen after
> > committing the transaction. But I personally don’t want to add the
> > code that increases the likelihood.
>
> I'm not talking about the code simplicity here (actually, I haven't reviewed 
> the code around prepare and commit in the patch yet...)  Also, I don't 
> understand well what you're trying to insist and what realistic situations 
> you have in mind by citing exit_on_error, FATAL, PANIC and so on.  I just 
> asked (in a different part) why the client has to know the error.
>
> Just to be clear, I'm not saying that we should hide the error completely 
> behind the scenes.  For example, you can allow the FDW to emit a WARNING if 
> the DBMS-specific client driver returns an error when committing.  Further, 
> if you want to allow the FDW to throw an ERROR when committing, the 
> transaction manager in core can catch it by PG_TRY(), so that it can report 
> back successfull commit of the global transaction to the client while it 
> leaves the handling of failed commit of the FDW to the resolver.  (I don't 
> think we like to use PG_TRY() during transaction commit for performance 
> reasons, though.)
>
> Let's give it a hundred steps and let's say we want to report the error of 
> the committing FDW to the client.  If that's the case, we can use SQLSTATE 
> 02xxx (Warning) and attach the error message.
>

Maybe it's better to start a new thread to discuss this topic. If your
idea is good, we can lower all errors that happen after writing the
commit record to warnings, reducing the cases where the client gets
confused by receiving an error after the commit.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-09 Thread tsunakawa.ta...@fujitsu.com
From: Masahiko Sawada 
> On Tue, Jun 8, 2021 at 5:28 PM tsunakawa.ta...@fujitsu.com
>  wrote:
> > Then, in what kind of scenario are we talking about the difficulty, and how
> > is it difficult to handle, when we adopt either the method 1 or 2?  (I'd
> > just like to have the same clear picture.)
> 
> IMO, even though FDW's commit/rollback transaction code could be
> simple in some cases, I think we need to think that any kind of errors
> (or even FATAL or PANIC) could be thrown from the FDW code. It could
> be an error due to a temporary network problem, remote server down,
> driver’s unexpected error, or out of memory etc. Errors that happened
> after the local transaction commit don't affect the global
> transaction decision, as you mentioned. But the process or system
> could be in a bad state. Also, users might expect the process to exit
> on error by setting exit_on_error = on. Your idea sounds like we
> have to ignore any errors happening after the local commit if they
> don’t affect the transaction outcome. It’s too scary to me and I think
> that it's a bad idea to blindly ignore all possible errors under such
> conditions. That could make things worse and will likely be a
> foot-gun. It would be good if we can prove that it’s safe to ignore
> those errors but not sure how we can at least for me.
> 
> This situation is true even today; an error could happen after
> committing the transaction. But I personally don’t want to add the
> code that increases the likelihood.

I'm not talking about the code simplicity here (actually, I haven't reviewed 
the code around prepare and commit in the patch yet...)  Also, I don't 
understand well what you're trying to insist and what realistic situations you 
have in mind by citing exit_on_error, FATAL, PANIC and so on.  I just asked (in 
a different part) why the client has to know the error.

Just to be clear, I'm not saying that we should hide the error completely 
behind the scenes.  For example, you can allow the FDW to emit a WARNING if the 
DBMS-specific client driver returns an error when committing.  Further, if you 
want to allow the FDW to throw an ERROR when committing, the transaction 
manager in core can catch it by PG_TRY(), so that it can report back 
successful commit of the global transaction to the client while it leaves the 
handling of failed commit of the FDW to the resolver.  (I don't think we like 
to use PG_TRY() during transaction commit for performance reasons, though.)
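
A hypothetical sketch of that flow (CommitForeignTransaction stands in for
whatever the FDW commit callback is named; the resolver hand-off is elided):

PG_TRY();
{
	/* second phase of 2PC for one participant (hypothetical callback) */
	routine->CommitForeignTransaction(fdwxact);
}
PG_CATCH();
{
	/*
	 * Log the failure, but do not abort the already-decided commit;
	 * leave this foreign transaction for the resolver to retry.
	 */
	EmitErrorReport();
	FlushErrorState();
	/* ... queue this participant for the resolver ... */
}
PG_END_TRY();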

Let's give it a hundred steps and let's say we want to report the error of the 
committing FDW to the client.  If that's the case, we can use SQLSTATE 02xxx 
(Warning) and attach the error message.


> Just to be clear, with your idea, will we ignore only ERROR, or also
> FATAL and PANIC? And if an error happens during committing one of the
> prepared transactions on the foreign server, will we proceed with
> committing other transactions or return OK to the client?

Neither FATAL nor PANIC can be ignored.  In the case of FATAL, which means the 
termination of a particular session, committing the remote transaction 
should be taken over by the resolver.  Not to mention PANIC; we can't do 
anything then.  Otherwise, we proceed with committing the other FDWs, hand off 
the task of committing the failed FDW to the resolver, and report success to the client.
 If you're not convinced, I'd like to ask you to investigate the code of some 
Java EE app server, say GlassFish, and share with us how it handles an error 
during commit.


Regards
Takayuki Tsunakawa



Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-08 Thread Masahiko Sawada
On Tue, Jun 8, 2021 at 5:28 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Masahiko Sawada 
> > On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.ta...@fujitsu.com
> >  wrote:
> > > Why does the client have to know the error on a remote server, whereas the
> > global transaction itself is destined to commit?
> >
> > It's not necessarily on a remote server. It could be a problem with
> > the local server.
>
> Then, in what kind of scenario are we talking about the difficulty, and how 
> is it difficult to handle, when we adopt either the method 1 or 2?  (I'd just 
> like to have the same clear picture.)

IMO, even though FDW's commit/rollback transaction code could be
simple in some cases, I think we need to assume that any kind of error
(or even FATAL or PANIC) could be thrown from the FDW code. It could
be an error due to a temporary network problem, the remote server going
down, a driver's unexpected error, or out of memory, etc. Errors that
happen after the local transaction commit don't affect the global
transaction decision, as you mentioned. But the process or system
could be in a bad state. Also, users might expect the process to exit
on error by setting exit_on_error = on. Your idea sounds like we
have to ignore any errors happening after the local commit if they
don't affect the transaction outcome. It's too scary to me and I think
that it's a bad idea to blindly ignore all possible errors under such
conditions. That could make things worse and will likely be a
foot-gun. It would be good if we could prove that it's safe to ignore
those errors, but at least I'm not sure how we can.

This situation is true even today; an error could happen after
committing the transaction. But I personally don't want to add
code that increases the likelihood.

Just to be clear, with your idea, we will ignore only ERROR or also
FATAL and PANIC? And if an error happens during committing one of the
prepared transactions on the foreign server, will we proceed with
committing other transactions or return OK to the client?


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-08 Thread tsunakawa.ta...@fujitsu.com
From: Kyotaro Horiguchi 
> I think the discussion is based on the behavior that any process that is
> responsible for finishing the 2pc-commit continues retrying remote
> commits until all of the remote-commits succeed.

Thank you for coming back.  We're talking about the first attempt to prepare 
and commit in each transaction, not the retry case.


> > Throws: HeuristicMixedException
> > Thrown to indicate that a heuristic decision was made and that some
> > relevant updates have been
> > committed while others have been rolled back.

> I'm not sure about how JTA works in detail, but doesn't
> UserTransaction.commit() throw HeuristicMixedException when some of
> the relevant updates have been committed but others not? Isn't it the same
> state with the case where some of the remote servers failed on
> remote-commit while others succeeded?

No.  Taking the description literally and considering the relevant XA 
specification, it's not about the remote commit failure.  The remote server is 
not allowed to fail the commit once it has reported successful prepare, which 
is the contract of 2PC.  HeuristicMixedException is about the manual 
resolution, typically by the DBA, using the DBMS-specific tool or the standard 
commit()/rollback() API.


> (I guess that
> UserTransaction.commit() would throw RollbackException if
> remote-prepare has been failed for any of the remotes.)

Correct.


Regards
Takayuki Tsunakawa



RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-08 Thread tsunakawa.ta...@fujitsu.com
From: Masahiko Sawada 
> On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.ta...@fujitsu.com
>  wrote:
> > Why does the client have to know the error on a remote server, whereas the
> global transaction itself is destined to commit?
> 
> It's not necessarily on a remote server. It could be a problem with
> the local server.

Then, in what kind of scenario are we talking about the difficulty, and how is 
it difficult to handle, when we adopt either method 1 or 2?  (I'd just like 
to have the same clear picture.)  For example,

1. All FDWs prepared successfully.
2. The local transaction prepared successfully, too.
3. Some FDWs committed successfully.
4. One FDW failed to send the commit request because the remote server went 
down.


Regards
Takayuki Tsunakawa



Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-08 Thread Kyotaro Horiguchi
At Tue, 8 Jun 2021 16:32:14 +0900, Masahiko Sawada  
wrote in 
> On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.ta...@fujitsu.com
>  wrote:
> >
> > From: Masahiko Sawada 
> > > I think we should not reinterpret the severity of the error and lower
> > > it. Especially, in this case, any kind of errors can be thrown. It
> > > could be such a serious error that FDW developer wants to report to
> > > the client. Do we lower even PANIC to a lower severity such as
> > > WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas
> > > lowering ERROR (and FATAL) to WARNING, why do we regard only them as
> > > non-error?
> >
> > Why does the client have to know the error on a remote server, whereas the 
> > global transaction itself is destined to commit?
> 
> It's not necessarily on a remote server. It could be a problem with
> the local server.

Isn't it a discussion about the errors from postgres_fdw?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-08 Thread Kyotaro Horiguchi
(I have caught up here. Sorry in advance for possible pointless
discussion by me..)

At Tue, 8 Jun 2021 00:47:08 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Masahiko Sawada 
> > I think we should not reinterpret the severity of the error and lower
> > it. Especially, in this case, any kind of errors can be thrown. It
> > could be such a serious error that FDW developer wants to report to
> > the client. Do we lower even PANIC to a lower severity such as
> > WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas
> > lowering ERROR (and FATAL) to WARNING, why do we regard only them as
> > non-error?
> 
> Why does the client have to know the error on a remote server, whereas the 
> global transaction itself is destined to commit?

I think the discussion is based on the behavior that any process that is
responsible for finishing the 2pc-commit continues retrying remote
commits until all of the remote-commits succeed.

Maybe in most cases the errors during remote-prepared-commit could be
retryable, but as Sawada-san says, I'm also not sure it's always the
case.  On the other hand, it could be said that we have no other way
than retrying the remote commits if we want to get over, say, transient
network failures automatically.  It is somewhat similar to
WAL restoration, which keeps complaining about restore_command
failures without exiting.

> FYI, the tx_commit() in the X/Open TX interface and the 
> UserTransaction.commit() in JTA don't return such an error, IIRC.  Do TX_FAIL 
> and SystemException serve such a purpose?  I don't think they do.

I'm not sure about how JTA works in detail, but doesn't
UserTransaction.commit() throw HeuristicMixedException when some of
the relevant updates have been committed but others not? Isn't it the same
state with the case where some of the remote servers failed on
remote-commit while others succeeded?  (I guess that
UserTransaction.commit() would throw RollbackException if
remote-prepare has failed for any of the remotes.)


> [Tuxedo manual (Japanese)]
> https://docs.oracle.com/cd/F25597_01/document/products/tuxedo/tux80j/atmi/rf3c91.htm
> 
> 
> [JTA]
> public interface javax.transaction.UserTransaction 
> public void commit()
>  throws RollbackException, HeuristicMixedException, 
> HeuristicRollbackException, SecurityException, 
> IllegalStateException, SystemException 
> 
> Throws: RollbackException 
> Thrown to indicate that the transaction has been rolled back rather than 
> committed. 
> 
> Throws: HeuristicMixedException 
> Thrown to indicate that a heuristic decision was made and that some relevant 
> updates have been 
> committed while others have been rolled back. 
> 
> Throws: HeuristicRollbackException 
> Thrown to indicate that a heuristic decision was made and that all relevant 
> updates have been rolled 
> back. 
> 
> Throws: SecurityException 
> Thrown to indicate that the thread is not allowed to commit the transaction. 
> 
> Throws: IllegalStateException 
> Thrown if the current thread is not associated with a transaction. 
> 
> Throws: SystemException 
> Thrown if the transaction manager encounters an unexpected error condition. 
> 
> 
> Regards
> Takayuki Tsunakawa

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-08 Thread Masahiko Sawada
On Tue, Jun 8, 2021 at 9:47 AM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Masahiko Sawada 
> > I think we should not reinterpret the severity of the error and lower
> > it. Especially, in this case, any kind of errors can be thrown. It
> > could be such a serious error that FDW developer wants to report to
> > the client. Do we lower even PANIC to a lower severity such as
> > WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas
> > lowering ERROR (and FATAL) to WARNING, why do we regard only them as
> > non-error?
>
> Why does the client have to know the error on a remote server, whereas the 
> global transaction itself is destined to commit?

It's not necessarily on a remote server. It could be a problem with
the local server.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-07 Thread tsunakawa.ta...@fujitsu.com
From: Masahiko Sawada 
> I think we should not reinterpret the severity of the error and lower
> it. Especially, in this case, any kind of errors can be thrown. It
> could be such a serious error that FDW developer wants to report to
> the client. Do we lower even PANIC to a lower severity such as
> WARNING? That's definitely a bad idea. If we don’t lower PANIC whereas
> lowering ERROR (and FATAL) to WARNING, why do we regard only them as
> non-error?

Why does the client have to know the error on a remote server, whereas the 
global transaction itself is destined to commit?

FYI, the tx_commit() in the X/Open TX interface and the 
UserTransaction.commit() in JTA don't return such an error, IIRC.  Do TX_FAIL 
and SystemException serve such a purpose?  I don't think they do.


[Tuxedo manual (Japanese)]
https://docs.oracle.com/cd/F25597_01/document/products/tuxedo/tux80j/atmi/rf3c91.htm


[JTA]
public interface javax.transaction.UserTransaction 
public void commit()
 throws RollbackException, HeuristicMixedException, 
HeuristicRollbackException, SecurityException, 
IllegalStateException, SystemException 

Throws: RollbackException 
Thrown to indicate that the transaction has been rolled back rather than 
committed. 

Throws: HeuristicMixedException 
Thrown to indicate that a heuristic decision was made and that some relevant 
updates have been 
committed while others have been rolled back. 

Throws: HeuristicRollbackException 
Thrown to indicate that a heuristic decision was made and that all relevant 
updates have been rolled 
back. 

Throws: SecurityException 
Thrown to indicate that the thread is not allowed to commit the transaction. 

Throws: IllegalStateException 
Thrown if the current thread is not associated with a transaction. 

Throws: SystemException 
Thrown if the transaction manager encounters an unexpected error condition. 


Regards
Takayuki Tsunakawa



Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-06 Thread ikeda...@oss.nttdata.com



> 2021/06/04 21:38, Masahiko Sawada wrote:
> 
> On Fri, Jun 4, 2021 at 5:16 PM Masahiko Sawada  wrote:
>> 
>> On Fri, Jun 4, 2021 at 3:58 PM ikeda...@oss.nttdata.com
>>  wrote:
>>> 
>>> 
>>> 
>>> 2021/06/04 12:28, Masahiko Sawada wrote:
>>> 
>>> 
>>> Thank you for pointing it out. This idea has been proposed several
>>> times and there were discussions. I'd like to summarize the proposed
>>> ideas and those pros and cons before replying to your other comments.
>>> 
>>> There are 3 ideas. After the backend both prepares all foreign
>>> transactions and commits the local transaction,
>>> 
>>> 1. the backend continues attempting to commit all prepared foreign
>>> transactions until all of them are committed.
>>> 2. the backend attempts to commit all prepared foreign transactions
>>> once. If an error happens, leave them for the resolver.
>>> 3. the backend asks the resolver that is launched per foreign server
>>> to commit the prepared foreign transactions (and the backend waits or
>>> doesn't wait for the commit completion depending on the setting).
>>> 
>>> With ideas 1 and 2, since the backend itself commits all foreign
>>> transactions the resolver process cannot be a bottleneck, and probably
>>> the code can get simpler as backends don't need to communicate
>>> with resolver processes.
>>> 
>>> However, those have two problems we need to deal with:
>>> 
>>> 
>>> Thanks for sharing the summary. I understood there are problems related to
>>> FDW implementation.
>>> 
>>> First, users could get an error if an error happens during the backend
>>> committing prepared foreign transaction but the local transaction is
>>> already committed and some foreign transactions could also be
>>> committed, confusing users. There were two opinions to this problem:
>>> FDW developers should be responsible for writing FDW code such that
>>> any error doesn't happen during committing foreign transactions, and
>>> users can accept that confusion since an error could happen after
>>> writing the commit WAL even today without this 2PC feature. For the
>>> former point, I'm not sure it's always doable since even palloc()
>>> could raise an error and it seems hard to require all FDW developers
>>> to understand all possible paths of raising an error. And for the
>>> latter point, that's true but I think those cases are
>>> should-not-happen cases (i.e., rare cases) whereas the likelihood of
>>> an error during committing prepared transactions is not low (e.g., by
>>> network connectivity problem). I think we need to assume that that is
>>> not a rare case.
>>> 
>>> 
>>> Hmm… Sorry, I don’t have any good ideas now.
>>> 
>>> If anything, I'm on the second side, where users accept the confusion,
>>> though it is necessary to let users know whether the error happened
>>> before or after the local commit was done, because in the former case
>>> users will execute the same query again.
>> 
>> Yeah, users will need to remember the XID of the last executed
>> transaction and check if it has been committed by pg_xact_status().
> 
> As the second idea, can we send something like a hint along with the
> error (or send a new type of error) that indicates the error happened
> after the transaction commit so that the client can decide whether or
> not to ignore the error? That way, we can deal with the confusion led
> by an error raised after the local commit by the existing post-commit
> cleanup routines (and post-commit xact callbacks) as well as by FDW’s
> commit prepared routine.


I think your second idea is better because it's easier for users to know what 
error happened, and there is nothing users should do. Since the focus of a 
"hint" is how to fix the problem, is it appropriate to use "context"?

FWIW, I took a quick look at elog.c and found there is "error_context_stack". 
So, why don't you add a context message after the local commit, showing 
something like "the transaction's fate has been decided as COMMIT (or 
ROLLBACK), so even if an error happens, the transaction will be resolved in 
the background"?
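
Just to sketch what I mean (illustrative only; the function names and wording are made up, but ErrorContextCallback and errcontext() are the existing elog.c mechanism):

```c
/* Illustrative sketch: the names and the message wording are made up. */
static void
post_commit_error_callback(void *arg)
{
	errcontext("the transaction's fate has been decided as COMMIT; "
			   "even if an error happens, remaining foreign transactions "
			   "will be resolved in the background");
}

static void
resolve_foreign_transactions_after_local_commit(void)
{
	ErrorContextCallback errcallback;

	/* Push our callback onto the existing error context stack. */
	errcallback.callback = post_commit_error_callback;
	errcallback.arg = NULL;
	errcallback.previous = error_context_stack;
	error_context_stack = &errcallback;

	/* ... issue COMMIT PREPARED on each foreign server here ... */

	error_context_stack = errcallback.previous;
}
```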


Regards,

-- 
Masahiro Ikeda
NTT DATA CORPORATION





Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-06 Thread ikeda...@oss.nttdata.com



> 2021/06/04 17:16, Masahiko Sawada wrote:
> 
> On Fri, Jun 4, 2021 at 3:58 PM ikeda...@oss.nttdata.com
>  wrote:
>> 
>> 
>> 
>> 2021/06/04 12:28, Masahiko Sawada wrote:
>> 
>> 
>> Thank you for pointing it out. This idea has been proposed several
>> times and there were discussions. I'd like to summarize the proposed
>> ideas and those pros and cons before replying to your other comments.
>> 
>> There are 3 ideas. After the backend both prepares all foreign
>> transactions and commits the local transaction,
>> 
>> 1. the backend continues attempting to commit all prepared foreign
>> transactions until all of them are committed.
>> 2. the backend attempts to commit all prepared foreign transactions
>> once. If an error happens, leave them for the resolver.
>> 3. the backend asks the resolver that is launched per foreign server
>> to commit the prepared foreign transactions (and the backend waits or
>> doesn't wait for the commit completion depending on the setting).
>> 
>> With ideas 1 and 2, since the backend itself commits all foreign
>> transactions the resolver process cannot be a bottleneck, and probably
>> the code can get simpler as backends don't need to communicate
>> with resolver processes.
>> 
>> However, those have two problems we need to deal with:
>> 
>> 
>> Thanks for sharing the summary. I understood there are problems related to
>> FDW implementation.
>> 
>> First, users could get an error if an error happens during the backend
>> committing prepared foreign transaction but the local transaction is
>> already committed and some foreign transactions could also be
>> committed, confusing users. There were two opinions to this problem:
>> FDW developers should be responsible for writing FDW code such that
>> any error doesn't happen during committing foreign transactions, and
>> users can accept that confusion since an error could happen after
>> writing the commit WAL even today without this 2PC feature. For the
>> former point, I'm not sure it's always doable since even palloc()
>> could raise an error and it seems hard to require all FDW developers
>> to understand all possible paths of raising an error. And for the
>> latter point, that's true but I think those cases are
>> should-not-happen cases (i.e., rare cases) whereas the likelihood of
>> an error during committing prepared transactions is not low (e.g., by
>> network connectivity problem). I think we need to assume that that is
>> not a rare case.
>> 
>> 
>> Hmm… Sorry, I don’t have any good ideas now.
>> 
>> If anything, I'm on the second side, where users accept the confusion,
>> though it is necessary to let users know whether the error happened
>> before or after the local commit was done, because in the former case
>> users will execute the same query again.
> 
> Yeah, users will need to remember the XID of the last executed
> transaction and check if it has been committed by pg_xact_status().
> 
>> 
>> 
>> The second problem is whether we can cancel committing foreign
>> transactions by pg_cancel_backend() (or pressing Ctl-c). If the
>> backend process commits prepared foreign transactions, it's FDW
>> developers' responsibility to write code that is interruptible. I’m
>> not sure it’s feasible for drivers for other databases.
>> 
>> 
>> Sorry, my understanding is not clear.
>> 
>> After all prepares are done, the foreign transactions will be committed.
>> So, does this mean that FDW must leave the unresolved transaction to the 
>> transaction
>> resolver and show some messages like “Since the transaction is already 
>> committed,
>> the transaction will be resolved in background" ?
> 
> I think this would happen after the backend cancels COMMIT PREPARED.
> To be able to cancel an in-progress query the backend needs to accept
> the interruption and send the cancel request. postgres_fdw can do that
> since libpq supports sending a query and waiting for the result but
> I’m not sure about other drivers.

Thanks, I understood that handling this issue is not in the scope of the 2PC 
feature, as Tsunakawa-san and you said.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION





Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-04 Thread Masahiko Sawada
On Fri, Jun 4, 2021 at 5:16 PM Masahiko Sawada  wrote:
>
> On Fri, Jun 4, 2021 at 3:58 PM ikeda...@oss.nttdata.com
>  wrote:
> >
> >
> >
> > 2021/06/04 12:28, Masahiko Sawada wrote:
> >
> >
> > Thank you for pointing it out. This idea has been proposed several
> > times and there were discussions. I'd like to summarize the proposed
> > ideas and those pros and cons before replying to your other comments.
> >
> > There are 3 ideas. After the backend both prepares all foreign
> > transactions and commits the local transaction,
> >
> > 1. the backend continues attempting to commit all prepared foreign
> > transactions until all of them are committed.
> > 2. the backend attempts to commit all prepared foreign transactions
> > once. If an error happens, leave them for the resolver.
> > 3. the backend asks the resolver that is launched per foreign server
> > to commit the prepared foreign transactions (and the backend waits or
> > doesn't wait for the commit completion depending on the setting).
> >
> > With ideas 1 and 2, since the backend itself commits all foreign
> > transactions the resolver process cannot be a bottleneck, and probably
> > the code can get simpler as backends don't need to communicate
> > with resolver processes.
> >
> > However, those have two problems we need to deal with:
> >
> >
> > Thanks for sharing the summary. I understood there are problems related to
> > FDW implementation.
> >
> > First, users could get an error if an error happens during the backend
> > committing prepared foreign transaction but the local transaction is
> > already committed and some foreign transactions could also be
> > committed, confusing users. There were two opinions to this problem:
> > FDW developers should be responsible for writing FDW code such that
> > any error doesn't happen during committing foreign transactions, and
> > users can accept that confusion since an error could happen after
> > writing the commit WAL even today without this 2PC feature. For the
> > former point, I'm not sure it's always doable since even palloc()
> > could raise an error and it seems hard to require all FDW developers
> > to understand all possible paths of raising an error. And for the
> > latter point, that's true but I think those cases are
> > should-not-happen cases (i.e., rare cases) whereas the likelihood of
> > an error during committing prepared transactions is not low (e.g., by
> > network connectivity problem). I think we need to assume that that is
> > not a rare case.
> >
> >
> > Hmm… Sorry, I don’t have any good ideas now.
> >
> > If anything, I'm on the second side, where users accept the confusion,
> > though it is necessary to let users know whether the error happened
> > before or after the local commit was done, because in the former case
> > users will execute the same query again.
>
> Yeah, users will need to remember the XID of the last executed
> transaction and check if it has been committed by pg_xact_status().

As the second idea, can we send something like a hint along with the
error (or send a new type of error) that indicates the error happened
after the transaction commit so that the client can decide whether or
not to ignore the error? That way, we can deal with the confusion caused
by an error raised after the local commit by the existing post-commit
cleanup routines (and post-commit xact callbacks) as well as by FDW’s
commit prepared routine.
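
For example, something like the following (only a sketch; the SQLSTATE and wording are arbitrary choices here, not a settled design, and report_post_commit_failure() is an imaginary helper):

```c
/*
 * Sketch only: the SQLSTATE and wording are arbitrary choices, not a
 * settled design; report_post_commit_failure() is an imaginary helper.
 */
static void
report_post_commit_failure(ForeignServer *server)
{
	ereport(ERROR,
			(errcode(ERRCODE_CONNECTION_FAILURE),
			 errmsg("could not commit prepared transaction on foreign server \"%s\"",
					server->servername),
			 errdetail("The local transaction has already been committed."),
			 errhint("The outcome of the distributed transaction is COMMIT; "
					 "the client can treat this error as a warning.")));
}
```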

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-04 Thread Masahiko Sawada
On Fri, Jun 4, 2021 at 5:59 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Masahiko Sawada 
> > On Fri, Jun 4, 2021 at 5:04 PM tsunakawa.ta...@fujitsu.com
> >  wrote:
> > > Why does the user have to get an error?  Once the local transaction has 
> > > been
> > prepared, which means all remote ones also have been prepared, the whole
> > transaction is determined to commit.  So, the user doesn't have to receive 
> > an
> > error as long as the local node is alive.
> >
> > I think we should neither ignore the error thrown by FDW code nor
> > lower the error level (e.g., ERROR to WARNING).
>
> Why?  (Forgive me for asking relentlessly... by imagining me as a cute 
> 7-year-old boy/girl asking "Why Dad?")

I think we should not reinterpret the severity of the error and lower
it. Especially, in this case, any kind of error can be thrown. It
could be such a serious error that the FDW developer wants to report to
the client. Do we lower even PANIC to a lower severity such as
WARNING? That's definitely a bad idea. If we don't lower PANIC whereas
lowering ERROR (and FATAL) to WARNING, why do we regard only them as
non-errors?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-04 Thread tsunakawa.ta...@fujitsu.com
From: Masahiko Sawada 
> On Fri, Jun 4, 2021 at 5:04 PM tsunakawa.ta...@fujitsu.com
>  wrote:
> > Why does the user have to get an error?  Once the local transaction has been
> prepared, which means all remote ones also have been prepared, the whole
> transaction is determined to commit.  So, the user doesn't have to receive an
> error as long as the local node is alive.
> 
> I think we should neither ignore the error thrown by FDW code nor
> lower the error level (e.g., ERROR to WARNING).

Why?  (Forgive me for asking relentlessly... by imagining me as a cute 
7-year-old boy/girl asking "Why Dad?")


> > How do non-2PC and 2PC cases differ in the rarity of the error?
> 
> I think the main difference would be that in 2PC case there will be
> network communications possibly with multiple servers after the local
> commit.

Then, it's the same failure mode.  That is, the same failure could occur for 
both cases.  That doesn't require us to differentiate between them.  Let's 
ignore this point from now on.


Regards
Takayuki Tsunakawa



Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-04 Thread Masahiko Sawada
On Fri, Jun 4, 2021 at 5:04 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Masahiko Sawada 
> > 1. the backend continues attempting to commit all prepared foreign
> > transactions until all of them are committed.
> > 2. the backend attempts to commit all prepared foreign transactions
> > once. If an error happens, leave them for the resolver.
> > 3. the backend asks the resolver that is launched per foreign server
> > to commit the prepared foreign transactions (and the backend waits or
> > doesn't wait for the commit completion depending on the setting).
> >
> > With ideas 1 and 2, since the backend itself commits all foreign
> > transactions the resolver process cannot be a bottleneck, and probably
> > the code can get simpler as backends don't need to communicate
> > with resolver processes.
> >
> > However, those have two problems we need to deal with:
> >
>
> > First, users could get an error if an error happens during the backend
> > committing prepared foreign transaction but the local transaction is
> > already committed and some foreign transactions could also be
> > committed, confusing users. There were two opinions to this problem:
> > FDW developers should be responsible for writing FDW code such that
> > any error doesn't happen during committing foreign transactions, and
> > users can accept that confusion since an error could happen after
> > writing the commit WAL even today without this 2PC feature.
>
> Why does the user have to get an error?  Once the local transaction has been 
> prepared, which means all remote ones also have been prepared, the whole 
> transaction is determined to commit.  So, the user doesn't have to receive an 
> error as long as the local node is alive.

I think we should neither ignore the error thrown by FDW code nor
lower the error level (e.g., ERROR to WARNING).

>
> > And for the
> > latter point, that's true but I think those cases are
> > should-not-happen cases (i.g., rare cases) whereas the likelihood of
> > an error during committing prepared transactions is not low (e.g., by
> > network connectivity problem). I think we need to assume that that is
> > not a rare case.
>
> How do non-2PC and 2PC cases differ in the rarity of the error?

I think the main difference would be that in 2PC case there will be
network communications possibly with multiple servers after the local
commit.

>
>
> > The second problem is whether we can cancel committing foreign
> > transactions by pg_cancel_backend() (or pressing Ctl-c). If the
> > backend process commits prepared foreign transactions, it's FDW
> > developers' responsibility to write code that is interruptible. I’m
> > not sure it’s feasible for drivers for other databases.
>
> That's true not only for prepare and commit but also for other queries.  Why 
> do we have to treat prepare and commit specially?

Good point. This would not be a blocker for ideas 1 and 2 but is a
side benefit of idea 3.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-04 Thread Masahiko Sawada
On Fri, Jun 4, 2021 at 3:58 PM ikeda...@oss.nttdata.com
 wrote:
>
>
>
> 2021/06/04 12:28、Masahiko Sawada のメール:
>
>
> Thank you for pointing it out. This idea has been proposed several
> times and there were discussions. I'd like to summarize the proposed
> ideas and those pros and cons before replying to your other comments.
>
> There are 3 ideas. After the backend both prepares all foreign
> transactions and commits the local transaction,
>
> 1. the backend continues attempting to commit all prepared foreign
> transactions until all of them are committed.
> 2. the backend attempts to commit all prepared foreign transactions
> once. If an error happens, leave them for the resolver.
> 3. the backend asks the resolver that is launched per foreign server
> to commit the prepared foreign transactions (and the backend waits or
> doesn't wait for the commit completion depending on the setting).
>
> With ideas 1 and 2, since the backend itself commits all foreign
> transactions the resolver process cannot be a bottleneck, and probably
> the code can get simpler as backends don't need to communicate
> with resolver processes.
>
> However, those have two problems we need to deal with:
>
>
> Thanks for sharing the summary. I understood there are problems related to
> FDW implementation.
>
> First, users could get an error if an error happens during the backend
> committing prepared foreign transaction but the local transaction is
> already committed and some foreign transactions could also be
> committed, confusing users. There were two opinions to this problem:
> FDW developers should be responsible for writing FDW code such that
> any error doesn't happen during committing foreign transactions, and
> users can accept that confusion since an error could happen after
> writing the commit WAL even today without this 2PC feature. For the
> former point, I'm not sure it's always doable since even palloc()
> could raise an error and it seems hard to require all FDW developers
> to understand all possible paths of raising an error. And for the
> latter point, that's true but I think those cases are
> should-not-happen cases (i.e., rare cases) whereas the likelihood of
> an error during committing prepared transactions is not low (e.g., by
> network connectivity problem). I think we need to assume that that is
> not a rare case.
>
>
> Hmm… Sorry, I don’t have any good ideas now.
>
> If anything, I'm on the second side, where users accept the confusion,
> though it is necessary to let users know whether the error happened before
> or after the local commit was done, because in the former case users will
> execute the same query again.

Yeah, users will need to remember the XID of the last executed
transaction and check if it has been committed by pg_xact_status().

>
>
> The second problem is whether we can cancel committing foreign
> transactions by pg_cancel_backend() (or pressing Ctl-c). If the
> backend process commits prepared foreign transactions, it's FDW
> developers' responsibility to write code that is interruptible. I’m
> not sure it’s feasible for drivers for other databases.
>
>
> Sorry, my understanding is not clear.
>
> After all prepares are done, the foreign transactions will be committed.
> So, does this mean that FDW must leave the unresolved transaction to the 
> transaction
> resolver and show some messages like “Since the transaction is already 
> committed,
> the transaction will be resolved in background" ?

I think this would happen after the backend cancels COMMIT PREPARED.
To be able to cancel an in-progress query the backend needs to accept
the interruption and send the cancel request. postgres_fdw can do that
since libpq supports sending a query and waiting for the result but
I’m not sure about other drivers.
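
To illustrate the libpq side, an interruptible wait could look roughly like this (a simplified sketch in the spirit of postgres_fdw's result polling, not its actual code):

```c
/*
 * Simplified sketch, not postgres_fdw's actual code: send COMMIT PREPARED
 * asynchronously and poll, so CHECK_FOR_INTERRUPTS() can react to
 * pg_cancel_backend() or Ctrl-C while we wait.
 */
static PGresult *
exec_commit_prepared_interruptible(PGconn *conn, const char *sql)
{
	if (!PQsendQuery(conn, sql))
		return NULL;

	for (;;)
	{
		CHECK_FOR_INTERRUPTS();

		if (!PQconsumeInput(conn))
			return NULL;
		if (!PQisBusy(conn))
			break;

		/* Sleep until the socket is readable or our latch is set. */
		(void) WaitLatchOrSocket(MyLatch,
								 WL_SOCKET_READABLE | WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
								 PQsocket(conn), -1L, PG_WAIT_EXTENSION);
		ResetLatch(MyLatch);
	}

	return PQgetResult(conn);	/* the caller drains any remaining results */
}
```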

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-04 Thread tsunakawa.ta...@fujitsu.com
From: Masahiko Sawada 
> 1. the backend continues attempting to commit all prepared foreign
> transactions until all of them are committed.
> 2. the backend attempts to commit all prepared foreign transactions
> once. If an error happens, leave them for the resolver.
> 3. the backend asks the resolver that is launched per foreign server
> to commit the prepared foreign transactions (and the backend waits or
> doesn't wait for the commit completion depending on the setting).
> 
> With ideas 1 and 2, since the backend itself commits all foreign
> transactions the resolver process cannot be a bottleneck, and probably
> the code can get simpler as backends don't need to communicate
> with resolver processes.
> 
> However, those have two problems we need to deal with:
> 

> First, users could get an error if an error happens during the backend
> committing prepared foreign transaction but the local transaction is
> already committed and some foreign transactions could also be
> committed, confusing users. There were two opinions to this problem:
> FDW developers should be responsible for writing FDW code such that
> any error doesn't happen during committing foreign transactions, and
> users can accept that confusion since an error could happen after
> writing the commit WAL even today without this 2PC feature. 

Why does the user have to get an error?  Once the local transaction has been 
prepared, which means all remote ones also have been prepared, the whole 
transaction is determined to commit.  So, the user doesn't have to receive an 
error as long as the local node is alive.


> For the
> former point, I'm not sure it's always doable since even palloc()
> could raise an error and it seems hard to require all FDW developers
> to understand all possible paths of raising an error.

No, this is a matter of discipline to ensure consistency, just in case we 
really have to return an error to the user.


> And for the
> latter point, that's true but I think those cases are
> should-not-happen cases (i.g., rare cases) whereas the likelihood of
> an error during committing prepared transactions is not low (e.g., by
> network connectivity problem). I think we need to assume that that is
> not a rare case.

How do non-2PC and 2PC cases differ in the rarity of the error?


> The second problem is whether we can cancel committing foreign
> transactions by pg_cancel_backend() (or pressing Ctl-c). If the
> backend process commits prepared foreign transactions, it's FDW
> developers' responsibility to write code that is interruptible. I’m
> not sure it’s feasible for drivers for other databases.

That's true not only for prepare and commit but also for other queries.  Why do 
we have to treat prepare and commit specially?


> Through the long discussion on this thread, I've thought we got a
> consensus on idea 3 but sometimes ideas 1 and 2 are proposed again for

I don't remember seeing any consensus yet?

> With the current patch, we commit prepared foreign transactions
> asynchronously. But maybe we need to compare the performance of ideas
> 1 (and 2) to idea 3 with synchronous foreign transaction resolution.

+1


Regards
Takayuki Tsunakawa



Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-04 Thread ikeda...@oss.nttdata.com


> 2021/06/04 12:28, Masahiko Sawada wrote:
> 
> On Thu, Jun 3, 2021 at 1:56 PM Masahiro Ikeda wrote:
>> 
>> 
>> 
>> On 2021/05/25 21:59, Masahiko Sawada wrote:
>>> On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda  
>>> wrote:
 
 On 2021/05/21 13:45, Masahiko Sawada wrote:
> 
> Yes. We also might need to be careful about the order of foreign
> transaction resolution. I think we need to resolve foreign transactions 
> in arrival order at least within a foreign server.
 
 I agree it's better.
 
 (Although this is my interest...)
 Is it necessary? Although this idea seems to be for atomic visibility,
 2PC can't realize that as you know. So, I wondered that.
>>> 
>>> I think it's for fairness. If a foreign transaction arrived earlier
>>> gets put off so often for other foreign transactions arrived later due
>>> to its index in FdwXactCtl->xacts, it’s not understandable for users
>>> and not fair. I think it’s better to handle foreign transactions in
>>> FIFO manner (although this problem exists even in the current code).
>> 
>> OK, thanks.
>> 
>> 
>> On 2021/05/21 12:45, Masahiro Ikeda wrote:
>>> On 2021/05/21 10:39, Masahiko Sawada wrote:
 On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda 
>> wrote:
 How much is the performance without those 2PC patches and with the
 same workload? i.e., how fast is the current postgres_fdw that uses
 XactCallback?
>>> 
>>> OK, I'll test.
>> 
>> The test results are as follows. But I couldn't confirm the performance
>> improvements of the 2PC patches, though I may need to change the test
>> conditions.
>> 
>> [condition]
>> * 1 coordinator and 3 foreign servers
>> * There are two custom scripts which access two different foreign servers per
>> transaction
>> 
>> ``` fxact_select.pgbench
>> BEGIN;
>> SELECT * FROM part:p1 WHERE id = :id;
>> SELECT * FROM part:p2 WHERE id = :id;
>> COMMIT;
>> ```
>> 
>> ``` fxact_update.pgbench
>> BEGIN;
>> UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
>> UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
>> COMMIT;
>> ```
>> 
>> [results]
>> 
>> I have tested three times.
>> Performance difference seems to be within the range of errors.
>> 
>> # 6d0eb38557 with 2pc patches(v36) and foreign_twophase_commit = disable
>> - fxact_update.pgbench
>> 72.3, 74.9, 77.5  TPS  => avg 74.9 TPS
>> 110.5, 106.8, 103.2  ms => avg 106.8 ms
>> 
>> - fxact_select.pgbench
>> 1767.6, 1737.1, 1717.4 TPS  => avg 1740.7 TPS
>> 4.5, 4.6, 4.7 ms => avg 4.6ms
>> 
>> # 6d0eb38557 without 2pc patches
>> - fxact_update.pgbench
>> 76.5, 70.6, 69.5 TPS => avg 72.2 TPS
>> 104.534 + 113.244 + 115.097 => avg 111.0 ms
>> 
>> -fxact_select.pgbench
>> 1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS
>> 4.2, 4.6, 4.6 ms=>  4.5 ms
>> 
> 
> Thank you for testing!
> 
> I think the result shows that managing foreign transactions on the
> core side would not be a problem in terms of performance.
> 
>> 
>> 
>> 
>> 
>> # About the bottleneck of the resolver process
>> 
>> I investigated the performance bottleneck of the resolver process using perf.
>> The main bottleneck is the following functions.
>> 
>> 1st. 42.8% routine->CommitForeignTransaction()
>> 2nd. 31.5% remove_fdwxact()
>> 3rd. 10.16% CommitTransaction()
>> 
>> The 1st and 3rd problems can be solved by parallelizing resolver processes per
>> remote server. But I wondered whether the idea, in which backends also call
>> "COMMIT/ABORT PREPARED" and the resolver process only takes charge of
>> resolving in-doubt foreign transactions, is better. In many cases, I think
>> that the number of connections is much greater than the number of remote
>> servers. If so, the parallelization is not enough.
>> 
>> So, I think the idea in which backends execute "COMMIT PREPARED" synchronously
>> is
>> better. Citus has the 2PC feature and its backends send "COMMIT PREPARED" in
>> the extension. So, this idea is not bad.
> 
> Thank you for pointing it out. This idea has been proposed several
> times and there were discussions. I'd like to summarize the proposed
> ideas and those pros and cons before replying to your other comments.
> 
> There are 3 ideas. After the backend both prepares all foreign
> transactions and commits the local transaction,
>
> 1. the backend continues attempting to commit all prepared foreign
> transactions until all of them are committed.
> 2. the backend attempts to commit all prepared foreign transactions
> once. If an error happens, leave them for the resolver.
> 3. the backend asks the resolver that is launched per foreign server
> to commit the prepared foreign transactions (and the backend waits or
> doesn't wait for the commit completion depending on the setting).
>
> With ideas 1 and 2, since the backend itself commits all foreign
> transactions the resolver process cannot be a bottleneck, and probably
> the code can get simpler as backends don't need to communicate
> with resolver processes.

Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-03 Thread Masahiko Sawada
On Thu, Jun 3, 2021 at 1:56 PM Masahiro Ikeda  wrote:
>
>
>
> On 2021/05/25 21:59, Masahiko Sawada wrote:
> > On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda  
> > wrote:
> >>
> >> On 2021/05/21 13:45, Masahiko Sawada wrote:
> >>>
> >>> Yes. We also might need to be careful about the order of foreign
> >>> transaction resolution. I think we need to resolve foreign transactions 
> >>> in arrival order at least within a foreign server.
> >>
> >> I agree it's better.
> >>
> >> (Although this is my interest...)
> >> Is it necessary? Although this idea seems to be for atomic visibility,
> >> 2PC can't realize that as you know. So, I wondered that.
> >
> > I think it's for fairness. If a foreign transaction arrived earlier
> > gets put off so often for other foreign transactions arrived later due
> > to its index in FdwXactCtl->xacts, it’s not understandable for users
> > and not fair. I think it’s better to handle foreign transactions in
> > FIFO manner (although this problem exists even in the current code).
>
> OK, thanks.
>
>
> On 2021/05/21 12:45, Masahiro Ikeda wrote:
> > On 2021/05/21 10:39, Masahiko Sawada wrote:
> >> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda 
> wrote:
> >> How much is the performance without those 2PC patches and with the
> >> same workload? i.e., how fast is the current postgres_fdw that uses
> >> XactCallback?
> >
> > OK, I'll test.
>
> The test results are as follows. But I couldn't confirm the performance
> improvements of the 2PC patches, though I may need to change the test
> conditions.
>
> [condition]
> * 1 coordinator and 3 foreign servers
> * There are two custom scripts which access two different foreign servers per
> transaction
>
> ``` fxact_select.pgbench
> BEGIN;
> SELECT * FROM part:p1 WHERE id = :id;
> SELECT * FROM part:p2 WHERE id = :id;
> COMMIT;
> ```
>
> ``` fxact_update.pgbench
> BEGIN;
> UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> COMMIT;
> ```
>
> [results]
>
> I have tested three times.
> Performance difference seems to be within the range of errors.
>
> # 6d0eb38557 with 2pc patches(v36) and foreign_twophase_commit = disable
> - fxact_update.pgbench
> 72.3, 74.9, 77.5  TPS  => avg 74.9 TPS
> 110.5, 106.8, 103.2  ms => avg 106.8 ms
>
> - fxact_select.pgbench
> 1767.6, 1737.1, 1717.4 TPS  => avg 1740.7 TPS
> 4.5, 4.6, 4.7 ms => avg 4.6ms
>
> # 6d0eb38557 without 2pc patches
> - fxact_update.pgbench
> 76.5, 70.6, 69.5 TPS => avg 72.2 TPS
> 104.534 + 113.244 + 115.097 => avg 111.0 ms
>
> -fxact_select.pgbench
> 1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS
> 4.2, 4.6, 4.6 ms=>  4.5 ms
>

Thank you for testing!

I think the result shows that managing foreign transactions on the
core side would not be a problem in terms of performance.

>
>
>
>
> # About the bottleneck of the resolver process
>
> I investigated the performance bottleneck of the resolver process using perf.
> The main bottleneck is the following functions.
>
> 1st. 42.8% routine->CommitForeignTransaction()
> 2nd. 31.5% remove_fdwxact()
> 3rd. 10.16% CommitTransaction()
>
> The 1st and 3rd problems can be solved by parallelizing resolver processes per
> remote server. But I wondered whether the idea, in which backends also call
> "COMMIT/ABORT PREPARED" and the resolver process only takes charge of
> resolving in-doubt foreign transactions, is better. In many cases, I think
> that the number of connections is much greater than the number of remote
> servers. If so, the parallelization is not enough.
>
> So, I think the idea in which backends execute "COMMIT PREPARED" synchronously
> is better. Citus has the 2PC feature and its backends send "COMMIT PREPARED"
> in the extension. So, this idea is not bad.

Thank you for pointing it out. This idea has been proposed several
times and there were discussions. I'd like to summarize the proposed
ideas and those pros and cons before replying to your other comments.

There are 3 ideas. After the backend both prepares all foreign
transactions and commits the local transaction,

1. the backend continues attempting to commit all prepared foreign
transactions until all of them are committed.
2. the backend attempts to commit all prepared foreign transactions
once. If an error happens, leave them for the resolver.
3. the backend asks the resolver that is launched per foreign server
to commit the prepared foreign transactions (and the backend waits or
doesn't wait for the commit completion depending on the setting).

With ideas 1 and 2, since the backend itself commits all foreign
transactions the resolver process cannot be a bottleneck, and probably
the code can get simpler as backends don't need to communicate
with resolver processes.

However, those have two problems we need to deal with:

First, users could get an error if an error happens during the backend
committing prepared foreign transaction but the local transaction is
already committed and some foreign transactions could 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-02 Thread Masahiro Ikeda



On 2021/05/25 21:59, Masahiko Sawada wrote:
> On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda  
> wrote:
>>
>> On 2021/05/21 13:45, Masahiko Sawada wrote:
>>>
>>> Yes. We also might need to be careful about the order of foreign
>>> transaction resolution. I think we need to resolve foreign transactions in 
>>> arrival order at least within a foreign server.
>>
>> I agree it's better.
>>
>> (Although this is my interest...)
>> Is it necessary? Although this idea seems to be for atomic visibility,
>> 2PC can't realize that as you know. So, I wondered that.
> 
> I think it's for fairness. If a foreign transaction arrived earlier
> gets put off so often for other foreign transactions arrived later due
> to its index in FdwXactCtl->xacts, it’s not understandable for users
> and not fair. I think it’s better to handle foreign transactions in
> FIFO manner (although this problem exists even in the current code).

OK, thanks.


On 2021/05/21 12:45, Masahiro Ikeda wrote:
> On 2021/05/21 10:39, Masahiko Sawada wrote:
>> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda 
wrote:
>> How much is the performance without those 2PC patches and with the
>> same workload? i.e., how fast is the current postgres_fdw that uses
>> XactCallback?
>
> OK, I'll test.

The test results are as follows. But I couldn't confirm the performance
improvements of the 2PC patches, though I may need to change the test conditions.

[condition]
* 1 coordinator and 3 foreign servers
* There are two custom scripts which access two different foreign servers per
transaction

``` fxact_select.pgbench
BEGIN;
SELECT * FROM part:p1 WHERE id = :id;
SELECT * FROM part:p2 WHERE id = :id;
COMMIT;
```

``` fxact_update.pgbench
BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```

[results]

I have tested three times.
The performance difference seems to be within the margin of error.

# 6d0eb38557 with 2pc patches(v36) and foreign_twophase_commit = disable
- fxact_update.pgbench
72.3, 74.9, 77.5  TPS  => avg 74.9 TPS
110.5, 106.8, 103.2  ms => avg 106.8 ms

- fxact_select.pgbench
1767.6, 1737.1, 1717.4 TPS  => avg 1740.7 TPS
4.5, 4.6, 4.7 ms => avg 4.6ms

# 6d0eb38557 without 2pc patches
- fxact_update.pgbench
76.5, 70.6, 69.5 TPS => avg 72.2 TPS
104.534 + 113.244 + 115.097 => avg 111.0 ms

-fxact_select.pgbench
1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS
4.2, 4.6, 4.6 ms=>  4.5 ms





# About the bottleneck of the resolver process

I investigated the performance bottleneck of the resolver process using perf.
The main bottlenecks are the following functions.

1st. 42.8% routine->CommitForeignTransaction()
2nd. 31.5% remove_fdwxact()
3rd. 10.16% CommitTransaction()

The 1st and 3rd problems can be solved by parallelizing resolver processes per
remote server. But I wondered whether the idea, in which backends also call
"COMMIT/ABORT PREPARED" and the resolver process only takes charge of
resolving in-doubt foreign transactions, is better. In many cases, I think
that the number of connections is much greater than the number of remote
servers. If so, the parallelization is not enough.

So, I think the idea in which backends execute "COMMIT PREPARED" synchronously
is better. Citus has the 2PC feature and its backends send "COMMIT PREPARED"
in the extension. So, this idea is not bad.

Although resolving asynchronously has a performance benefit, we can't take
advantage of it because the resolver process can easily become a bottleneck now.


The 2nd, remove_fdwxact(), syncs the WAL record which indicates that the
foreign transaction entry is removed. Is it necessary to sync immediately?

Removing the sync may lengthen the recovery phase because some fdwxact entries
would need "COMMIT/ABORT PREPARED" to be executed again, but I think the effect
is limited.
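
To show which fsync I mean (a sketch only: the rmgr/record names are just my reading of the patch, not committed code; XLogBeginInsert()/XLogInsert()/XLogFlush() are the real WAL APIs):

```c
/* Sketch only; RM_FDWXACT_ID and XLOG_FDWXACT_REMOVE are assumed patch names. */
XLogRecPtr	recptr;

/* xlrec: the removal record, filled in earlier by the caller */
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, sizeof(xlrec));
recptr = XLogInsert(RM_FDWXACT_ID, XLOG_FDWXACT_REMOVE);

/*
 * This is the immediate fsync in question.  Skipping it would let the
 * record reach disk with a later flush, at the cost of re-running
 * COMMIT/ABORT PREPARED for a few entries during recovery.
 */
XLogFlush(recptr);
```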


# About other trivial comments.

* Is it better to call pgstat_send_wal() in the resolver process?

* Is it better to specify, in the description of
"max_foreign_transaction_resolvers", that only one resolver process can be
launched per database?

* Is the removal and insertion of new lines in foreigncmds.c intentional?

* Is it better that "max_prepared_foreign_transactions=%d" is after
"max_prepared_xacts=%d" in xlogdesc.c?

* Is "fdwxact_queue" unnecessary now?

* Is the following " + sizeof(FdwXactResolver)" unnecessary?

#define SizeOfFdwXactResolverCtlData \
(offsetof(FdwXactResolverCtlData, resolvers) + sizeof(FdwXactResolver))

Although MultiXactStateData assumes the backendIds are 1-indexed,
the resolvers are 0-indexed. Sorry if my understanding is wrong.

* s/transaciton/transaction/

* s/foreign_xact_resolution_retry_interval since last
resolver/foreign_xact_resolution_retry_interval since last resolver was/

* Don't we need a debug log in the following in postgres.c, like the logical
replication launcher shutdown?

else if (IsFdwXactLauncher())
{
/*
* The foreign transaction 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-25 Thread Masahiko Sawada
On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda  wrote:
>
>
>
> On 2021/05/21 13:45, Masahiko Sawada wrote:
> >
> > Yes. We also might need to be careful about the order of foreign
> > transaction resolution. I think we need to resolve foreign transactions in 
> > arrival order at least within a foreign server.
>
> I agree it's better.
>
> (Although this is my interest...)
> Is it necessary? Although this idea seems to be for atomic visibility,
> 2PC can't realize that as you know. So, I wondered that.

I think it's for fairness. If a foreign transaction that arrived earlier
gets put off so often for other foreign transactions that arrived later due
to its index in FdwXactCtl->xacts, it's not understandable for users
and not fair. I think it's better to handle foreign transactions in
FIFO manner (although this problem exists even in the current code).

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-21 Thread Masahiro Ikeda



On 2021/05/21 13:45, Masahiko Sawada wrote:
> On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda
>  wrote:
>>
>>
>>
>> On 2021/05/21 10:39, Masahiko Sawada wrote:
>>> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda  
>>> wrote:


 On 2021/05/11 13:37, Masahiko Sawada wrote:
> I've attached the updated patches that incorporated comments from
> Zhihong and Ikeda-san.

 Thanks for updating the patches!


 I have other comments including trivial things.


 a. about "foreign_transaction_resolver_timeout" parameter

 Now, the default value of "foreign_transaction_resolver_timeout" is 60 
 secs.
 Is there any reason? Although the following is a minor case, it may confuse
 some users.

 Example case is that

 1. a client executes a transaction with 2PC when the resolver is processing
 FdwXactResolverProcessInDoubtXacts().

 2. the resolution of the 1st transaction must wait until the other
 transactions for 2PC are executed or a timeout occurs.

 3. if the client checks the 1st result value, it should wait until resolution
 is finished for atomic visibility (although it depends on the way atomic
 visibility is realized). The clients may wait up to
 "foreign_transaction_resolver_timeout". Users may think it's stale.

 A situation like this can be observed after testing with pgbench: some
 unresolved transactions remain after benchmarking.

 I assume that this default value refers to wal_sender, archiver, and so on.
 But I think this parameter is more like "commit_delay". If so, 60 seconds
 seems to be a big value.
>>>
>>> IIUC this situation seems like the foreign transaction resolution is
>>> bottle-neck and doesn’t catch up to incoming resolution requests. But
>>> how foreignt_transaction_resolver_timeout relates to this situation?
>>> foreign_transaction_resolver_timeout controls when to terminate the
>>> resolver process that doesn't have any foreign transactions to
>>> resolve. So if we set it several milliseconds, resolver processes are
>>> terminated immediately after each resolution, imposing the cost of
>>> launching resolver processes on the next resolution.
>>
>> Thanks for your comments!
>>
>> No, this situation is not related to whether the foreign transaction
>> resolution is the bottleneck or not. This issue may happen when the workload
>> has very few foreign transactions.
>>
>> If a new foreign transaction comes while the transaction resolver is processing
>> resolutions via FdwXactResolverProcessInDoubtXacts(), the foreign transaction
>> waits until the next transaction resolution starts. If the next foreign
>> transaction doesn't come, the foreign transaction must wait until timeout to
>> start resolution. I mentioned this situation.
> 
> Thanks for your explanation. I think that in this case we should set
> the latch of the resolver after preparing all foreign transactions so
> that the resolver processes those transactions without sleep.

Yes, your idea is much better. Thanks!
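
Something like this, I imagine (hypothetical names; only SetLatch() is the real API):

```c
/*
 * Hypothetical names; only SetLatch() is the real API.  After preparing
 * all foreign transactions, wake the resolver so it starts resolving
 * immediately instead of sleeping until its timeout.
 */
FdwXactResolver *resolver = FdwXactGetResolverForDatabase(MyDatabaseId);

if (resolver != NULL)
	SetLatch(&resolver->latch);
```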


>>
>> Thanks for letting me know the side effect of setting the resolution timeout
>> to several milliseconds. I agree. But why is termination needed? Is there a
>> possibility of going stale, as with walsender?
> 
> The purpose of this timeout is to terminate resolvers that are idle
> for a long time. The resolver processes don't necessarily need to keep
> running all the time for every database. On the other hand, launching
> a resolver process per commit would be a high cost. So we have
> resolver processes keep running at least for
> foreign_transaction_resolver_timeout.
Understood. I think it's reasonable.




 b. about performance bottleneck (just share my simple benchmark results)

 The resolver process can easily become a performance bottleneck, although I
 think some users want this feature even if the performance is not so good.

 I tested with a very simple workload on my laptop.

 The test conditions are
 * two remote foreign partitions and one transaction inserts an entry in each
 partition.
 * local connection only. If NW latency became higher, the performance became
 worse.
 * pgbench with 8 clients.

 The test results are the following. The performance with 2PC is only 10% of
 the performance without 2PC.

 * with foreign_twophase_commit = required
 -> Under load of more than 10 TPS, the number of unresolved foreign
 transactions keeps increasing and the run stops with the warning "Increase
 max_prepared_foreign_transactions".
>>>
>>> What was the value of max_prepared_foreign_transactions?
>>
>> Now, I tested with 200.
>>
>> If each resolution finishes very soon, I thought it would be enough because
>> 8 clients x 2 partitions = 16, though... But it's difficult to know a
>> stable value.
> 
> During resolving one distributed transaction, the resolver needs both
> one round trip and fsync-ing WAL 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-20 Thread Masahiko Sawada
On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda
 wrote:
>
>
>
> On 2021/05/21 10:39, Masahiko Sawada wrote:
> > On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda  
> > wrote:
> >>
> >>
> >> On 2021/05/11 13:37, Masahiko Sawada wrote:
> >>> I've attached the updated patches that incorporated comments from
> >>> Zhihong and Ikeda-san.
> >>
> >> Thanks for updating the patches!
> >>
> >>
> >> I have other comments including trivial things.
> >>
> >>
> >> a. about "foreign_transaction_resolver_timeout" parameter
> >>
> >> Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs.
> >> Is there any reason? Although the following is a minor case, it may confuse
> >> some users.
> >>
> >> An example case:
> >>
> >> 1. a client executes a transaction with 2PC while the resolver is processing
> >> FdwXactResolverProcessInDoubtXacts().
> >>
> >> 2. the resolution of the 1st transaction must wait until the other
> >> transactions for 2PC are executed or the timeout expires.
> >>
> >> 3. if the client checks the 1st result value, it should wait until the
> >> resolution is finished for atomic visibility (although this depends on how
> >> atomic visibility is realized). The client may wait up to
> >> "foreign_transaction_resolver_timeout". Users may think it's stale.
> >>
> >> A situation like this can be observed after testing with pgbench: some
> >> unresolved transactions remain after benchmarking.
> >>
> >> I assume that this default value follows wal_sender, archiver, and so on.
> >> But I think this parameter is more like "commit_delay". If so, 60 seconds
> >> seems too big a value.
> >
> > IIUC this situation seems like the foreign transaction resolution is
> > the bottleneck and doesn’t catch up to incoming resolution requests. But
> > how does foreign_transaction_resolver_timeout relate to this situation?
> > foreign_transaction_resolver_timeout controls when to terminate the
> > resolver process that doesn't have any foreign transactions to
> > resolve. So if we set it to several milliseconds, resolver processes are
> > terminated immediately after each resolution, imposing the cost of
> > launching resolver processes on the next resolution.
>
> Thanks for your comments!
>
> No, this situation is not related to whether the foreign transaction
> resolution is the bottleneck or not. This issue may happen when the workload
> has very few foreign transactions.
>
> If a new foreign transaction comes while the transaction resolver is processing
> resolutions via FdwXactResolverProcessInDoubtXacts(), the foreign transaction
> waits until the next transaction resolution starts. If the next foreign
> transaction doesn't come, the foreign transaction must wait for resolution
> until the timeout. I mentioned this situation.

Thanks for your explanation. I think that in this case we should set
the latch of the resolver after preparing all foreign transactions so
that the resolver processes those transactions without sleeping.
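
A minimal sketch of that idea; FdwXactResolverLatch is a hypothetical name for
however the patch exposes the resolver's latch in shared memory:

/* Assumed: the resolver publishes its latch in shared memory. */
extern Latch *FdwXactResolverLatch;		/* hypothetical */

static void
WakeupFdwXactResolver(void)
{
	/* Called once after all foreign transactions have been prepared. */
	if (FdwXactResolverLatch != NULL)
		SetLatch(FdwXactResolverLatch);	/* resolver's WaitLatch() returns at once */
}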

>
> Thanks for letting me know the side effect of setting the resolution timeout
> to several milliseconds. I agree. But why is termination needed? Is there a
> possibility of it going stale like walsender?

The purpose of this timeout is to terminate resolvers that are idle
for a long time. The resolver processes don't necessarily need to keep
running all the time for every database. On the other hand, launching
a resolver process per commit would be a high cost. So we have
resolver processes keep running at least for
foreign_transaction_resolver_timeout.
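
For illustration, a rough sketch of how such an idle timeout typically looks in
a background worker's main loop; ResolvePendingFdwXacts is a hypothetical work
function and PG_WAIT_EXTENSION stands in for whatever wait event the patch uses:

for (;;)
{
	int		rc;

	ResolvePendingFdwXacts();	/* hypothetical: process any outstanding work */

	rc = WaitLatch(MyLatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
				   foreign_transaction_resolver_timeout,
				   PG_WAIT_EXTENSION);
	ResetLatch(MyLatch);

	/* No work arrived within the timeout: terminate this resolver. */
	if (rc & WL_TIMEOUT)
		proc_exit(0);
}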

>
>
> >>
> >>
> >> b. about performance bottleneck (just share my simple benchmark results)
> >>
> >> The resolver process can easily become a performance bottleneck, although I
> >> think some users want this feature even if the performance is not so good.
> >>
> >> I tested with a very simple workload on my laptop.
> >>
> >> The test conditions are
> >> * two remote foreign partitions and one transaction inserts an entry in each
> >> partition.
> >> * local connection only. If NW latency became higher, the performance became
> >> worse.
> >> * pgbench with 8 clients.
> >>
> >> The test results are the following. The performance with 2PC is only 10% of
> >> the performance without 2PC.
> >>
> >> * with foreign_twophase_commit = required
> >> -> Under load of more than 10 TPS, the number of unresolved foreign
> >> transactions keeps increasing and the run stops with the warning "Increase
> >> max_prepared_foreign_transactions".
> >
> > What was the value of max_prepared_foreign_transactions?
>
> Now, I tested with 200.
>
> If each resolution finishes very soon, I thought it would be enough because
> 8 clients x 2 partitions = 16, though... But it's difficult to know a
> stable value.

During resolving one distributed transaction, the resolver needs both
one round trip and fsync-ing a WAL record for each foreign transaction.
Since the client doesn’t wait for the distributed transaction to be
resolved, the resolver process can easily be the bottleneck given there
are 8 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-20 Thread Masahiro Ikeda



On 2021/05/21 10:39, Masahiko Sawada wrote:
> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda  
> wrote:
>>
>>
>> On 2021/05/11 13:37, Masahiko Sawada wrote:
>>> I've attached the updated patches that incorporated comments from
>>> Zhihong and Ikeda-san.
>>
>> Thanks for updating the patches!
>>
>>
>> I have other comments including trivial things.
>>
>>
>> a. about "foreign_transaction_resolver_timeout" parameter
>>
>> Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs.
>> Is there any reason? Although the following is a minor case, it may confuse
>> some users.
>>
>> An example case:
>>
>> 1. a client executes a transaction with 2PC while the resolver is processing
>> FdwXactResolverProcessInDoubtXacts().
>>
>> 2. the resolution of the 1st transaction must wait until the other
>> transactions for 2PC are executed or the timeout expires.
>>
>> 3. if the client checks the 1st result value, it should wait until the
>> resolution is finished for atomic visibility (although this depends on how
>> atomic visibility is realized). The client may wait up to
>> "foreign_transaction_resolver_timeout". Users may think it's stale.
>>
>> A situation like this can be observed after testing with pgbench: some
>> unresolved transactions remain after benchmarking.
>>
>> I assume that this default value follows wal_sender, archiver, and so on.
>> But I think this parameter is more like "commit_delay". If so, 60 seconds
>> seems too big a value.
> 
> IIUC this situation seems like the foreign transaction resolution is
> the bottleneck and doesn’t catch up to incoming resolution requests. But
> how does foreign_transaction_resolver_timeout relate to this situation?
> foreign_transaction_resolver_timeout controls when to terminate the
> resolver process that doesn't have any foreign transactions to
> resolve. So if we set it to several milliseconds, resolver processes are
> terminated immediately after each resolution, imposing the cost of
> launching resolver processes on the next resolution.

Thanks for your comments!

No, this situation is not related to whether the foreign transaction
resolution is the bottleneck or not. This issue may happen when the workload
has very few foreign transactions.

If a new foreign transaction comes while the transaction resolver is processing
resolutions via FdwXactResolverProcessInDoubtXacts(), the foreign transaction
waits until the next transaction resolution starts. If the next foreign
transaction doesn't come, the foreign transaction must wait for resolution
until the timeout. I mentioned this situation.

Thanks for letting me know the side effect of setting the resolution timeout
to several milliseconds. I agree. But why is termination needed? Is there a
possibility of it going stale like walsender?


>>
>>
>> b. about performance bottleneck (just share my simple benchmark results)
>>
>> The resolver process can easily become a performance bottleneck, although I
>> think some users want this feature even if the performance is not so good.
>>
>> I tested with a very simple workload on my laptop.
>>
>> The test conditions are
>> * two remote foreign partitions and one transaction inserts an entry in each
>> partition.
>> * local connection only. If NW latency became higher, the performance became
>> worse.
>> * pgbench with 8 clients.
>>
>> The test results are the following. The performance with 2PC is only 10% of
>> the performance without 2PC.
>>
>> * with foreign_twophase_commit = required
>> -> Under load of more than 10 TPS, the number of unresolved foreign
>> transactions keeps increasing and the run stops with the warning "Increase
>> max_prepared_foreign_transactions".
> 
> What was the value of max_prepared_foreign_transactions?

Now, I tested with 200.

If each resolution finishes very soon, I thought it would be enough because
8 clients x 2 partitions = 16, though... But it's difficult to know a
stable value.


> To speed up the foreign transaction resolution, some ideas have been
> discussed. As another idea, how about launching resolvers for each
> foreign server? That way, we resolve foreign transactions on each
> foreign server in parallel. If foreign transactions are concentrated
> on a particular server, we can have multiple resolvers for that one
> foreign server. It doesn’t change the fact that all foreign
> transaction resolutions are processed by resolver processes.

Awesome! There seems to be another pro: even if a foreign server is
temporarily busy or stopped due to failover, other foreign servers'
transactions can be resolved.



> Apart from that, we also might want to improve foreign transaction
> management so that a transaction doesn’t end up with an error if the
> foreign transaction resolution doesn’t catch up with incoming
> transactions that require 2PC. Maybe we can evict and serialize a
> state file when FdwXactCtl->xacts[] is full. I’d like to leave it as a
> future improvement.

Oh, great! I didn't come up with that idea.

Although I thought the feature makes it difficult to know 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-20 Thread Masahiko Sawada
On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda  wrote:
>
>
> On 2021/05/11 13:37, Masahiko Sawada wrote:
> > I've attached the updated patches that incorporated comments from
> > Zhihong and Ikeda-san.
>
> Thanks for updating the patches!
>
>
> I have other comments including trivial things.
>
>
> a. about "foreign_transaction_resolver_timeout" parameter
>
> Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs.
> Is there any reason? Although the following is a minor case, it may confuse
> some users.
>
> An example case:
>
> 1. a client executes a transaction with 2PC while the resolver is processing
> FdwXactResolverProcessInDoubtXacts().
>
> 2. the resolution of the 1st transaction must wait until the other
> transactions for 2PC are executed or the timeout expires.
>
> 3. if the client checks the 1st result value, it should wait until the
> resolution is finished for atomic visibility (although this depends on how
> atomic visibility is realized). The client may wait up to
> "foreign_transaction_resolver_timeout". Users may think it's stale.
>
> A situation like this can be observed after testing with pgbench: some
> unresolved transactions remain after benchmarking.
>
> I assume that this default value follows wal_sender, archiver, and so on.
> But I think this parameter is more like "commit_delay". If so, 60 seconds
> seems too big a value.

IIUC this situation seems like the foreign transaction resolution is
the bottleneck and doesn’t catch up to incoming resolution requests. But
how does foreign_transaction_resolver_timeout relate to this situation?
foreign_transaction_resolver_timeout controls when to terminate the
resolver process that doesn't have any foreign transactions to
resolve. So if we set it to several milliseconds, resolver processes are
terminated immediately after each resolution, imposing the cost of
launching resolver processes on the next resolution.

>
>
> b. about performance bottleneck (just share my simple benchmark results)
>
> The resolver process can easily become a performance bottleneck, although I
> think some users want this feature even if the performance is not so good.
>
> I tested with a very simple workload on my laptop.
>
> The test conditions are
> * two remote foreign partitions and one transaction inserts an entry in each
> partition.
> * local connection only. If NW latency became higher, the performance became
> worse.
> * pgbench with 8 clients.
>
> The test results are the following. The performance with 2PC is only 10% of
> the performance without 2PC.
>
> * with foreign_twophase_commit = required
> -> Under load of more than 10 TPS, the number of unresolved foreign
> transactions keeps increasing and the run stops with the warning "Increase
> max_prepared_foreign_transactions".

What was the value of max_prepared_foreign_transactions?

To speed up the foreign transaction resolution, some ideas have been
discussed. As another idea, how about launching resolvers for each
foreign server? That way, we resolve foreign transactions on each
foreign server in parallel. If foreign transactions are concentrated
on a particular server, we can have multiple resolvers for that one
foreign server. It doesn’t change the fact that all foreign
transaction resolutions are processed by resolver processes.
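
To sketch the idea with the existing dynamic background worker API (the
function name and the resolver entry point are assumptions, not the patch's):

static void
LaunchResolverForServer(Oid serverid, Oid dbid)
{
	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
	snprintf(worker.bgw_name, BGW_MAXLEN,
			 "foreign transaction resolver for server %u", serverid);
	worker.bgw_main_arg = ObjectIdGetDatum(serverid);
	memcpy(worker.bgw_extra, &dbid, sizeof(Oid));	/* database to connect to */

	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
		ereport(WARNING,
				(errmsg("out of background worker slots")));
}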

Apart from that, we also might want to improve foreign transaction
management so that a transaction doesn’t end up with an error if the
foreign transaction resolution doesn’t catch up with incoming
transactions that require 2PC. Maybe we can evict and serialize a
state file when FdwXactCtl->xacts[] is full. I’d like to leave it as a
future improvement.

> * with foreign_twophase_commit = disabled
> -> 122 TPS in my environment.

How much is the performance without those 2PC patches and with the
same workload? i.e., how fast is the current postgres_fdw that uses
XactCallback?

>
>
> c. v36-0001-Introduce-transaction-manager-for-foreign-transa.patch
>
> * typo: s/tranasction/transaction/
>
> * Is it better to move AtEOXact_FdwXact() in AbortTransaction() to before "if
> (IsInParallelMode())" to make them the same order as
> CommitTransaction()?

I'd prefer to move AtEOXact_FdwXact() in CommitTransaction after "if
(IsInParallelMode())" since other pre-commit work is done after
cleaning parallel contexts. What do you think?
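
As a simplified skeleton (not the actual CommitTransaction() code), the
ordering being proposed would look like:

static void
CommitTransaction(void)
{
	/* ... close portals, fire deferred triggers, and so on ... */

	if (IsInParallelMode())
		AtEOXact_Parallel(true);	/* clean up parallel contexts first */

	AtEOXact_FdwXact(true);			/* then pre-commit the foreign transactions */

	/* ... write the commit record, end the transaction, run callbacks ... */
}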

>
> * functions name of fdwxact.c
>
> Although this depends on my feeling, xact means transaction. If you feel the
> same, the function names of FdwXactRegisterXact and so on are odd to
> me. FdwXactRegisterEntry or FdwXactRegisterParticipant is better?
>

FdwXactRegisterEntry sounds good to me. Thanks.

> * Are the following better?
>
> - s/to register the foreign transaction by/to register the foreign transaction
> participant by/
>
> - s/The registered foreign transactions/The registered participants/
>
> - s/given foreign transaction/given foreign transaction participant/
>
> - s/Foreign transactions involved in the current transaction/Foreign
> 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-19 Thread Masahiro Ikeda


On 2021/05/11 13:37, Masahiko Sawada wrote:
> I've attached the updated patches that incorporated comments from
> Zhihong and Ikeda-san.

Thanks for updating the patches!


I have other comments including trivial things.


a. about "foreign_transaction_resolver_timeout" parameter

Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs.
Is there any reason? Although the following is a minor case, it may confuse
some users.

An example case:

1. a client executes a transaction with 2PC while the resolver is processing
FdwXactResolverProcessInDoubtXacts().

2. the resolution of the 1st transaction must wait until the other
transactions for 2PC are executed or the timeout expires.

3. if the client checks the 1st result value, it should wait until the
resolution is finished for atomic visibility (although this depends on how
atomic visibility is realized). The client may wait up to
"foreign_transaction_resolver_timeout". Users may think it's stale.

A situation like this can be observed after testing with pgbench: some
unresolved transactions remain after benchmarking.

I assume that this default value follows wal_sender, archiver, and so on.
But I think this parameter is more like "commit_delay". If so, 60 seconds
seems too big a value.


b. about performance bottleneck (just share my simple benchmark results)

The resolver process can easily become a performance bottleneck, although I
think some users want this feature even if the performance is not so good.

I tested with a very simple workload on my laptop.

The test conditions are
* two remote foreign partitions and one transaction inserts an entry in each
partition.
* local connection only. If NW latency became higher, the performance became
worse.
* pgbench with 8 clients.

The test results are the following. The performance with 2PC is only 10% of
the performance without 2PC.

* with foreign_twophase_commit = required
-> Under load of more than 10 TPS, the number of unresolved foreign
transactions keeps increasing and the run stops with the warning "Increase
max_prepared_foreign_transactions".

* with foreign_twophase_commit = disabled
-> 122 TPS in my environment.


c. v36-0001-Introduce-transaction-manager-for-foreign-transa.patch

* typo: s/tranasction/transaction/

* Is it better to move AtEOXact_FdwXact() in AbortTransaction() to before "if
(IsInParallelMode())" to make them the same order as
CommitTransaction()?

* functions name of fdwxact.c

Although this depends on my feeling, xact means transaction. If you feel the
same, the function names of FdwXactRegisterXact and so on are odd to
me. FdwXactRegisterEntry or FdwXactRegisterParticipant is better?

* Are the following better?

- s/to register the foreign transaction by/to register the foreign transaction
participant by/

- s/The registered foreign transactions/The registered participants/

- s/given foreign transaction/given foreign transaction participant/

- s/Foreign transactions involved in the current transaction/Foreign
transaction participants involved in the current transaction/


Regards,

-- 
Masahiro Ikeda
NTT DATA CORPORATION




Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-12 Thread Zhihong Yu
On Mon, May 10, 2021 at 9:38 PM Masahiko Sawada 
wrote:

> On Mon, May 3, 2021 at 11:11 PM Zhihong Yu  wrote:
> >
> >
> >
> > On Mon, May 3, 2021 at 5:25 AM Masahiko Sawada 
> wrote:
> >>
> >> On Sun, May 2, 2021 at 1:23 AM Zhihong Yu  wrote:
> >> >
> >> >
> >> >
> >> > On Fri, Apr 30, 2021 at 9:09 PM Masahiko Sawada <
> sawada.m...@gmail.com> wrote:
> >> >>
> >> >> On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu  wrote:
> >> >> >
> >> >> > Hi,
> >> >> > For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :
> >> >>
> >> >> Thank you for reviewing the patch!
> >> >>
> >> >> >
> >> >> > With this commit, the foreign server modified within the
> transaction marked as 'modified'.
> >> >> >
> >> >> > transaction marked -> transaction is marked
> >> >>
> >> >> Will fix.
> >> >>
> >> >> >
> >> >> > +#define IsForeignTwophaseCommitRequested() \
> >> >> > +(foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
> >> >> >
> >> >> > Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think
> the macro should be named: IsForeignTwophaseCommitRequired.
> >> >>
> >> >> But even if foreign_twophase_commit is
> >> >> FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
> >> >> there is only one modified server, right? It seems the name
> >> >> IsForeignTwophaseCommitRequested is fine.
> >> >>
> >> >> >
> >> >> > +static bool
> >> >> > +checkForeignTwophaseCommitRequired(bool local_modified)
> >> >> >
> >> >> > +   if (!ServerSupportTwophaseCommit(fdw_part))
> >> >> > +   have_no_twophase = true;
> >> >> > ...
> >> >> > +   if (have_no_twophase)
> >> >> > +   ereport(ERROR,
> >> >> >
> >> >> > It seems the error case should be reported within the loop. This
> way, we don't need to iterate the other participant(s).
> >> >> > Accordingly, nserverswritten should be incremented for local
> server prior to the loop. The condition in the loop would become if
> (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
> >> >> > have_no_twophase is no longer needed.
> >> >>
> >> >> Hmm, I think if we process one 2pc-non-capable server first and then
> >> >> process another 2pc-capable server, we should raise an error but
> >> >> cannot detect that.
> >> >
> >> >
> >> > Then the check would stay as what you have in the patch:
> >> >
> >> >   if (!ServerSupportTwophaseCommit(fdw_part))
> >> >
> >> > When the non-2pc-capable server is encountered, we would report the
> error in place (following the ServerSupportTwophaseCommit check) and come
> out of the loop.
> >> > have_no_twophase can be dropped.
> >>
> >> But if we processed only one non-2pc-capable server, we would raise an
> >> error but should not in that case.
> >>
> >> On second thought, I think we can track how many servers are modified
> >> or not capable of 2PC during registration and unregistration. Then we
> >> can determine both whether 2PC is required and whether a non-2pc-capable
> >> server is involved without looking through all participants. Thoughts?
> >
> >
> > That is something worth trying.
> >
>
> I've attached the updated patches that incorporated comments from
> Zhihong and Ikeda-san.
>
> Regards,
>
> --
> Masahiko Sawada
> EDB:  https://www.enterprisedb.com/


Hi,
For v36-0005-Prepare-foreign-transactions-at-commit-time.patch :

With this commit, the foreign server modified within the transaction
marked as 'modified'.

The verb is missing from the above sentence. 'within the transaction marked
' -> within the transaction is marked

+   /* true if modified the data on the server */

modified the data -> data is modified

+   xid = GetTopTransactionIdIfAny();
...
+   if (!TransactionIdIsValid(xid))
+   xid = GetTopTransactionId();

I wonder, when the above if condition is true, would
GetTopTransactionId() get a valid xid? It seems the two func calls are
the same.
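
For context on the question above: the two calls differ in that
GetTopTransactionIdIfAny() returns InvalidTransactionId when no XID has been
assigned yet, while GetTopTransactionId() assigns a new top-level XID in that
case (and otherwise returns the existing one). So the quoted pattern reads as:

TransactionId xid = GetTopTransactionIdIfAny();	/* InvalidTransactionId if none yet */

if (!TransactionIdIsValid(xid))
	xid = GetTopTransactionId();	/* forces assignment of a new top-level XID */

which indeed behaves the same as calling GetTopTransactionId() alone, since
that function also returns an already-assigned XID.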

I like the way checkForeignTwophaseCommitRequired() is structured.

Cheers


Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-03 Thread Zhihong Yu
On Mon, May 3, 2021 at 5:25 AM Masahiko Sawada 
wrote:

> On Sun, May 2, 2021 at 1:23 AM Zhihong Yu  wrote:
> >
> >
> >
> > On Fri, Apr 30, 2021 at 9:09 PM Masahiko Sawada 
> wrote:
> >>
> >> On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu  wrote:
> >> >
> >> > Hi,
> >> > For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :
> >>
> >> Thank you for reviewing the patch!
> >>
> >> >
> >> > With this commit, the foreign server modified within the transaction
> marked as 'modified'.
> >> >
> >> > transaction marked -> transaction is marked
> >>
> >> Will fix.
> >>
> >> >
> >> > +#define IsForeignTwophaseCommitRequested() \
> >> > +(foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
> >> >
> >> > Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the
> macro should be named: IsForeignTwophaseCommitRequired.
> >>
> >> But even if foreign_twophase_commit is
> >> FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
> >> there is only one modified server, right? It seems the name
> >> IsForeignTwophaseCommitRequested is fine.
> >>
> >> >
> >> > +static bool
> >> > +checkForeignTwophaseCommitRequired(bool local_modified)
> >> >
> >> > +   if (!ServerSupportTwophaseCommit(fdw_part))
> >> > +   have_no_twophase = true;
> >> > ...
> >> > +   if (have_no_twophase)
> >> > +   ereport(ERROR,
> >> >
> >> > It seems the error case should be reported within the loop. This way,
> we don't need to iterate the other participant(s).
> >> > Accordingly, nserverswritten should be incremented for local server
> prior to the loop. The condition in the loop would become if
> (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
> >> > have_no_twophase is no longer needed.
> >>
> >> Hmm, I think if we process one 2pc-non-capable server first and then
> >> process another 2pc-capable server, we should raise an error but
> >> cannot detect that.
> >
> >
> > Then the check would stay as what you have in the patch:
> >
> >   if (!ServerSupportTwophaseCommit(fdw_part))
> >
> > When the non-2pc-capable server is encountered, we would report the
> error in place (following the ServerSupportTwophaseCommit check) and come
> out of the loop.
> > have_no_twophase can be dropped.
>
> But if we processed only one non-2pc-capable server, we would raise an
> error but should not in that case.
>
> On second thought, I think we can track how many servers are modified
> or not capable of 2PC during registration and unregistration. Then we
> can determine both whether 2PC is required and whether a non-2pc-capable
> server is involved without looking through all participants. Thoughts?
>

That is something worth trying.

Thanks


>
> Regards,
>
> --
> Masahiko Sawada
> EDB:  https://www.enterprisedb.com/
>


Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-03 Thread Masahiko Sawada
On Sun, May 2, 2021 at 1:23 AM Zhihong Yu  wrote:
>
>
>
> On Fri, Apr 30, 2021 at 9:09 PM Masahiko Sawada  wrote:
>>
>> On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu  wrote:
>> >
>> > Hi,
>> > For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :
>>
>> Thank you for reviewing the patch!
>>
>> >
>> > With this commit, the foreign server modified within the transaction 
>> > marked as 'modified'.
>> >
>> > transaction marked -> transaction is marked
>>
>> Will fix.
>>
>> >
>> > +#define IsForeignTwophaseCommitRequested() \
>> > +(foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
>> >
>> > Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the 
>> > macro should be named: IsForeignTwophaseCommitRequired.
>>
>> But even if foreign_twophase_commit is
>> FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
>> there is only one modified server, right? It seems the name
>> IsForeignTwophaseCommitRequested is fine.
>>
>> >
>> > +static bool
>> > +checkForeignTwophaseCommitRequired(bool local_modified)
>> >
>> > +   if (!ServerSupportTwophaseCommit(fdw_part))
>> > +   have_no_twophase = true;
>> > ...
>> > +   if (have_no_twophase)
>> > +   ereport(ERROR,
>> >
>> > It seems the error case should be reported within the loop. This way, we 
>> > don't need to iterate the other participant(s).
>> > Accordingly, nserverswritten should be incremented for local server prior 
>> > to the loop. The condition in the loop would become if 
>> > (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
>> > have_no_twophase is no longer needed.
>>
>> Hmm, I think if we process one 2pc-non-capable server first and then
>> process another 2pc-capable server, we should raise an error but
>> cannot detect that.
>
>
> Then the check would stay as what you have in the patch:
>
>   if (!ServerSupportTwophaseCommit(fdw_part))
>
> When the non-2pc-capable server is encountered, we would report the error in 
> place (following the ServerSupportTwophaseCommit check) and come out of the 
> loop.
> have_no_twophase can be dropped.

But if we processed only one non-2pc-capable server, we would raise an
error but should not in that case.

On second thought, I think we can track how many servers are modified
or not capable of 2PC during registration and unregistration. Then we
can determine both whether 2PC is required and whether a non-2pc-capable
server is involved without looking through all participants. Thoughts?
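
A minimal sketch of that counting approach; ServerSupportTwophaseCommit and
fdw_part follow the names used in this thread, while the counters and
FdwXactRegisterEntry's exact shape are assumptions:

static int	nparticipants_modified = 0;		/* servers that wrote data */
static int	nparticipants_no_twophase = 0;	/* writers lacking 2PC support */

static void
FdwXactRegisterEntry(FdwXactParticipant *fdw_part, bool modified)
{
	if (modified)
	{
		nparticipants_modified++;
		if (!ServerSupportTwophaseCommit(fdw_part))
			nparticipants_no_twophase++;
	}
	/* ... add fdw_part to the participant list ... */
}

static bool
checkForeignTwophaseCommitRequired(bool local_modified)
{
	int		nwriters = nparticipants_modified + (local_modified ? 1 : 0);

	if (nwriters <= 1)
		return false;			/* single writer: one-phase commit is enough */

	if (nparticipants_no_twophase > 0)
		ereport(ERROR,
				(errmsg("cannot commit a distributed transaction that has modified a server not supporting two-phase commit")));

	return true;
}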

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-01 Thread Zhihong Yu
On Fri, Apr 30, 2021 at 9:09 PM Masahiko Sawada 
wrote:

> On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu  wrote:
> >
> > Hi,
> > For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :
>
> Thank you for reviewing the patch!
>
> >
> > With this commit, the foreign server modified within the transaction
> marked as 'modified'.
> >
> > transaction marked -> transaction is marked
>
> Will fix.
>
> >
> > +#define IsForeignTwophaseCommitRequested() \
> > +(foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
> >
> > Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the
> macro should be named: IsForeignTwophaseCommitRequired.
>
> But even if foreign_twophase_commit is
> FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
> there is only one modified server, right? It seems the name
> IsForeignTwophaseCommitRequested is fine.
>
> >
> > +static bool
> > +checkForeignTwophaseCommitRequired(bool local_modified)
> >
> > +   if (!ServerSupportTwophaseCommit(fdw_part))
> > +   have_no_twophase = true;
> > ...
> > +   if (have_no_twophase)
> > +   ereport(ERROR,
> >
> > It seems the error case should be reported within the loop. This way, we
> don't need to iterate the other participant(s).
> > Accordingly, nserverswritten should be incremented for local server
> prior to the loop. The condition in the loop would become if
> (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
> > have_no_twophase is no longer needed.
>
> Hmm, I think if we process one 2pc-non-capable server first and then
> process another 2pc-capable server, we should raise an error but
> cannot detect that.
>

Then the check would stay as what you have in the patch:

  if (!ServerSupportTwophaseCommit(fdw_part))

When the non-2pc-capable server is encountered, we would report the error
in place (following the ServerSupportTwophaseCommit check) and come out of
the loop.
have_no_twophase can be dropped.

Thanks


>
> Regards,
>
> --
> Masahiko Sawada
> EDB:  https://www.enterprisedb.com/
>


Re: Transactions involving multiple postgres foreign servers, take 2

2021-04-30 Thread Masahiko Sawada
On Wed, Mar 17, 2021 at 6:03 PM Zhihong Yu  wrote:
>
> Hi,
> For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :

Thank you for reviewing the patch!

>
> With this commit, the foreign server modified within the transaction marked 
> as 'modified'.
>
> transaction marked -> transaction is marked

Will fix.

>
> +#define IsForeignTwophaseCommitRequested() \
> +(foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)
>
> Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the macro 
> should be named: IsForeignTwophaseCommitRequired.

But even if foreign_twophase_commit is
FOREIGN_TWOPHASE_COMMIT_REQUIRED, the two-phase commit is not used if
there is only one modified server, right? It seems the name
IsForeignTwophaseCommitRequested is fine.

>
> +static bool
> +checkForeignTwophaseCommitRequired(bool local_modified)
>
> +   if (!ServerSupportTwophaseCommit(fdw_part))
> +   have_no_twophase = true;
> ...
> +   if (have_no_twophase)
> +   ereport(ERROR,
>
> It seems the error case should be reported within the loop. This way, we 
> don't need to iterate the other participant(s).
> Accordingly, nserverswritten should be incremented for local server prior to 
> the loop. The condition in the loop would become if 
> (!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
> have_no_twophase is no longer needed.

Hmm, I think if we process one 2pc-non-capable server first and then
process another 2pc-capable server, we should raise an error but
cannot detect that.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-04-30 Thread Masahiko Sawada
On Tue, Apr 27, 2021 at 10:03 AM Masahiro Ikeda
 wrote:
>
>
>
> On 2021/03/17 12:03, Masahiko Sawada wrote:
> > I've attached the updated version patch set.
>
> Thanks for updating the patches! I'm now restarting my review of 2PC because
> I'd like to use this feature in PG15.

Thank you for reviewing the patch! Much appreciated.

>
>
> I think the following logic of resolving and removing the fdwxact entries
> by the transaction resolver needs to be fixed.
>
> 1. check if pending fdwxact entries exist
>
> HoldInDoubtFdwXacts() checks if there are entries whose condition is
> InvalidBackendId and so on. After that, it gets the indexes into the fdwxacts
> array. The fdwXactLock is released at the end of this phase.
>
> 2. resolve and remove the entries held in the 1st phase.
>
> ResolveFdwXacts() resolves the status of each fdwxact entry using the
> indexes. At the end of resolving, the transaction resolver removes the entry
> from the fdwxacts array via remove_fdwact().
>
> The way to remove the entry is the following. Since entries are addressed by
> index, the indexes obtained in the 1st phase are no longer meaningful.
>
> /* Remove the entry from active array */
> FdwXactCtl->num_fdwxacts--;
> FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
>
> This seems to lead to resolving unexpected fdwxacts, and it can trigger the
> following assertion error. That's how I noticed. For example, there is the
> case in which a backend inserts a new fdwxact entry into the free space whose
> entry the resolver removed right before, and the resolver accesses the new
> entry, which doesn't need to be resolved yet, because it uses the indexes
> checked in the 1st phase.
>
> Assert(fdwxact->locking_backend == MyBackendId);

Good point. I agree with your analysis.

>
>
>
> The simple solution is to hold fdwXactLock exclusively all the time, from the
> beginning of the 1st phase to the end of the 2nd phase. But I worry that the
> performance impact would become too big...
>
> I came up with two solutions although there may be better solutions.
>
> A. to remove resolved entries at once after resolution for all held entries
> is finished
>
> If so, we don't need to take the exclusive lock for a long time. But this has
> other problems: pg_remove_foreign_xact() can still remove entries, and we
> need to handle resolution failures.
>
> I wondered whether we can solve the first problem by introducing a new lock
> like a "removing lock" so that only the processes which hold the lock can
> remove the entries. The performance impact is limited since the insertion of
> fdwxact entries is not blocked by this lock. And the second problem can be
> solved using a try-catch block.
>
>
> B. to merge the 1st and 2nd phases
>
> Now, the resolver resolves the entries together; that's the reason why it's
> difficult to remove the entries. So it seems the problem can be solved by
> executing checking, resolving, and removing per entry. I think this is better
> since it is simpler than A. If I'm missing something, please let me know.

It seems to me that solution B would be simpler and better. I'll try
to fix this issue by using solution B and rebase the patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-04-26 Thread Masahiro Ikeda



On 2021/03/17 12:03, Masahiko Sawada wrote:
> I've attached the updated version patch set.

Thanks for updating the patches! I'm now restarting my review of 2PC because
I'd like to use this feature in PG15.


I think the following logic of resolving and removing the fdwxact entries
by the transaction resolver needs to be fixed.

1. check if pending fdwxact entries exist

HoldInDoubtFdwXacts() checks if there are entries whose condition is
InvalidBackendId and so on. After that, it gets the indexes into the fdwxacts
array. The fdwXactLock is released at the end of this phase.

2. resolve and remove the entries held in the 1st phase.

ResolveFdwXacts() resolves the status of each fdwxact entry using the
indexes. At the end of resolving, the transaction resolver removes the entry
from the fdwxacts array via remove_fdwact().

The way to remove the entry is the following. Since entries are addressed by
index, the indexes obtained in the 1st phase are no longer meaningful.

/* Remove the entry from active array */
FdwXactCtl->num_fdwxacts--;
FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];

This seems to lead to resolving unexpected fdwxacts, and it can trigger the
following assertion error. That's how I noticed. For example, there is the
case in which a backend inserts a new fdwxact entry into the free space whose
entry the resolver removed right before, and the resolver accesses the new
entry, which doesn't need to be resolved yet, because it uses the indexes
checked in the 1st phase.

Assert(fdwxact->locking_backend == MyBackendId);



The simple solution is to hold fdwXactLock exclusively all the time, from the
beginning of the 1st phase to the end of the 2nd phase. But I worry that the
performance impact would become too big...

I came up with two solutions although there may be better solutions.

A. to remove resolved entries at once after resolution for all held entries
is finished

If so, we don't need to take the exclusive lock for a long time. But this has
other problems: pg_remove_foreign_xact() can still remove entries, and we
need to handle resolution failures.

I wondered whether we can solve the first problem by introducing a new lock
like a "removing lock" so that only the processes which hold the lock can
remove the entries. The performance impact is limited since the insertion of
fdwxact entries is not blocked by this lock. And the second problem can be
solved using a try-catch block.


B. to merge the 1st and 2nd phases

Now, the resolver resolves the entries together; that's the reason why it's
difficult to remove the entries. So it seems the problem can be solved by
executing checking, resolving, and removing per entry. I think this is better
since it is simpler than A. If I'm missing something, please let me know.
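
A sketch of what solution B could look like; FdwXactIsInDoubt and
ResolveOneFdwXact are hypothetical helper names, while the lock, the array
fields, and remove_fdwact() follow the patch as described above:

for (;;)
{
	FdwXact		fdwxact = NULL;
	int			i;

	/* 1st phase, per entry: find and claim one in-doubt entry. */
	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
	for (i = 0; i < FdwXactCtl->num_fdwxacts; i++)
	{
		if (FdwXactIsInDoubt(FdwXactCtl->fdwxacts[i]))
		{
			fdwxact = FdwXactCtl->fdwxacts[i];
			fdwxact->locking_backend = MyBackendId;
			break;
		}
	}
	LWLockRelease(FdwXactLock);

	if (fdwxact == NULL)
		break;					/* no more in-doubt entries */

	/* 2nd phase, per entry: resolve outside the lock, then remove. */
	ResolveOneFdwXact(fdwxact);

	LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
	remove_fdwact(fdwxact);		/* swap-with-last removal is now safe */
	LWLockRelease(FdwXactLock);
}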


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION




Re: Transactions involving multiple postgres foreign servers, take 2

2021-03-17 Thread Zhihong Yu
Hi,
For v35-0007-Prepare-foreign-transactions-at-commit-time.patch :

With this commit, the foreign server modified within the transaction marked
as 'modified'.

transaction marked -> transaction is marked

+#define IsForeignTwophaseCommitRequested() \
+(foreign_twophase_commit > FOREIGN_TWOPHASE_COMMIT_DISABLED)

Since the other enum is FOREIGN_TWOPHASE_COMMIT_REQUIRED, I think the macro
should be named: IsForeignTwophaseCommitRequired.

+static bool
+checkForeignTwophaseCommitRequired(bool local_modified)

+   if (!ServerSupportTwophaseCommit(fdw_part))
+   have_no_twophase = true;
...
+   if (have_no_twophase)
+   ereport(ERROR,

It seems the error case should be reported within the loop. This way, we
don't need to iterate the other participant(s).
Accordingly, nserverswritten should be incremented for local server prior
to the loop. The condition in the loop would become if
(!ServerSupportTwophaseCommit(fdw_part) && nserverswritten > 1).
have_no_twophase is no longer needed.

Cheers

On Tue, Mar 16, 2021 at 8:04 PM Masahiko Sawada 
wrote:

> On Mon, Mar 15, 2021 at 3:55 AM Ibrar Ahmed  wrote:
> >
> >
> >
> > On Thu, Feb 11, 2021 at 6:25 PM Masahiko Sawada 
> wrote:
> >>
> >> On Fri, Feb 5, 2021 at 2:45 PM Masahiko Sawada 
> wrote:
> >> >
> >> > On Tue, Feb 2, 2021 at 5:18 PM Fujii Masao <
> masao.fu...@oss.nttdata.com> wrote:
> >> > >
> >> > >
> >> > >
> >> > > On 2021/01/27 14:08, Masahiko Sawada wrote:
> >> > > > On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
> >> > > >  wrote:
> >> > > >>
> >> > > >>
> >> > > >> You fixed some issues. But maybe you forgot to attach the latest
> patches?
> >> > > >
> >> > > > Yes, I've attached the updated patches.
> >> > >
> >> > > Thanks for updating the patch! I tried to review 0001 and 0002 as
> the self-contained change.
> >> > >
> >> > > + * An FDW that implements both commit and rollback APIs can
> request to register
> >> > > + * the foreign transaction by FdwXactRegisterXact() to participate
> it to a
> >> > > + * group of distributed tranasction.  The registered foreign
> transactions are
> >> > > + * identified by OIDs of server and user.
> >> > >
> >> > > I'm afraid that the combination of OIDs of server and user is not
> unique. IOW, more than one foreign transaction can have the same
> combination of OIDs of server and user. For example, the following two
> SELECT queries start the different foreign transactions but their user OID
> is the same. OID of user mapping should be used instead of OID of user?
> >> > >
> >> > >  CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
> >> > >  CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user
> 'postgres');
> >> > >  CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user
> 'postgres');
> >> > >  CREATE TABLE t(i int);
> >> > >  CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS
> (table_name 't');
> >> > >  BEGIN;
> >> > >  SELECT * FROM ft;
> >> > >  DROP USER MAPPING FOR postgres SERVER loopback ;
> >> > >  SELECT * FROM ft;
> >> > >  COMMIT;
> >> >
> >> > Good catch. I've considered using user mapping OID or a pair of user
> >> > mapping OID and server OID as a key of foreign transactions but I
> >> > think it also has a problem if an FDW caches the connection by pair of
> >> > server OID and user OID whereas the core identifies them by user
> >> > mapping OID. For instance, mysql_fdw manages connections by pair of
> >> > server OID and user OID.
> >> >
> >> > For example, let's consider the following execution:
> >> >
> >> > BEGIN;
> >> > SET ROLE user_A;
> >> > INSERT INTO ft1 VALUES (1);
> >> > SET ROLE user_B;
> >> > INSERT INTO ft1 VALUES (1);
> >> > COMMIT;
> >> >
> >> > Suppose that an FDW identifies the connections by {server OID, user
> >> > OID} and the core GTM identifies the transactions by user mapping OID,
> >> > and user_A and user_B use the public user mapping to connect server_X.
> >> > In the FDW, there are two connections identified by {user_A, server_X}
> >> > and {user_B, server_X} respectively, and it therefore opens two
> >> > transactions, one on each connection, while GTM has only one FdwXact entry
> >> > because the two connections refer to the same user mapping OID. As a
> >> > result, at the end of the transaction, GTM ends only one foreign
> >> > transaction, leaving another one.
> >> >
> >> > Using user mapping OID seems natural to me but I'm concerned that
> >> > changing role in the middle of a transaction is more likely to happen
> >> > than dropping the public user mapping, but I'm not sure. We would need
> >> > to find a better way.
> >>
> >> After more thought, I'm inclined to think it's better to identify
> >> foreign transactions by user mapping OID. The main reason is, I think
> >> FDWs that manage connection caches by pair of user OID and server OID
> >> potentially have a problem with the scenario Fujii-san mentioned. If an
> >> FDW has to use another user mapping (i.e., connection information) due
> 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-03-14 Thread Ibrar Ahmed
On Thu, Feb 11, 2021 at 6:25 PM Masahiko Sawada 
wrote:

> On Fri, Feb 5, 2021 at 2:45 PM Masahiko Sawada 
> wrote:
> >
> > On Tue, Feb 2, 2021 at 5:18 PM Fujii Masao 
> wrote:
> > >
> > >
> > >
> > > On 2021/01/27 14:08, Masahiko Sawada wrote:
> > > > On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
> > > >  wrote:
> > > >>
> > > >>
> > > >> You fixed some issues. But maybe you forgot to attach the latest
> patches?
> > > >
> > > > Yes, I've attached the updated patches.
> > >
> > > Thanks for updating the patch! I tried to review 0001 and 0002 as the
> self-contained change.
> > >
> > > + * An FDW that implements both commit and rollback APIs can request
> to register
> > > + * the foreign transaction by FdwXactRegisterXact() to participate it
> to a
> > > + * group of distributed tranasction.  The registered foreign
> transactions are
> > > + * identified by OIDs of server and user.
> > >
> > > I'm afraid that the combination of OIDs of server and user is not
> unique. IOW, more than one foreign transaction can have the same
> combination of OIDs of server and user. For example, the following two
> SELECT queries start the different foreign transactions but their user OID
> is the same. OID of user mapping should be used instead of OID of user?
> > >
> > >  CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
> > >  CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user
> 'postgres');
> > >  CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user
> 'postgres');
> > >  CREATE TABLE t(i int);
> > >  CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS
> (table_name 't');
> > >  BEGIN;
> > >  SELECT * FROM ft;
> > >  DROP USER MAPPING FOR postgres SERVER loopback ;
> > >  SELECT * FROM ft;
> > >  COMMIT;
> >
> > Good catch. I've considered using user mapping OID or a pair of user
> > mapping OID and server OID as a key of foreign transactions but I
> > think it also has a problem if an FDW caches the connection by pair of
> > server OID and user OID whereas the core identifies them by user
> > mapping OID. For instance, mysql_fdw manages connections by pair of
> > server OID and user OID.
> >
> > For example, let's consider the following execution:
> >
> > BEGIN;
> > SET ROLE user_A;
> > INSERT INTO ft1 VALUES (1);
> > SET ROLE user_B;
> > INSERT INTO ft1 VALUES (1);
> > COMMIT;
> >
> > Suppose that an FDW identifies the connections by {server OID, user
> > OID} and the core GTM identifies the transactions by user mapping OID,
> > and user_A and user_B use the public user mapping to connect server_X.
> > In the FDW, there are two connections identified by {user_A, server_X}
> > and {user_B, server_X} respectively, and it therefore opens two
> > transactions, one on each connection, while GTM has only one FdwXact entry
> > because the two connections refer to the same user mapping OID. As a
> > result, at the end of the transaction, GTM ends only one foreign
> > transaction, leaving another one.
> >
> > Using user mapping OID seems natural to me but I'm concerned that
> > changing role in the middle of a transaction is more likely to happen than
> > dropping the public user mapping, but I'm not sure. We would need to find
> > a better way.
>
> After more thought, I'm inclined to think it's better to identify
> foreign transactions by user mapping OID. The main reason is, I think
> FDWs that manage connection caches by pair of user OID and server OID
> potentially have a problem with the scenario Fujii-san mentioned. If an
> FDW has to use another user mapping (i.e., connection information) due
> to the currently used user mapping being removed, it would have to
> disconnect the previous connection because it has to use the same
> connection cache. But at that time it doesn't know the transaction
> will be committed or aborted.
>
> Also, such FDW has the same problem that postgres_fdw used to have; a
> backend establishes multiple connections with the same connection
> information if multiple local users use the public user mapping. Even
> from the perspective of foreign transaction management, it makes more
> sense that foreign transactions correspond to the connections to
> foreign servers, not to the local connection information.
>
> I can see that some FDW implementations such as mysql_fdw and
> firebird_fdw identify connections by pair of server OID and user OID
> but I think this is because they consulted the old postgres_fdw code. I
> suspect that there is no use case where FDW needs to identify
> connections in that way. If the core GTM identifies them by user
> mapping OID, we could enforce those FDWs to change their way but I
> think that change would be the right improvement.
>
> Regards,
>
> --
> Masahiko Sawada
> EDB:  https://www.enterprisedb.com/
>
>
>
The regression test is failing; can you please take a look?

https://cirrus-ci.com/task/5522445932167168


t/080_pg_isready.pl ... ok
# Failed test 'parallel reindexdb for system with --concurrently skips

Re: Transactions involving multiple postgres foreign servers, take 2

2021-02-11 Thread Masahiko Sawada
On Fri, Feb 5, 2021 at 2:45 PM Masahiko Sawada  wrote:
>
> On Tue, Feb 2, 2021 at 5:18 PM Fujii Masao  
> wrote:
> >
> >
> >
> > On 2021/01/27 14:08, Masahiko Sawada wrote:
> > > On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
> > >  wrote:
> > >>
> > >>
> > >> You fixed some issues. But maybe you forgot to attach the latest patches?
> > >
> > > Yes, I've attached the updated patches.
> >
> > Thanks for updating the patch! I tried to review 0001 and 0002 as the 
> > self-contained change.
> >
> > + * An FDW that implements both commit and rollback APIs can request to 
> > register
> > + * the foreign transaction by FdwXactRegisterXact() to participate it to a
> > + * group of distributed tranasction.  The registered foreign transactions 
> > are
> > + * identified by OIDs of server and user.
> >
> > I'm afraid that the combination of OIDs of server and user is not unique. 
> > IOW, more than one foreign transaction can have the same combination of 
> > OIDs of server and user. For example, the following two SELECT queries 
> > start the different foreign transactions but their user OID is the same. 
> > OID of user mapping should be used instead of OID of user?
> >
> >  CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
> >  CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 
> > 'postgres');
> >  CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 
> > 'postgres');
> >  CREATE TABLE t(i int);
> >  CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 
> > 't');
> >  BEGIN;
> >  SELECT * FROM ft;
> >  DROP USER MAPPING FOR postgres SERVER loopback ;
> >  SELECT * FROM ft;
> >  COMMIT;
>
> Good catch. I've considered using user mapping OID or a pair of user
> mapping OID and server OID as a key of foreign transactions but I
> think it also has a problem if an FDW caches the connection by pair of
> server OID and user OID whereas the core identifies them by user
> mapping OID. For instance, mysql_fdw manages connections by pair of
> server OID and user OID.
>
> For example, let's consider the following execution:
>
> BEGIN;
> SET ROLE user_A;
> INSERT INTO ft1 VALUES (1);
> SET ROLE user_B;
> INSERT INTO ft1 VALUES (1);
> COMMIT;
>
> Suppose that an FDW identifies the connections by {server OID, user
> OID} and the core GTM identifies the transactions by user mapping OID,
> and user_A and user_B use the public user mapping to connect server_X.
> In the FDW, there are two connections identified by {user_A, server_X}
> and {user_B, server_X} respectively, and it therefore opens two
> transactions, one on each connection, while GTM has only one FdwXact entry
> because the two connections refer to the same user mapping OID. As a
> result, at the end of the transaction, GTM ends only one foreign
> transaction, leaving another one.
>
> Using user mapping OID seems natural to me but I'm concerned that
> changing role in the middle of a transaction is more likely to happen than
> dropping the public user mapping, but I'm not sure. We would need to find
> a better way.

After more thought, I'm inclined to think it's better to identify
foreign transactions by user mapping OID. The main reason is, I think
FDWs that manage connection caches by pair of user OID and server OID
potentially have a problem with the scenario Fujii-san mentioned. If an
FDW has to use another user mapping (i.e., connection information) due
to the currently used user mapping being removed, it would have to
disconnect the previous connection because it has to use the same
connection cache. But at that time it doesn't know the transaction
will be committed or aborted.

Also, such FDW has the same problem that postgres_fdw used to have; a
backend establishes multiple connections with the same connection
information if multiple local users use the public user mapping. Even
from the perspective of foreign transaction management, it makes more
sense that foreign transactions correspond to the connections to
foreign servers, not to the local connection information.

I can see that some FDW implementations such as mysql_fdw and
firebird_fdw identify connections by pair of server OID and user OID
but I think this is because they consulted the old postgres_fdw code. I
suspect that there is no use case where FDW needs to identify
connections in that way. If the core GTM identifies them by user
mapping OID, we could enforce those FDWs to change their way but I
think that change would be the right improvement.
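
For illustration, keying a connection cache by user mapping OID looks like the
following simplified sketch (along the lines of what postgres_fdw does;
connect_pg_server stands in for the actual connection establishment, and the
hash table is assumed to be initialized elsewhere):

typedef Oid ConnCacheKey;			/* user mapping OID identifies the connection */

typedef struct ConnCacheEntry
{
	ConnCacheKey key;				/* hash key: user->umid */
	PGconn	   *conn;				/* open connection, or NULL */
} ConnCacheEntry;

static HTAB *ConnectionHash = NULL;

static PGconn *
GetCachedConnection(UserMapping *user)
{
	bool		found;
	ConnCacheEntry *entry;
	ConnCacheKey key = user->umid;	/* not {server OID, user OID} */

	entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
	if (!found)
		entry->conn = NULL;

	if (entry->conn == NULL)
		entry->conn = connect_pg_server(user);	/* hypothetical helper */

	return entry->conn;
}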

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-02-04 Thread Masahiko Sawada
On Tue, Feb 2, 2021 at 5:18 PM Fujii Masao  wrote:
>
>
>
> On 2021/01/27 14:08, Masahiko Sawada wrote:
> > On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
> >  wrote:
> >>
> >>
> >> You fixed some issues. But maybe you forgot to attach the latest patches?
> >
> > Yes, I've attached the updated patches.
>
> Thanks for updating the patch! I tried to review 0001 and 0002 as the 
> self-contained change.
>
> + * An FDW that implements both commit and rollback APIs can request to 
> register
> + * the foreign transaction by FdwXactRegisterXact() to participate it to a
> + * group of distributed tranasction.  The registered foreign transactions are
> + * identified by OIDs of server and user.
>
> I'm afraid that the combination of OIDs of server and user is not unique. 
> IOW, more than one foreign transaction can have the same combination of OIDs 
> of server and user. For example, the following two SELECT queries start the 
> different foreign transactions but their user OID is the same. OID of user 
> mapping should be used instead of OID of user?
>
>  CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
>  CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 
> 'postgres');
>  CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 'postgres');
>  CREATE TABLE t(i int);
>  CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 't');
>  BEGIN;
>  SELECT * FROM ft;
>  DROP USER MAPPING FOR postgres SERVER loopback ;
>  SELECT * FROM ft;
>  COMMIT;

Good catch. I've considered using user mapping OID or a pair of user
mapping OID and server OID as a key of foreign transactions but I
think it also has a problem if an FDW caches the connection by pair of
server OID and user OID whereas the core identifies them by user
mapping OID. For instance, mysql_fdw manages connections by pair of
server OID and user OID.

For example, let's consider the following execution:

BEGIN;
SET ROLE user_A;
INSERT INTO ft1 VALUES (1);
SET ROLE user_B;
INSERT INTO ft1 VALUES (1);
COMMIT;

Suppose that an FDW identifies the connections by {server OID, user
OID} and the core GTM identifies the transactions by user mapping OID,
and user_A and user_B use the public user mapping to connect server_X.
In the FDW, there are two connections identified by {user_A, server_X}
and {user_B, server_X} respectively, and it therefore opens two
transactions, one on each connection, while GTM has only one FdwXact entry
because the two connections refer to the same user mapping OID. As a
result, at the end of the transaction, GTM ends only one foreign
transaction, leaving another one.
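To make the mismatch concrete, here is a minimal sketch of the two key
shapes (the struct and field names are illustrative, not from the patch):

/* FDW-side connection cache key (mysql_fdw style): one entry per
 * {server, user} pair, so user_A and user_B get distinct connections. */
typedef struct FdwConnKey
{
    Oid         serverid;
    Oid         userid;
} FdwConnKey;

/* Core-side foreign transaction key: one entry per user mapping, so
 * both connections above collapse into the same public mapping's OID. */
typedef struct FdwXactKey
{
    Oid         umid;           /* user mapping OID */
} FdwXactKey;

With these keys, the FDW's cache holds two entries for the transaction
above while the core holds only one, which is exactly the leak described.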

Using user mapping OID seems natural to me, but I'm concerned that
changing the role in the middle of a transaction is more likely to happen
than dropping the public user mapping; I'm not sure, though. We would
need to find a better way.

>
> +   /* Commit foreign transactions if any */
> +   AtEOXact_FdwXact(true);
>
> Don't we need to pass XACT_EVENT_PARALLEL_PRE_COMMIT or XACT_EVENT_PRE_COMMIT 
> flag? Probably we don't need to do this if postgres_fdw is only user of this 
> new API. But if we make this new API generic one, such flags seem necessary 
> so that some foreign data wrappers might have different behaviors for those 
> flags.
>
> Because of the same reason as above, AtEOXact_FdwXact() should also be called 
> after CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT : 
> XACT_EVENT_COMMIT)?

Agreed.

In AtEOXact_FdwXact() we call either CommitForeignTransaction() or
RollbackForeignTransaction() with the FDWXACT_FLAG_ONEPHASE flag for each
foreign transaction. So, for example, in the commit case we will call the
new FDW APIs in the following order:

1. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_PRE_COMMIT
flag and FDWXACT_FLAG_ONEPHASE flag for each foreign transaction.
2. Commit locally.
3. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_COMMIT
flag and FDWXACT_FLAG_ONEPHASE flag for each foreign transaction.

In the future when we have a new FDW API to prepare foreign
transaction, the sequence will be:

1. Call PrepareForeignTransaction() for each foreign transaction.
2. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_PRE_COMMIT
flag for each foreign transaction.
3. Commit locally.
4. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_COMMIT
flag for each foreign transaction.

So we expect an FDW that wants to support 2PC not to commit the foreign
transaction if CommitForeignTransaction() is called with the
XACT_EVENT_PARALLEL_PRE_COMMIT flag and no FDWXACT_FLAG_ONEPHASE flag.
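To illustrate, a minimal sketch of such a callback; the signature, the
event argument, and the helper exec_remote() are assumptions here, not
the patch's actual API:

static void
myCommitForeignTransaction(FdwXactState *state, XactEvent event, int flags)
{
    bool    pre_commit = (event == XACT_EVENT_PRE_COMMIT ||
                          event == XACT_EVENT_PARALLEL_PRE_COMMIT);

    if (flags & FDWXACT_FLAG_ONEPHASE)
    {
        /* One-phase: commit the remote transaction before the local
         * commit; nothing is left to do in the post-commit call. */
        if (pre_commit)
            exec_remote(state, "COMMIT TRANSACTION");
    }
    else
    {
        /* Two-phase: the remote transaction was already prepared, so do
         * nothing at pre-commit; finish it after the local commit. */
        if (!pre_commit)
            exec_remote(state, "COMMIT PREPARED ...");  /* prepared xact id */
    }
}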

>
> +   /*
> +* Abort foreign transactions if any.  This needs to be done before 
> marking
> +* this transaction as not running since FDW's transaction callbacks 
> might
> +* assume this transaction is still in progress.
> +*/
> +   AtEOXact_FdwXact(false);
>
> Same as above.
>
> +/*
> + * This function is called at PREPARE TRANSACTION.  Since we don't support
> + * 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-02-02 Thread Fujii Masao




On 2021/01/27 14:08, Masahiko Sawada wrote:

On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
 wrote:



You fixed some issues. But maybe you forgot to attach the latest patches?


Yes, I've attached the updated patches.


Thanks for updating the patch! I tried to review 0001 and 0002 as a
self-contained change.

+ * An FDW that implements both commit and rollback APIs can request to register
+ * the foreign transaction by FdwXactRegisterXact() to participate it to a
+ * group of distributed tranasction.  The registered foreign transactions are
+ * identified by OIDs of server and user.

I'm afraid that the combination of OIDs of server and user is not unique. IOW,
more than one foreign transaction can have the same combination of OIDs of
server and user. For example, the following two SELECT queries start
different foreign transactions, but their user OID is the same. Should the
OID of the user mapping be used instead of the OID of the user?

CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 'postgres');
CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 'postgres');
CREATE TABLE t(i int);
CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 't');
BEGIN;
SELECT * FROM ft;
DROP USER MAPPING FOR postgres SERVER loopback ;
SELECT * FROM ft;
COMMIT;

+   /* Commit foreign transactions if any */
+   AtEOXact_FdwXact(true);

Don't we need to pass the XACT_EVENT_PARALLEL_PRE_COMMIT or XACT_EVENT_PRE_COMMIT
flag? Probably we don't need to do this if postgres_fdw is the only user of this
new API. But if we make this new API a generic one, such flags seem necessary,
since some foreign data wrappers might behave differently depending on those flags.

Because of the same reason as above, AtEOXact_FdwXact() should also be called 
after CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT : 
XACT_EVENT_COMMIT)?

+   /*
+* Abort foreign transactions if any.  This needs to be done before 
marking
+* this transaction as not running since FDW's transaction callbacks 
might
+* assume this transaction is still in progress.
+*/
+   AtEOXact_FdwXact(false);

Same as above.

+/*
+ * This function is called at PREPARE TRANSACTION.  Since we don't support
+ * preparing foreign transactions yet, raise an error if the local transaction
+ * has any foreign transaction.
+ */
+void
+AtPrepare_FdwXact(void)
+{
+   if (FdwXactParticipants != NIL)
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("cannot PREPARE a transaction that has 
operated on foreign tables")));
+}

This means that some foreign data wrappers that support preparing transactions
(though I'm not sure whether such wrappers actually exist) cannot use the new
API? If we want to allow those wrappers to use the new API, AtPrepare_FdwXact()
should call the prepare callback, and each wrapper should emit an error within
the callback if necessary.
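A sketch of that alternative (hedged: the callback field name
prepare_foreign_xact is made up here; the patch's participant struct may
differ):

void
AtPrepare_FdwXact(void)
{
    ListCell   *lc;

    foreach(lc, FdwXactParticipants)
    {
        FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);

        /* Wrappers that cannot prepare raise the error themselves; a
         * missing callback falls back to the current behavior. */
        if (fdw_part->prepare_foreign_xact == NULL)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("cannot PREPARE a transaction that has operated on foreign tables")));

        fdw_part->prepare_foreign_xact(fdw_part);
    }
}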

+   foreach(lc, FdwXactParticipants)
+   {
+   FdwXactParticipant *fdw_part = (FdwXactParticipant *) 
lfirst(lc);
+
+   if (fdw_part->server->serverid == serverid &&
+   fdw_part->usermapping->userid == userid)

Isn't this inefficient when starting lots of foreign transactions, because we
need to scan all the entries in the list every time?
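For reference, a hedged sketch of the kind of keyed lookup that would avoid
the linear scan (all names here are illustrative, not from the patch):

typedef struct FdwXactPartKey
{
    Oid         serverid;
    Oid         userid;
} FdwXactPartKey;

typedef struct FdwXactPartEntry
{
    FdwXactPartKey key;             /* hash key; must be first */
    FdwXactParticipant *participant;
} FdwXactPartEntry;

static HTAB *FdwXactPartHash;       /* hypothetical replacement for the list */

static FdwXactParticipant *
lookup_participant(Oid serverid, Oid userid)
{
    FdwXactPartKey key;
    FdwXactPartEntry *entry;

    key.serverid = serverid;
    key.userid = userid;
    entry = (FdwXactPartEntry *) hash_search(FdwXactPartHash, &key,
                                             HASH_FIND, NULL);
    return entry ? entry->participant : NULL;
}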

+static ConnCacheEntry *
+GetConnectionCacheEntry(Oid umid)
+{
+   boolfound;
+   ConnCacheEntry *entry;
+   ConnCacheKey key;
+
+   /* First time through, initialize connection cache hashtable */
+   if (ConnectionHash == NULL)
+   {
+   HASHCTL ctl;
+
+   ctl.keysize = sizeof(ConnCacheKey);
+   ctl.entrysize = sizeof(ConnCacheEntry);
+   ConnectionHash = hash_create("postgres_fdw connections", 8,
+                                &ctl,
+                                HASH_ELEM | HASH_BLOBS);

Currently ConnectionHash is created under TopMemoryContext. With the patch,
since GetConnectionCacheEntry() can be called from other places, ConnectionHash
may be created under a memory context other than TopMemoryContext? If so,
is that safe?
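(FWIW, IIUC dynahash already defaults to allocating a new hash table in a
child context of TopMemoryContext when no context is supplied, but the
intent could be made explicit with HASH_CONTEXT; a sketch:

    HASHCTL ctl;

    ctl.keysize = sizeof(ConnCacheKey);
    ctl.entrysize = sizeof(ConnCacheEntry);
    ctl.hcxt = TopMemoryContext;    /* pin the table's parent context */
    ConnectionHash = hash_create("postgres_fdw connections", 8,
                                 &ctl,
                                 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

That way the table's lifetime does not depend on the calling context.)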

-   if (PQstatus(entry->conn) != CONNECTION_OK ||
-   PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
-   entry->changing_xact_state ||
-   entry->invalidated)
...
+   if (PQstatus(entry->conn) != CONNECTION_OK ||
+   PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+   entry->changing_xact_state)

Why did you get rid of the condition "entry->invalidated"?






I'm reading the 0001 and 0002 patches to pick up the changes for postgres_fdw that
are worth applying independently of the 2PC feature.

Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-26 Thread Masahiko Sawada
On Sat, Jan 16, 2021 at 1:39 AM Zhihong Yu  wrote:
>
> Hi,

Thank you for reviewing the patch!

> For v32-0004-Add-PrepareForeignTransaction-API.patch :
>
> + * Whenever a foreign transaction is processed, the corresponding FdwXact
> + * entry is update. To avoid holding the lock during transaction 
> processing
> + * which may take an unpredicatable time the in-memory data of foreign
>
> entry is update -> entry is updated
>
> unpredicatable -> unpredictable

Fixed.
>
> +   int nlefts = 0;
>
> nlefts -> nremaining
>
> +   elog(DEBUG1, "left %u foreign transactions", nlefts);
>
> The message can be phrased as "%u foreign transactions remaining"

Fixed.

>
> +FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
>
> Fdw and Xact are repeated. Seems one should suffice. How about naming the 
> method FdwXactResolveTransactions() ?
> Similar comment for FdwXactResolveOneFdwXact(FdwXact fdwxact)

Agreed. I changed to ResolveFdwXacts() and ResolveOneFdwXact()
respectively to avoid a long function name.

>
> For get_fdwxact():
>
> +   /* This entry matches the condition */
> +   found = true;
> +   break;
>
> Instead of breaking and returning, you can return within the loop directly.

Fixed.

Those changes are incorporated into the latest version patches[1] I
submitted today.

Regards,

[1] 
https://www.postgresql.org/message-id/CAD21AoBYyA5O%2BFPN4Cs9YWiKjq319BvF5fYmKNsFTZfwTcWjQw%40mail.gmail.com

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-26 Thread Fujii Masao




On 2021/01/18 14:54, Masahiko Sawada wrote:

On Fri, Jan 15, 2021 at 7:45 AM Zhihong Yu  wrote:


For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :

+   entry->changing_xact_state = true;
...
+   entry->changing_xact_state = abort_cleanup_failure;

I don't see a return statement in between the two assignments. I wonder why
entry->changing_xact_state is set to true and later assigned again.


Because postgresRollbackForeignTransaction() can get called again in
the case where an error occurs while aborting and cleaning up the
transaction. For example, if an error occurred when executing ABORT
TRANSACTION (pgfdw_get_cleanup_result() could emit an ERROR),
postgresRollbackForeignTransaction() will get called again while
entry->changing_xact_state is still true. Then the entry will be
caught by the following condition and cleaned up:

 /*
  * If connection is before starting transaction or is already 
unsalvageable,
  * do only the cleanup and don't touch it further.
  */
 if (entry->changing_xact_state)
 {
 pgfdw_cleanup_after_transaction(entry);
 return;
 }



For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :

bq. This commits introduces to new background processes: foreign

commits introduces to new -> commit introduces two new


Fixed.



+FdwXactExistsXid(TransactionId xid)

Since Xid is the parameter to this method, I think the Xid suffix can be 
dropped from the method name.


But there is already a function named FdwXactExists()?

bool
FdwXactExists(Oid dbid, Oid serverid, Oid userid)

As far as I read other code, we already have such functions that have
the same functionality but have different arguments. For instance,
SearchSysCacheExists() and SearchSysCacheExistsAttName(). So I think
we can leave as it is but is it better to have like
FdwXactCheckExistence() and FdwXactCheckExistenceByXid()?



+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group

Please correct year in the next patch set.


Fixed.



+FdwXactLauncherRequestToLaunch(void)

Since the launcher's job is to 'launch', I think the Launcher can be omitted 
from the method name.


Agreed. How about FdwXactRequestToLaunchResolver()?



+/* Report shared memory space needed by FdwXactRsoverShmemInit */
+Size
+FdwXactRslvShmemSize(void)

Are both Rsover and Rslv referring to resolver ? It would be better to use
the whole word, which reduces confusion.
Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or 
FdwXactResolveShmemInit)


Agreed. I realized that these functions are the launcher's function,
not resolver's. So I'd change to FdwXactLauncherShmemSize() and
FdwXactLauncherShmemInit() respectively.



+fdwxact_launch_resolver(Oid dbid)

The above method is not in camel case. It would be better if method names are 
consistent (in casing).


Fixed.



+errmsg("out of foreign transaction resolver slots"),
+errhint("You might need to increase 
max_foreign_transaction_resolvers.")));

It would be nice to include the value of max_foreign_xact_resolvers


I agree it would be nice but looking at other code we don't include
the value in this kind of messages.



For fdwxact_resolver_onexit():

+   LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+   fdwxact->locking_backend = InvalidBackendId;
+   LWLockRelease(FdwXactLock);

There is no method call inside the for loop that may take a long time. I wonder
whether the lock can be obtained prior to the for loop and released after
coming out of the for loop.


Agreed.



+FXRslvLoop(void)

Please use Resolver instead of Rslv


Fixed.



+   FdwXactResolveFdwXacts(held_fdwxacts, nheld);

Fdw and Xact are repeated twice each in the method name. Probably the method 
name can be made shorter.


Fixed.


You fixed some issues. But maybe you forgot to attach the latest patches?

I'm reading the 0001 and 0002 patches to pick up the changes for postgres_fdw that
are worth applying independently of the 2PC feature. If there are such changes, IMO we
can apply them in advance, which would make the patches simpler.

+   if (PQresultStatus(res) != PGRES_COMMAND_OK)
+   ereport(ERROR, (errmsg("could not commit transaction on server 
%s",
+  
frstate->server->servername)));

You changed the code this way because you want to include the server name in
the error message? I agree that it's helpful also to report the server name
that caused an error. OTOH, since this change gets rid of the call to
pgfdw_report_error() for the returned PGresult, the reported error message
contains less information. If this understanding is right, I don't think that
this change is an improvement.

Instead, if the server name should be included in the error message, 
pgfdw_report_error() should be changed so that it also reports the server name? 
If we do that, the server name is reported not only when COMMIT fails but also 
when 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-18 Thread Zhihong Yu
Hi, Masahiko-san:

bq. How about FdwXactRequestToLaunchResolver()?

Sounds good to me.

bq. But there is already a function named FdwXactExists()

Then we can leave the function name as it is.

Cheers

On Sun, Jan 17, 2021 at 9:55 PM Masahiko Sawada 
wrote:

> On Fri, Jan 15, 2021 at 7:45 AM Zhihong Yu  wrote:
> >
> > For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :
> >
> > +   entry->changing_xact_state = true;
> > ...
> > +   entry->changing_xact_state = abort_cleanup_failure;
> >
> > I don't see return statement in between the two assignments. I wonder
> why entry->changing_xact_state is set to true, and later being assigned
> again.
>
> Because postgresRollbackForeignTransaction() can get called again in
> case where an error occurred during aborting and cleanup the
> transaction. For example, if an error occurred when executing ABORT
> TRANSACTION (pgfdw_get_cleanup_result() could emit an ERROR),
> postgresRollbackForeignTransaction() will get called again while
> entry->changing_xact_state is still true. Then the entry will be
> caught by the following condition and cleaned up:
>
> /*
>  * If connection is before starting transaction or is already
> unsalvageable,
>  * do only the cleanup and don't touch it further.
>  */
> if (entry->changing_xact_state)
> {
> pgfdw_cleanup_after_transaction(entry);
> return;
> }
>
> >
> > For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :
> >
> > bq. This commits introduces to new background processes: foreign
> >
> > commits introduces to new -> commit introduces two new
>
> Fixed.
>
> >
> > +FdwXactExistsXid(TransactionId xid)
> >
> > Since Xid is the parameter to this method, I think the Xid suffix can be
> dropped from the method name.
>
> But there is already a function named FdwXactExists()?
>
> bool
> FdwXactExists(Oid dbid, Oid serverid, Oid userid)
>
> As far as I read other code, we already have such functions that have
> the same functionality but have different arguments. For instance,
> SearchSysCacheExists() and SearchSysCacheExistsAttName(). So I think
> we can leave as it is but is it better to have like
> FdwXactCheckExistence() and FdwXactCheckExistenceByXid()?
>
> >
> > + * Portions Copyright (c) 2020, PostgreSQL Global Development Group
> >
> > Please correct year in the next patch set.
>
> Fixed.
>
> >
> > +FdwXactLauncherRequestToLaunch(void)
> >
> > Since the launcher's job is to 'launch', I think the Launcher can be
> omitted from the method name.
>
> Agreed. How about FdwXactRequestToLaunchResolver()?
>
> >
> > +/* Report shared memory space needed by FdwXactRsoverShmemInit */
> > +Size
> > +FdwXactRslvShmemSize(void)
> >
> > Are both Rsover and Rslv referring to resolver ? It would be better to
> use whole word which reduces confusion.
> > Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or
> FdwXactResolveShmemInit)
>
> Agreed. I realized that these functions are the launcher's function,
> not resolver's. So I'd change to FdwXactLauncherShmemSize() and
> FdwXactLauncherShmemInit() respectively.
>
> >
> > +fdwxact_launch_resolver(Oid dbid)
> >
> > The above method is not in camel case. It would be better if method
> names are consistent (in casing).
>
> Fixed.
>
> >
> > +errmsg("out of foreign transaction resolver slots"),
> > +errhint("You might need to increase
> max_foreign_transaction_resolvers.")));
> >
> > It would be nice to include the value of max_foreign_xact_resolvers
>
> I agree it would be nice but looking at other code we don't include
> the value in this kind of messages.
>
> >
> > For fdwxact_resolver_onexit():
> >
> > +   LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
> > +   fdwxact->locking_backend = InvalidBackendId;
> > +   LWLockRelease(FdwXactLock);
> >
> > There is no call to method inside the for loop which may take time. I
> wonder if the lock can be obtained prior to the for loop and released
> coming out of the for loop.
>
> Agreed.
>
> >
> > +FXRslvLoop(void)
> >
> > Please use Resolver instead of Rslv
>
> Fixed.
>
> >
> > +   FdwXactResolveFdwXacts(held_fdwxacts, nheld);
> >
> > Fdw and Xact are repeated twice each in the method name. Probably the
> method name can be made shorter.
>
> Fixed.
>
> Regards,
>
> --
> Masahiko Sawada
> EnterpriseDB:  https://www.enterprisedb.com/
>


Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-17 Thread Masahiko Sawada
On Fri, Jan 15, 2021 at 7:45 AM Zhihong Yu  wrote:
>
> For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :
>
> +   entry->changing_xact_state = true;
> ...
> +   entry->changing_xact_state = abort_cleanup_failure;
>
> I don't see return statement in between the two assignments. I wonder why 
> entry->changing_xact_state is set to true, and later being assigned again.

Because postgresRollbackForeignTransaction() can get called again in
the case where an error occurs while aborting and cleaning up the
transaction. For example, if an error occurred when executing ABORT
TRANSACTION (pgfdw_get_cleanup_result() could emit an ERROR),
postgresRollbackForeignTransaction() will get called again while
entry->changing_xact_state is still true. Then the entry will be
caught by the following condition and cleaned up:

/*
 * If connection is before starting transaction or is already unsalvageable,
 * do only the cleanup and don't touch it further.
 */
if (entry->changing_xact_state)
{
pgfdw_cleanup_after_transaction(entry);
return;
}

>
> For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :
>
> bq. This commits introduces to new background processes: foreign
>
> commits introduces to new -> commit introduces two new

Fixed.

>
> +FdwXactExistsXid(TransactionId xid)
>
> Since Xid is the parameter to this method, I think the Xid suffix can be 
> dropped from the method name.

But there is already a function named FdwXactExists()?

bool
FdwXactExists(Oid dbid, Oid serverid, Oid userid)

As far as I read other code, we already have such functions that have
the same functionality but have different arguments. For instance,
SearchSysCacheExists() and SearchSysCacheExistsAttName(). So I think
we can leave as it is but is it better to have like
FdwXactCheckExistence() and FdwXactCheckExistenceByXid()?

>
> + * Portions Copyright (c) 2020, PostgreSQL Global Development Group
>
> Please correct year in the next patch set.

Fixed.

>
> +FdwXactLauncherRequestToLaunch(void)
>
> Since the launcher's job is to 'launch', I think the Launcher can be omitted 
> from the method name.

Agreed. How about FdwXactRequestToLaunchResolver()?

>
> +/* Report shared memory space needed by FdwXactRsoverShmemInit */
> +Size
> +FdwXactRslvShmemSize(void)
>
> Are both Rsover and Rslv referring to resolver ? It would be better to use 
> whole word which reduces confusion.
> Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or 
> FdwXactResolveShmemInit)

Agreed. I realized that these functions are the launcher's function,
not resolver's. So I'd change to FdwXactLauncherShmemSize() and
FdwXactLauncherShmemInit() respectively.

>
> +fdwxact_launch_resolver(Oid dbid)
>
> The above method is not in camel case. It would be better if method names are 
> consistent (in casing).

Fixed.

>
> +errmsg("out of foreign transaction resolver slots"),
> +errhint("You might need to increase 
> max_foreign_transaction_resolvers.")));
>
> It would be nice to include the value of max_foreign_xact_resolvers

I agree it would be nice but looking at other code we don't include
the value in this kind of messages.

>
> For fdwxact_resolver_onexit():
>
> +   LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
> +   fdwxact->locking_backend = InvalidBackendId;
> +   LWLockRelease(FdwXactLock);
>
> There is no call to method inside the for loop which may take time. I wonder 
> if the lock can be obtained prior to the for loop and released coming out of 
> the for loop.

Agreed.

>
> +FXRslvLoop(void)
>
> Please use Resolver instead of Rslv

Fixed.

>
> +   FdwXactResolveFdwXacts(held_fdwxacts, nheld);
>
> Fdw and Xact are repeated twice each in the method name. Probably the method 
> name can be made shorter.

Fixed.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-15 Thread Zhihong Yu
Hi,
For v32-0004-Add-PrepareForeignTransaction-API.patch :

+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is update. To avoid holding the lock during transaction
processing
+ * which may take an unpredicatable time the in-memory data of foreign

entry is update -> entry is updated

unpredicatable -> unpredictable

+   int nlefts = 0;

nlefts -> nremaining

+   elog(DEBUG1, "left %u foreign transactions", nlefts);

The message can be phrased as "%u foreign transactions remaining"

+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)

Fdw and Xact are repeated. Seems one should suffice. How about naming the
method FdwXactResolveTransactions() ?
Similar comment for FdwXactResolveOneFdwXact(FdwXact fdwxact)

For get_fdwxact():

+   /* This entry matches the condition */
+   found = true;
+   break;

Instead of breaking and returning, you can return within the loop directly.

Cheers

On Thu, Jan 14, 2021 at 9:17 PM Masahiko Sawada 
wrote:

> On Fri, Jan 15, 2021 at 4:03 AM Zhihong Yu  wrote:
> >
> > Hi,
> > For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :
>
> Thank you for reviewing the patch!
>
> >
> > +   boolhave_notwophase = false;
> >
> > Maybe name the variable have_no_twophase so that it is easier to read.
>
> Fixed.
>
> >
> > +* Two-phase commit is not required if the number of servers
> performed
> >
> > performed -> performing
>
> Fixed.
>
> >
> > +errmsg("cannot process a distributed transaction that
> has operated on a foreign server that does not support two-phase commit
> protocol"),
> > +errdetail("foreign_twophase_commit is \'required\' but
> the transaction has some foreign servers which are not capable of two-phase
> commit")));
> >
> > The lines are really long. Please wrap into more lines.
>
> Hmm, we can do that but if we do that, it makes grepping by the error
> message hard. Please refer to the documentation about the formatting
> guideline[1]:
>
> Limit line lengths so that the code is readable in an 80-column
> window. (This doesn't mean that you must never go past 80 columns. For
> instance, breaking a long error message string in arbitrary places
> just to keep the code within 80 columns is probably not a net gain in
> readability.)
>
> These changes have been made in the local branch. I'll post the
> updated patch set after incorporating all the comments.
>
> Regards,
>
> [1] https://www.postgresql.org/docs/devel/source-format.html
>
> --
> Masahiko Sawada
> EnterpriseDB:  https://www.enterprisedb.com/
>


Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-14 Thread Masahiko Sawada
On Fri, Jan 15, 2021 at 4:03 AM Zhihong Yu  wrote:
>
> Hi,
> For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :

Thank you for reviewing the patch!

>
> +   boolhave_notwophase = false;
>
> Maybe name the variable have_no_twophase so that it is easier to read.

Fixed.

>
> +* Two-phase commit is not required if the number of servers performed
>
> performed -> performing

Fixed.

>
> +errmsg("cannot process a distributed transaction that has 
> operated on a foreign server that does not support two-phase commit 
> protocol"),
> +errdetail("foreign_twophase_commit is \'required\' but the 
> transaction has some foreign servers which are not capable of two-phase 
> commit")));
>
> The lines are really long. Please wrap into more lines.

Hmm, we could do that, but it would make grepping for the error
message hard. Please refer to the documentation about the formatting
guidelines[1]:

Limit line lengths so that the code is readable in an 80-column
window. (This doesn't mean that you must never go past 80 columns. For
instance, breaking a long error message string in arbitrary places
just to keep the code within 80 columns is probably not a net gain in
readability.)

These changes have been made in the local branch. I'll post the
updated patch set after incorporating all the comments.

Regards,

[1] https://www.postgresql.org/docs/devel/source-format.html

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-14 Thread Zhihong Yu
For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :

+   entry->changing_xact_state = true;
...
+   entry->changing_xact_state = abort_cleanup_failure;

I don't see a return statement in between the two assignments. I wonder
why entry->changing_xact_state is set to true and later assigned
again.

For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :

bq. This commits introduces to new background processes: foreign

commits introduces to new -> commit introduces two new

+FdwXactExistsXid(TransactionId xid)

Since Xid is the parameter to this method, I think the Xid suffix can be
dropped from the method name.

+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group

Please correct year in the next patch set.

+FdwXactLauncherRequestToLaunch(void)

Since the launcher's job is to 'launch', I think the Launcher can be
omitted from the method name.

+/* Report shared memory space needed by FdwXactRsoverShmemInit */
+Size
+FdwXactRslvShmemSize(void)

Are both Rsover and Rslv referring to resolver ? It would be better to use
the whole word, which reduces confusion.
Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or
FdwXactResolveShmemInit)

+fdwxact_launch_resolver(Oid dbid)

The above method is not in camel case. It would be better if method names
are consistent (in casing).

+errmsg("out of foreign transaction resolver slots"),
+errhint("You might need to increase
max_foreign_transaction_resolvers.")));

It would be nice to include the value of max_foreign_xact_resolvers

For fdwxact_resolver_onexit():

+   LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+   fdwxact->locking_backend = InvalidBackendId;
+   LWLockRelease(FdwXactLock);

There is no method call inside the for loop that may take a long time. I
wonder whether the lock can be obtained prior to the for loop and released
after coming out of the for loop.
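That is, roughly the following shape (FdwXactCtl and its fields are named
here only for illustration):

    /* Take the lock once around the whole loop instead of per iteration. */
    LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
    for (int i = 0; i < FdwXactCtl->num_fdwxacts; i++)
    {
        FdwXact     fdwxact = FdwXactCtl->fdwxacts[i];

        if (fdwxact->locking_backend == MyBackendId)
            fdwxact->locking_backend = InvalidBackendId;
    }
    LWLockRelease(FdwXactLock);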

+FXRslvLoop(void)

Please use Resolver instead of Rslv

+   FdwXactResolveFdwXacts(held_fdwxacts, nheld);

Fdw and Xact are repeated twice each in the method name. Probably the
method name can be made shorter.

Cheers

On Thu, Jan 14, 2021 at 11:04 AM Zhihong Yu  wrote:

> Hi,
> For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :
>
> +   boolhave_notwophase = false;
>
> Maybe name the variable have_no_twophase so that it is easier to read.
>
> +* Two-phase commit is not required if the number of servers performed
>
> performed -> performing
>
> +errmsg("cannot process a distributed transaction that has
> operated on a foreign server that does not support two-phase commit
> protocol"),
> +errdetail("foreign_twophase_commit is \'required\' but
> the transaction has some foreign servers which are not capable of two-phase
> commit")));
>
> The lines are really long. Please wrap into more lines.
>
>
>
> On Wed, Jan 13, 2021 at 9:50 PM Masahiko Sawada 
> wrote:
>
>> On Thu, Jan 7, 2021 at 11:44 AM Zhihong Yu  wrote:
>> >
>> > Hi,
>>
>> Thank you for reviewing the patch!
>>
>> > For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :
>> >
>> > However these functions are not neither committed nor aborted at
>> >
>> > I think the double negation was not intentional. Should be 'are neither
>> ...'
>>
>> Fixed.
>>
>> >
>> > For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the
>> return statement ?
>>
>> Hmm, you mean that we need MAXALIGN(size) after adding the size of
>> FdwXactData structs?
>>
>> Size
>> FdwXactShmemSize(void)
>> {
>> Sizesize;
>>
>> /* Size for foreign transaction information array */
>> size = offsetof(FdwXactCtlData, fdwxacts);
>> size = add_size(size, mul_size(max_prepared_foreign_xacts,
>>sizeof(FdwXact)));
>> size = MAXALIGN(size);
>> size = add_size(size, mul_size(max_prepared_foreign_xacts,
>>sizeof(FdwXactData)));
>>
>> return size;
>> }
>>
>> I don't think we need to do that. Looking at other similar code such
>> as TwoPhaseShmemSize() doesn't do that. Why do you think we need that?
>>
>> >
>> > +   fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
>> >
>> > For the function name, Fdw and Xact appear twice, each. Maybe one of
>> them can be dropped ?
>>
>> Agreed. Changed to FdwXactInsertEntry().
>>
>> >
>> > +* we don't need to anything for this participant because all
>> foreign
>> >
>> > 'need to' -> 'need to do'
>>
>> Fixed.
>>
>> >
>> > +   else if (TransactionIdDidAbort(xid))
>> > +   return FDWXACT_STATUS_ABORTING;
>> > +
>> > the 'else' can be omitted since the preceding if would return.
>>
>> Fixed.
>>
>> >
>> > +   if (max_prepared_foreign_xacts <= 0)
>> >
>> > I wonder when the value for max_prepared_foreign_xacts would be
>> negative (and whether that should be considered an error).
>> >
>>
>> Fixed to (max_prepared_foreign_xacts == 0)
>>
>> Attached the updated version patch 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-14 Thread Zhihong Yu
Hi,
For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :

+   boolhave_notwophase = false;

Maybe name the variable have_no_twophase so that it is easier to read.

+* Two-phase commit is not required if the number of servers performed

performed -> performing

+errmsg("cannot process a distributed transaction that has
operated on a foreign server that does not support two-phase commit
protocol"),
+errdetail("foreign_twophase_commit is \'required\' but the
transaction has some foreign servers which are not capable of two-phase
commit")));

The lines are really long. Please wrap into more lines.



On Wed, Jan 13, 2021 at 9:50 PM Masahiko Sawada 
wrote:

> On Thu, Jan 7, 2021 at 11:44 AM Zhihong Yu  wrote:
> >
> > Hi,
>
> Thank you for reviewing the patch!
>
> > For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :
> >
> > However these functions are not neither committed nor aborted at
> >
> > I think the double negation was not intentional. Should be 'are neither
> ...'
>
> Fixed.
>
> >
> > For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the
> return statement ?
>
> Hmm, you mean that we need MAXALIGN(size) after adding the size of
> FdwXactData structs?
>
> Size
> FdwXactShmemSize(void)
> {
> Sizesize;
>
> /* Size for foreign transaction information array */
> size = offsetof(FdwXactCtlData, fdwxacts);
> size = add_size(size, mul_size(max_prepared_foreign_xacts,
>sizeof(FdwXact)));
> size = MAXALIGN(size);
> size = add_size(size, mul_size(max_prepared_foreign_xacts,
>sizeof(FdwXactData)));
>
> return size;
> }
>
> I don't think we need to do that. Looking at other similar code such
> as TwoPhaseShmemSize() doesn't do that. Why do you think we need that?
>
> >
> > +   fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
> >
> > For the function name, Fdw and Xact appear twice, each. Maybe one of
> them can be dropped ?
>
> Agreed. Changed to FdwXactInsertEntry().
>
> >
> > +* we don't need to anything for this participant because all
> foreign
> >
> > 'need to' -> 'need to do'
>
> Fixed.
>
> >
> > +   else if (TransactionIdDidAbort(xid))
> > +   return FDWXACT_STATUS_ABORTING;
> > +
> > the 'else' can be omitted since the preceding if would return.
>
> Fixed.
>
> >
> > +   if (max_prepared_foreign_xacts <= 0)
> >
> > I wonder when the value for max_prepared_foreign_xacts would be negative
> (and whether that should be considered an error).
> >
>
> Fixed to (max_prepared_foreign_xacts == 0)
>
> Attached the updated version patch set.
>
> Regards,
>
> --
> Masahiko Sawada
> EnterpriseDB:  https://www.enterprisedb.com/
>


Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-06 Thread Zhihong Yu
Hi,
For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :

However these functions are not neither committed nor aborted at

I think the double negation was not intentional. Should be 'are neither ...'

For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the
return statement ?

+   fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);

For the function name, Fdw and Xact appear twice, each. Maybe one of them
can be dropped ?

+* we don't need to anything for this participant because all
foreign

'need to' -> 'need to do'

+   else if (TransactionIdDidAbort(xid))
+   return FDWXACT_STATUS_ABORTING;
+
the 'else' can be omitted since the preceding if would return.

+   if (max_prepared_foreign_xacts <= 0)

I wonder when the value for max_prepared_foreign_xacts would be negative
(and whether that should be considered an error).

Cheers

On Wed, Jan 6, 2021 at 5:45 PM Masahiko Sawada 
wrote:

> On Mon, Dec 28, 2020 at 11:24 PM Masahiko Sawada 
> wrote:
> >
> > On Wed, Nov 25, 2020 at 9:50 PM Masahiko Sawada 
> wrote:
> > >
> > > Since the previous version conflicts with the current HEAD I've
> > > attached the rebased version patch set.
> >
> > Rebased the patch set again to the current HEAD.
> >
> > The discussion of this patch is very long so here is a short summary
> > of the current state:
> >
> > It’s still under discussion which approaches are the best for the
> > distributed transaction commit as a building block of built-in sharding
> > using foreign data wrappers.
> >
> > Since we’re considering that we use this feature for built-in
> > sharding, the design depends on the architecture of built-in sharding.
> > For example, with the current patch, the PostgreSQL node that received
> > a COMMIT from the client works as a coordinator and it commits the
> > transactions using 2PC on all foreign servers involved with the
> > transaction. This approach would be good with the de-centralized
> > sharding architecture but not with centralized architecture like the
> > GTM node of Postgres-XC and Postgres-XL that is a dedicated component
> > that is responsible for transaction management. Since we haven't reached a
> > consensus on the built-in sharding architecture yet, it's still an
> > open question whether this patch's approach is really good as a building
> > block of built-in sharding.
> >
> > On the other hand, this feature is not necessarily dedicated to the
> > built-in sharding. For example, the distributed transaction commit
> > through FDW is important also when atomically moving data between two
> > servers via FDWs. Using a dedicated process or server like GTM could
> > be overkill. Having the node that received a COMMIT work as the
> > coordinator would be better and more straightforward.
> >
> > There is no noticeable TODO in the functionality so far covered by
> > this patch set. This patch set adds new FDW APIs to support 2PC,
> > introduces the global transaction manager, and implements those FDW
> > APIs in postgres_fdw. Also, it has regression tests and documentation.
> > Transactions on foreign servers involved with the distributed
> > transaction are committed using 2PC. Committing using 2PC is performed
> > asynchronously and transparently to the user. Therefore, it doesn't
> > guarantee that transactions on the foreign server are also committed
> > when the client gets an acknowledgment of COMMIT. Synchronous foreign
> > transaction commit via 2PC is not covered by this patch, as we still
> > need a discussion on the design.
> >
>
> I've attached the rebased patches to make cfbot happy.
>
>
> Regards,
>
> --
> Masahiko Sawada
> EnterpriseDB:  https://www.enterprisedb.com/
>


Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-21 Thread Masahiko Sawada
On Wed, 21 Oct 2020 at 18:33, tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Masahiko Sawada 
> > So what's your opinion?
>
> My opinion is simple and has not changed.  Let's clarify and refine the 
> design first in the following areas (others may have pointed out something 
> else too, but I don't remember), before going deeper into the code review.
>
> * FDW interface
> New functions so that other FDWs can really implement.  Currently, XA seems 
> to be the only model we can rely on to validate the FDW interface.
> What FDW function would call what XA function(s)?  What should be the 
> arguments for the FDW functions?

I guess that since the FDW interfaces may be affected by the feature's
architecture, we can discuss them later.

> * Performance
> Parallel prepare and commits on the client backend.  The current 
> implementation is untolerable and should not be the first release quality.  I 
> proposed the idea.
> (If you insist you don't want to anything about this, I have to think you're 
> just rushing for the patch commit.  I want to keep Postgres's reputation.)

What do you have in mind regarding the implementation of parallel prepare
and commit? Given that some FDW plugins don't support asynchronous
execution, I guess we need to use parallel workers or something. That
is, the backend process launches parallel workers to
prepare/commit/rollback foreign transactions in parallel. I don't deny
this approach, but it'll definitely make the feature complex and need
more code.
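For postgres_fdw at least, libpq's asynchronous API would already allow
issuing the commits in parallel from the backend itself, without extra
worker processes; a minimal sketch (error handling omitted; the connection
array is hypothetical):

static void
commit_all_in_parallel(PGconn **conns, int nconns)
{
    /* Fire off COMMIT on every connection before waiting on any of them. */
    for (int i = 0; i < nconns; i++)
        PQsendQuery(conns[i], "COMMIT TRANSACTION");

    /* Now collect the results; the servers keep working concurrently. */
    for (int i = 0; i < nconns; i++)
    {
        PGresult   *res;

        while ((res = PQgetResult(conns[i])) != NULL)
            PQclear(res);
    }
}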

My point is to start small and keep the first version simple. Even
if we need one or more years for this feature, I think that
introducing simple, minimal functionality into core as the first
version still has benefits. We will have the
opportunity to get real feedback from users and to fix bugs in the
main infrastructure before making it complex. In this sense, the patch
having the backend return without waiting for resolution after the local
commit would be a good start as the first version (i.e., up to
applying the v26-0006 patch). Anyway, the architecture should be
extensible enough for future improvements.

For the performance improvements, we will be able to support
asynchronous and/or parallel prepare/commit/rollback. Moreover, having
multiple resolver processes on one database would also help get better
throughput. A user who needs much better throughput can
also choose not to wait for resolution after the local commit,
like synchronous_commit = 'local' in replication.

> As part of this, I'd like to see the 2PC's message flow and disk writes (via 
> email and/or on the following wiki.)  That helps evaluate the 2PC 
> performance, because it's hard to figure it out in the code of a large patch 
> set.  I'm simply imagining what is typically written in database textbooks 
> and research papers.  I'm asking this because I saw some discussion in this 
> thread that some new WAL records are added.  I was worried that transactions 
> have to write WAL records other than prepare and commit unlike textbook 
> implementations.
>
> Atomic Commit of Distributed Transactions
> https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Understood. I'll add an explanation about the message flow and disk
writes to the wiki page.

We also need to consider error handling while resolving
foreign transactions.

>
> > I don’t think we need to stipulate the query cancellation. Anyway I
> > guess the facts neither that we don’t stipulate anything about query
> > cancellation now nor that postgres_fdw might not be cancellable in
> > some situations now are not a reason for not supporting query
> > cancellation. If it's a desirable behavior and users want it, we need
> > to put an effort to support it as much as possible like we’ve done in
> > postgres_fdw.  Some FDWs unfortunately might not be able to support it
> > only by their functionality but it would be good if we can achieve
> > that by combination of PostgreSQL and FDW plugins.
>
> Let me comment on this a bit; this is a bit dangerous idea, I'm afraid.  We 
> need to pay attention to the FDW interface and its documentation so that FDW 
> developers can implement what we consider important -- query cancellation in 
> your discussion.  "postgres_fdw is OK, so the interface is good" can create 
> interfaces that other FDW developers can't use.  That's what Tomas Vondra 
> pointed out several years ago.

I suspect the story is somewhat different. libpq fortunately supports
asynchronous execution, but when it comes to canceling the foreign
transaction resolution, I think basically all FDW plugins are in the
same situation at this time. We can choose whether to make it
cancellable or not. According to the discussion so far, it completely
depends on the architecture of this feature. So my point is whether
it's worth having this functionality for users and whether users want
it, not whether postgres_fdw is OK.

Regards,

-- 
Masahiko Sawada

Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-21 Thread Amit Kapila
On Wed, Oct 21, 2020 at 3:03 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Masahiko Sawada 
> > So what's your opinion?
>
> * Global visibility
> This is what Amit-san suggested some times -- "design it before reviewing the 
> current patch."  I'm a bit optimistic about this and think this FDW 2PC can 
> be implemented separately as a pure enhancement of FDW.  But I also 
> understand his concern.  If your (our?) aim is to use this FDW 2PC for 
> sharding,
>

As far as I understand, that is the goal for which this is a
step. For example, see the wiki [1]. I understand that the wiki is not the
final thing, but I have seen other places as well where there is a
mention of FDW-based sharding, and I feel this is the reason why many
people are trying to improve this area. That is why I suggested having
an upfront design of global visibility and a deadlock detector along
with this work.


[1] - https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding

-- 
With Regards,
Amit Kapila.




RE: Transactions involving multiple postgres foreign servers, take 2

2020-10-21 Thread tsunakawa.ta...@fujitsu.com
From: Masahiko Sawada 
> So what's your opinion?

My opinion is simple and has not changed.  Let's clarify and refine the design 
first in the following areas (others may have pointed out something else too, 
but I don't remember), before going deeper into the code review.

* FDW interface
New functions so that other FDWs can really implement.  Currently, XA seems to 
be the only model we can rely on to validate the FDW interface.
What FDW function would call what XA function(s)?  What should be the arguments 
for the FDW functions?

* Performance
Parallel prepare and commits on the client backend.  The current implementation
is intolerable and is not of first-release quality.  I proposed the
idea.
(If you insist you don't want to do anything about this, I have to think you're
just rushing for the patch commit.  I want to keep Postgres's reputation.)
As part of this, I'd like to see the 2PC's message flow and disk writes (via 
email and/or on the following wiki.)  That helps evaluate the 2PC performance, 
because it's hard to figure it out in the code of a large patch set.  I'm 
simply imagining what is typically written in database textbooks and research 
papers.  I'm asking this because I saw some discussion in this thread that some 
new WAL records are added.  I was worried that transactions have to write WAL 
records other than prepare and commit unlike textbook implementations.

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

* Query cancellation
As you showed, there's no problem with postgres_fdw?
The cancelability of FDW in general remains a problem, but that can be a 
separate undertaking.

* Global visibility
This is what Amit-san suggested several times -- "design it before reviewing the 
current patch."  I'm a bit optimistic about this and think this FDW 2PC can be 
implemented separately as a pure enhancement of FDW.  But I also understand his 
concern.  If your (our?) aim is to use this FDW 2PC for sharding, we may have 
to design the combination of 2PC and visibility first.



> I don’t think we need to stipulate the query cancellation. Anyway I
> guess the facts neither that we don’t stipulate anything about query
> cancellation now nor that postgres_fdw might not be cancellable in
> some situations now are not a reason for not supporting query
> cancellation. If it's a desirable behavior and users want it, we need
> to put an effort to support it as much as possible like we’ve done in
> postgres_fdw.  Some FDWs unfortunately might not be able to support it
> only by their functionality but it would be good if we can achieve
> that by combination of PostgreSQL and FDW plugins.

Let me comment on this a bit; this is a somewhat dangerous idea, I'm afraid.  We 
need to pay attention to the FDW interface and its documentation so that FDW 
developers can implement what we consider important -- query cancellation in 
your discussion.  "postgres_fdw is OK, so the interface is good" can create 
interfaces that other FDW developers can't use.  That's what Tomas Vondra 
pointed out several years ago.


Regards
Takayuki Tsunakawa



RE: Transactions involving multiple postgres foreign servers, take 2

2020-10-21 Thread tsunakawa.ta...@fujitsu.com
From: Kyotaro Horiguchi 
> > if (PQstatus(entry->conn) != CONNECTION_OK ||
> > PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> > entry->changing_xact_state)
> > {
> > elog(DEBUG3, "discarding connection %p", entry->conn);
> > disconnect_pg_server(entry);
> > }
> 
> Right.  Although it's not directly relevant to this discussion,
> precisely, that part is not visited just after the remote "COMMIT
> TRANSACTION" failed. If that commit fails or is canceled, an exception
> is raised while entry->changing_xact_state = true. Then the function
> is called again within AbortCurrentTransaction() and reaches the above
> code.

Ah, then the connection to the foreign server is closed after failing to cancel 
the query.  Thanks.


Regards
Takayuki Tsunakawa





Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-21 Thread Kyotaro Horiguchi
At Tue, 20 Oct 2020 21:22:31 +0900, Masahiko Sawada 
 wrote in 
> On Tue, 20 Oct 2020 at 17:56, tsunakawa.ta...@fujitsu.com
>  wrote:
> >
> > From: Kyotaro Horiguchi 
> > > It seems to respond to a statement-cancel signal immediately while
> > waiting for an incoming byte.  However, it seems to wait forever while
> > waiting for space in the send buffer. (Does that mean the session will be
> > stuck if it sends a large chunk of bytes while the network is down?)
> >
> > What part makes you worried about that?  libpq's send processing?
> >
> > I've just examined pgfdw_cancel_query(), too.  As below, it uses a hidden 
> > 30 second timeout.  After all, postgres_fdw also relies on timeout already.
> 
> It uses the timeout but it's also cancellable before the timeout. See
> we call CHECK_FOR_INTERRUPTS() in pgfdw_get_cleanup_result().

Yes. And as Sawada-san mentioned, it doesn't matter whether a specific FDW
module accepts cancellation or not. It's sufficient that we have one
example. Other FDWs will follow postgres_fdw if needed.

> > > After receiving a signal, it closes the problem connection. So the
> > local session is usable after that, but the failed remote sessions are
> > closed and new ones are created at the next use.
> >
> > I couldn't see that the problematic connection is closed when the 
> > cancellation fails... Am I looking at a wrong place?
...
> 
> I guess Horiguchi-san referred to the following code in pgfdw_xact_callback():
> 
> /*
>  * If the connection isn't in a good idle state, discard it to
>  * recover. Next GetConnection will open a new connection.
>  */
> if (PQstatus(entry->conn) != CONNECTION_OK ||
> PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> entry->changing_xact_state)
> {
> elog(DEBUG3, "discarding connection %p", entry->conn);
> disconnect_pg_server(entry);
> }

Right.  Although it's not directly relevant to this discussion, to be
precise, that part is not visited just after the remote "COMMIT
TRANSACTION" fails. If that commit fails or is canceled, an exception
is raised while entry->changing_xact_state = true. Then the function
is called again within AbortCurrentTransaction() and reaches the above
code.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread Masahiko Sawada
On Tue, 20 Oct 2020 at 17:56, tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Kyotaro Horiguchi 
> > It seems to respond to a statement-cancel signal immediately while
> > waiting for an incoming byte.  However, it seems to wait forever while
> > waiting for space in the send buffer. (Does that mean the session will be
> > stuck if it sends a large chunk of bytes while the network is down?)
>
> What part makes you worried about that?  libpq's send processing?
>
> I've just examined pgfdw_cancel_query(), too.  As below, it uses a hidden 30 
> second timeout.  After all, postgres_fdw also relies on timeout already.

It uses the timeout, but it's also cancellable before the timeout; note that
we call CHECK_FOR_INTERRUPTS() in pgfdw_get_cleanup_result().
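The wait itself follows the usual cancellable pattern; condensed, it looks
roughly like this (a sketch of the idea, not the verbatim postgres_fdw
source):

    TimestampTz endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
                                                      30000);

    while (PQisBusy(conn))
    {
        int     wc;
        long    secs;
        int     microsecs;
        long    cur_timeout;

        /* Query-cancel and termination requests are detected here. */
        CHECK_FOR_INTERRUPTS();

        /* Sleep until the socket is readable or the deadline expires. */
        TimestampDifference(GetCurrentTimestamp(), endtime, &secs, &microsecs);
        cur_timeout = secs * 1000 + microsecs / 1000;
        wc = WaitLatchOrSocket(MyLatch,
                               WL_LATCH_SET | WL_SOCKET_READABLE | WL_TIMEOUT,
                               PQsocket(conn), cur_timeout, PG_WAIT_EXTENSION);
        ResetLatch(MyLatch);

        if (wc & WL_TIMEOUT)
            break;              /* give up; treat the connection as dead */
        if (wc & WL_SOCKET_READABLE)
            PQconsumeInput(conn);
    }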

>
>
> > After receiving a signal, it closes the problem connection. So the
> > local session is usable after that, but the failed remote sessions are
> > closed and new ones are created at the next use.
>
> I couldn't see that the problematic connection is closed when the 
> cancellation fails... Am I looking at a wrong place?
>
> /*
>  * If connection is already unsalvageable, don't touch it
>  * further.
>  */
> if (entry->changing_xact_state)
> break;
>

I guess Horiguchi-san referred to the following code in pgfdw_xact_callback():

/*
 * If the connection isn't in a good idle state, discard it to
 * recover. Next GetConnection will open a new connection.
 */
if (PQstatus(entry->conn) != CONNECTION_OK ||
PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
entry->changing_xact_state)
{
elog(DEBUG3, "discarding connection %p", entry->conn);
disconnect_pg_server(entry);
}

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread Masahiko Sawada
On Tue, 20 Oct 2020 at 16:54, tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Kyotaro Horiguchi 
> > At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada
> >  wrote in
> > > I think it doesn't matter whether in FDW framework or not. The user
> > > normally doesn't care which backend processes connecting to foreign
> > > servers. They will attempt to cancel the query like always if they
> > > realized that a backend gets stuck. There are surely plenty of users
> > > who use query cancellation.
> >
> > The most serious impact from inability of canceling a query on a
> > certain session is that server-restart is required to end such a
> > session.
>
> OK, as I may be repeating, I didn't deny the need for cancellation.

So what's your opinion?

> Let's organize the argument.
>
> * FDW in general
> My understanding is that the FDW feature does not stipulate anything about 
> cancellation.  In fact, odbc_fdw was uncancelable.  What do we do about this?
>
> * postgres_fdw
> Fortunately, it is (should be?) cancelable whatever method we choose for 2PC. 
>  So no problem.
> But is it really cancellable now?  What if the libpq call is waiting for 
> response when the foreign server or network is down?

I don't think we need to stipulate query cancellation. In any case, neither
the fact that we don't stipulate anything about query cancellation now, nor
the fact that postgres_fdw might not be cancellable in some situations, is a
reason for not supporting query cancellation. If it's a desirable behavior
and users want it, we need to put in the effort to support it as much as
possible, as we've done in postgres_fdw. Some FDWs unfortunately might not
be able to support it by their own functionality alone, but it would be good
if we can achieve that by a combination of PostgreSQL and FDW plugins.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread tsunakawa.ta...@fujitsu.com
From: Kyotaro Horiguchi 
> It seems to respond to a statement-cancel signal immediately while
> waiting for an incoming byte.  However, it seems to wait forever while
> waiting for space in the send buffer. (Does that mean the session will be
> stuck if it sends a large chunk of bytes while the network is down?)

What part makes you worried about that?  libpq's send processing?

I've just examined pgfdw_cancel_query(), too.  As below, it uses a hidden 30 
second timeout.  After all, postgres_fdw also relies on timeout already.

/*
 * If it takes too long to cancel the query and discard the result, assume
 * the connection is dead.
 */
endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), 30000);

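For context, the cancel request itself goes out through libpq's standard
cancel API just before that wait; roughly (condensed from
pgfdw_cancel_query(), not verbatim):

    PGcancel   *cancel;
    char        errbuf[256];

    if ((cancel = PQgetCancel(entry->conn)) != NULL)
    {
        /* Send the cancel request; on failure we only warn and move on. */
        if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
            ereport(WARNING,
                    (errcode(ERRCODE_CONNECTION_FAILURE),
                     errmsg("could not send cancel request: %s", errbuf)));
        PQfreeCancel(cancel);
    }

After that, the code waits up to the 30-second deadline above for the
canceled query's result to be discarded.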

> After receiving a signal, it closes the problem connection. So the
> local session is usable after that, but the failed remote sessions are
> closed and new ones are created at the next use.

I couldn't see that the problematic connection is closed when the cancellation 
fails... Am I looking at a wrong place?

/*
 * If connection is already unsalvageable, don't touch it
 * further.
 */
if (entry->changing_xact_state)
break;


Regards
Takayuki Tsunakawa





RE: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread tsunakawa.ta...@fujitsu.com
From: Kyotaro Horiguchi 
> At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada
>  wrote in
> > I think it doesn't matter whether in FDW framework or not. The user
> > normally doesn't care which backend processes connecting to foreign
> > servers. They will attempt to cancel the query like always if they
> > realized that a backend gets stuck. There are surely plenty of users
> > who use query cancellation.
> 
> The most serious impact from inability of canceling a query on a
> certain session is that server-restart is required to end such a
> session.

OK, as I may be repeating, I didn't deny the need for cancellation.  Let's 
organize the argument.

* FDW in general
My understanding is that the FDW feature does not stipulate anything about 
cancellation.  In fact, odbc_fdw was uncancelable.  What do we do about this?

* postgres_fdw
Fortunately, it is (should be?) cancelable whatever method we choose for 2PC.  
So no problem.
But is it really cancellable now?  What if the libpq call is waiting for 
response when the foreign server or network is down?

"Inability to cancel requires database server restart" feels a bit 
exaggerating, as libpq has tcp_keepalive* and tcp_user_timeout connection 
parameters, and even without setting them, TCP timeout works.
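For reference, those libpq parameters can be set per foreign server through
postgres_fdw's server options, e.g. (the values are only illustrative, and
tcp_user_timeout requires a libpq new enough to know the option):

ALTER SERVER loopback OPTIONS (
    ADD keepalives '1',
    ADD keepalives_idle '30',
    ADD keepalives_interval '10',
    ADD keepalives_count '3',
    ADD tcp_user_timeout '60000');  -- milliseconds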


Regards
Takayuki Tsunakawa





Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread Kyotaro Horiguchi
At Tue, 20 Oct 2020 04:23:12 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Kyotaro Horiguchi 
> > > Furthermore, FDW is not cancellable in general.  So, I don't see a point 
> > > in
> > > trying hard to make only commit be cancelable.
> > 
> > I think that it is quite important that operators can cancel any
> > process that has been stuck for a long time. Furthermore, postgres_fdw
> > is more likely to get stuck since a network is involved, so the usefulness
> > of that feature would be higher.
> 
> But that's less important than practical performance during normal operation.
> 
> BTW, speaking of the network, how can postgres_fdw respond quickly to a cancel 
> request when libpq is waiting for a reply from a down foreign server?  Can 
> the user continue to use that session after cancellation?

It seems to respond to a statement-cancel signal immediately while
waiting for an incoming byte.  However, it seems to wait forever while
waiting for space in the send buffer. (Does that mean the session will be
stuck if it sends a large chunk of bytes while the network is down?)

After receiving a signal, it closes the problem connection. So the
local session is usable after that, but the failed remote sessions are
closed and new ones are created at the next use.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread Kyotaro Horiguchi
At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada 
 wrote in 
> On Tue, 20 Oct 2020 at 13:23, tsunakawa.ta...@fujitsu.com
>  wrote:
> >
> > From: Kyotaro Horiguchi 
> > > I don't think the inability to cancel all sessions at once can be a
> > > reason not to allow operators to cancel a stuck session.
> >
> > Yeah, I didn't mean to discount the ability to cancel queries.  I just want 
> > to confirm how the user can use the cancellation in practice.  I didn't see 
> > how the user can use the cancellation in the FDW framework, so I asked 
> > about it.  We have to think about the user's context if we regard canceling 
> > commits as important.
> >
> 
> I think it doesn't matter whether it's in the FDW framework or not. The user
> normally doesn't care which backend processes are connecting to foreign
> servers. They will attempt to cancel the query as usual if they
> realize that a backend is stuck. There are surely plenty of users
> who use query cancellation.

The most serious impact of the inability to cancel a query on a
certain session is that a server restart is required to end such a
session.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread Masahiko Sawada
On Tue, 20 Oct 2020 at 13:23, tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Kyotaro Horiguchi 
> > I don't think the inability to cancel all sessions at once can be a
> > reason not to allow operators to cancel a stuck session.
>
> Yeah, I didn't mean to discount the ability to cancel queries.  I just want 
> to confirm how the user can use the cancellation in practice.  I didn't see 
> how the user can use the cancellation in the FDW framework, so I asked about 
> it.  We have to think about the user's context if we regard canceling commits 
> as important.
>

I think it doesn't matter whether it's in the FDW framework or not. The user
normally doesn't care which backend processes are connecting to foreign
servers. They will attempt to cancel the query as usual if they
realize that a backend is stuck. There are surely plenty of users
who use query cancellation.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: Transactions involving multiple postgres foreign servers, take 2

2020-10-19 Thread tsunakawa.ta...@fujitsu.com
From: Kyotaro Horiguchi 
> I don't think the inability to cancel all sessions at once can be a
> reason not to allow operators to cancel a stuck session.

Yeah, I didn't mean to discount the ability to cancel queries.  I just want to 
confirm how the user can use the cancellation in practice.  I didn't see how 
the user can use the cancellation in the FDW framework, so I asked about it.  
We have to think about the user's context if we regard canceling commits as 
important.


> > Furthermore, FDW is not cancellable in general.  So, I don't see a point in
> > trying hard to make only commit be cancelable.
> 
> I think that it is quite important that operators can cancel any
> process that has been stuck for a long time. Furthermore, postgres_fdw
> is more likely to get stuck since a network is involved, so the usefulness
> of that feature would be higher.

But that's less important than practical performance during normal operation.

BTW, speaking of the network, how can postgres_fdw respond quickly to a cancel 
request when libpq is waiting for a reply from a down foreign server?  Can the 
user continue to use that session after cancellation?


Regards
Takayuki Tsunakawa





Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-19 Thread Kyotaro Horiguchi
At Tue, 20 Oct 2020 02:44:09 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Ashutosh Bapat 
> > Using pg_cancel_backend() and pg_terminate_backend() a DBA can cancel a
> > running query in any backend or terminate a backend. For either to
> > work, the backend needs to be interruptible. IIRC, Robert made an
> > effort to make postgres_fdw interruptible a few years back.
> 
> Yeah, I know those functions.  Sawada-san was talking about Ctrl + C, so I 
> responded accordingly.
> 
> Also, how can the DBA find sessions to run those functions against?  Can he 
> tell if a session is connected to or running SQL against a given foreign 
> server?  Can he terminate or cancel, with one SQL command, all sessions that 
> are stuck accessing a particular foreign server?

I don't think the inability to cancel all sessions at once can be a
reason not to allow operators to cancel a stuck session.

> Furthermore, FDW is not cancellable in general.  So, I don't see a point in 
> trying hard to make only commit be cancelable.

I think that it is quite important that operators can cancel any
process that has been stuck for a long time. Furthermore, postgres_fdw
is more likely to get stuck since a network is involved, so the usefulness
of that feature would be higher.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Transactions involving multiple postgres foreign servers, take 2

2020-10-19 Thread tsunakawa.ta...@fujitsu.com
From: Ashutosh Bapat 
> Using pg_cancel_backend() and pg_terminate_backend() a DBA can cancel a
> running query in any backend or terminate a backend. For either to
> work, the backend needs to be interruptible. IIRC, Robert made an
> effort to make postgres_fdw interruptible a few years back.

Yeah, I know those functions.  Sawada-san was talking about Ctrl + C, so I 
responded accordingly.

Also, how can the DBA find sessions to run those functions against?  Can he 
tell if a session is connected to or running SQL against a given foreign 
server?  Can he terminate or cancel, with one SQL command, all sessions that 
are stuck accessing a particular foreign server?
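
For what it's worth, one approximate approach is sketched below (the
wait-event classification is version-dependent, and pg_stat_activity has no
per-foreign-server column, so this cannot target one particular server):

-- Find backends that look stuck in an extension wait, which is how
-- postgres_fdw's waits show up in versions around PG13:
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'Extension';

-- Cancel them all at once (or use pg_terminate_backend() instead),
-- taking care not to cancel the current session:
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE wait_event_type = 'Extension'
  AND pid <> pg_backend_pid();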

Furthermore, FDW is not cancellable in general.  So, I don't see a point in 
trying hard to make only commit be cancelable.


Regards
Takayuki Tsunakawa


