RE: Multi-Master Logical Replication

2022-06-14 Thread r.takahash...@fujitsu.com
Hi Kuroda-san,


> I think even if LRG is implemented as contrib modules or any extensions,
> it will deeply depend on the subscription option "origin" proposed in [1].
> So LRG cannot be used for older versions, only PG16 or later.

Sorry, I misunderstood.
I understand now.

Regards,
Ryohei Takahashi


RE: Multi-Master Logical Replication

2022-06-13 Thread r.takahash...@fujitsu.com
Hi,


In addition to the use cases mentioned above, some users want to use n-way
replication for only part of a database.

The following is a typical use case.

* There are several data centers.
  (ex. Japan and India)
* The database in each data center has its own unique data.
  (ex. the database in Japan has the data related to Japan)
* There is also some common data.
  (ex. the shipment data from Japan to India should be stored in both databases)
* To replicate the common data, users want to use n-way replication.


The current POC patch seems to support only n-way replication of an entire
database, but I think we should also support n-way replication of part of a
database to achieve the above use case.
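
For example, such a setup might look like this, built from the existing
publication/subscription syntax (a rough sketch; the node and table names are
only illustrative, and the "origin" option is the one proposed elsewhere in
this thread):

```
-- On the Japan node: publish only the common (shipment) table,
-- not the Japan-local tables.
CREATE PUBLICATION pub_common FOR TABLE shipment;

-- On the India node: subscribe to the common table, and skip changes
-- that were themselves replicated, to avoid loops.
CREATE SUBSCRIPTION sub_from_japan
    CONNECTION 'host=japan-dc dbname=appdb'
    PUBLICATION pub_common
    WITH (origin = none);

-- A symmetric pair in the other direction completes the 2-way setup,
-- and repeating this per node generalizes to n-way.
```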


> I don't know if it requires the kind of code you are thinking but I
> agree that it is worth considering implementing it as an extension.

I think another advantage of implementing it as an extension is that users
could install the extension on older Postgres versions.

As mentioned in a previous email, one use case of n-way replication is
migration from an older Postgres to a newer one.

If we implement it as an extension, users could use n-way replication for
migration from PG10 to PG16.


Regards,
Ryohei Takahashi


RE: Support escape sequence for cluster_name in postgres_fdw.application_name

2022-02-14 Thread r.takahash...@fujitsu.com
Hi Fujii-san,


Thank you for updating the patch.
I have no additional comments.

Regards,
Ryohei Takahashi




RE: Support escape sequence for cluster_name in postgres_fdw.application_name

2022-02-08 Thread r.takahash...@fujitsu.com
Hi,


Thank you for updating the patch.
I agree with the documentation and the code changes.

How about adding a test for %c (Session ID)?
(Adding a test for %C (cluster_name) seems difficult.)
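
A rough sketch of what such a test could look like, following the style of
the existing postgres_fdw regression tests (the server and table names here
are only illustrative):

```
-- Set the escape on the foreign server and force a remote connection.
ALTER SERVER loopback OPTIONS (application_name 'fdw_%c');
SELECT 1 FROM ft1 LIMIT 1;
-- The expanded session ID is unpredictable, so only check the prefix.
SELECT count(*) > 0 FROM pg_stat_activity
  WHERE application_name LIKE 'fdw_%';
```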


Regards,
Ryohei Takahashi




RE: Support escape sequence for cluster_name in postgres_fdw.application_name

2022-01-27 Thread r.takahash...@fujitsu.com
Hi,


Thank you for developing this feature.
I think adding an escape sequence for cluster_name is useful too.

> Is the reason for 'C' in upper-case to avoid a possible conflict with
> 'c' of log_line_prefix?  I'm not sure that preventive measure is worth
> doing.  Looking at the escape-sequence spec alone, it seems to me rather
> strange that an upper-case letter is used while its lower-case
> counterpart is not used yet.

I think %c of log_line_prefix (Session ID) is also useful for 
postgres_fdw.application_name.
Therefore, how about adding both %c (Session ID) and %C (cluster_name)?
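
For example, with both escapes the setting could look like this (illustrative
only, assuming both %c and %C get in):

```
-- Remote sessions would then report names like "fdw_mycluster_62e5.1a2b".
SET postgres_fdw.application_name TO 'fdw_%C_%c';
```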


Regards,
Ryohei Takahashi




RE: Implementing Incremental View Maintenance

2021-11-23 Thread r.takahash...@fujitsu.com
Hi Nagata-san,


> Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as
> _readIntoClause() and _outIntoClause().

OK.

> > ivm=# create table t (c1 int, c2 int);
> > CREATE TABLE
> > ivm=# create incremental materialized view ivm_t as select distinct c1 from t;
> > NOTICE:  created index "ivm_t_index" on materialized view "ivm_t"
> > SELECT 0
> >
> > Then I executed pg_dump.
> >
> > In the dump, the following SQLs appear.
> >
> > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS
> >  SELECT DISTINCT t.c1
> >FROM public.t
> >   WITH NO DATA;
> >
> > ALTER TABLE ONLY public.ivm_t
> > ADD CONSTRAINT ivm_t_index UNIQUE (c1);
> >
> > If I execute psql with the result of pg_dump, following error occurs.
> >
> > ERROR:  ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t"
> > DETAIL:  This operation is not supported for materialized views.
> 
> Good catch! It was my mistake creating unique constraints on IMMV even
> though we cannot define them via SQL. I'll fix it to use unique indexes
> instead of constraints.

I checked the same procedure on the v24 patch.
However, the following error occurs instead of the original error.

ERROR:  relation "ivm_t_index" already exists
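
Presumably the v24 dump now emits something like the following (my guess at
the dumped SQL, based on the fix described above), which collides with the
index that CREATE INCREMENTAL MATERIALIZED VIEW creates automatically on
restore:

```
CREATE UNIQUE INDEX ivm_t_index ON public.ivm_t USING btree (c1);
```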

Regards,
Ryohei Takahashi




RE: Implementing Incremental View Maintenance

2021-11-23 Thread r.takahash...@fujitsu.com
Hi Nagata-san,


Sorry for late reply.


> However, even if we create triggers recursively on the parents or children,
> we would still need more consideration. This is because we will have to
> convert the format of tuples of the modified table to the format of the
> table specified in the view for cases where the parent and some children
> have different formats.
> 
> I think supporting partitioned tables can be left for the next release.

OK, I understand.
In the v24 patch, creating an IVM on partitions or on a partitioned table is
prohibited.
That is OK, but it should be documented.

Perhaps the following statement describes this.
If so, I think the definition of "simple base table" is ambiguous for some
users.

+ IMMVs must be based on simple base tables. It's not supported to
+ create them on top of views or materialized views.
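
For example, it is not obvious from that wording which of these counts as a
"simple base table" (illustrative; the table and view names are hypothetical):

```
-- Based directly on an ordinary table: clearly allowed.
CREATE INCREMENTAL MATERIALIZED VIEW mv1 AS SELECT c1 FROM t;
-- Based on a view: not supported, per the sentence above.
CREATE INCREMENTAL MATERIALIZED VIEW mv2 AS SELECT c1 FROM v;
-- Based on a partitioned table: prohibited in v24, but "simple base
-- table" does not obviously exclude it.
CREATE INCREMENTAL MATERIALIZED VIEW mv3 AS SELECT c1 FROM parent;
```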


> DEPENDENCY_IMMV was added to make clear that a certain trigger is related
> to IMMV. We dropped the IVM trigger and its dependencies from the IMMV when
> REFRESH ... WITH NO DATA is executed. Without the new deptype, we may
> accidentally delete a dependency created with an intention other than the
> IVM trigger.

OK. I understand.

> I think it is harder than you expected. When an IMMV is switched to a normal
> materialized view, we need to drop the hidden columns (__ivm_count__ etc.),
> and in the opposite case, we need to create them again. The former
> (IMMV->IVM) might be easier, but for the latter (IVM->IMMV) I wonder whether
> we would need to re-create the IMMV.

OK. I understand.


Regards,
Ryohei Takahashi




RE: Implementing Incremental View Maintenance

2021-09-06 Thread r.takahash...@fujitsu.com
Hi Nagata-san,


I'm still reading the patch.
I have additional comments.


(1)
In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, an ivm
member is added to the IntoClause struct.
I think it is necessary to modify the _copyIntoClause() and _equalIntoClause()
functions as well.


(2)
By executing pg_dump with
v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch applied,
the constraint that is automatically created during "CREATE INCREMENTAL
MATERIALIZED VIEW" is also dumped.
This causes an error during restore, as follows.

ivm=# create table t (c1 int, c2 int);
CREATE TABLE
ivm=# create incremental materialized view ivm_t as select distinct c1 from t;
NOTICE:  created index "ivm_t_index" on materialized view "ivm_t"
SELECT 0

Then I executed pg_dump.

In the dump, the following SQLs appear.

CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS
 SELECT DISTINCT t.c1
   FROM public.t
  WITH NO DATA;

ALTER TABLE ONLY public.ivm_t
ADD CONSTRAINT ivm_t_index UNIQUE (c1);

If I execute psql with the output of pg_dump, the following error occurs.

ERROR:  ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t"
DETAIL:  This operation is not supported for materialized views.


Regards,
Ryohei Takahashi




RE: Implementing Incremental View Maintenance

2021-08-05 Thread r.takahash...@fujitsu.com
Hi Nagata-san,


Thank you for your reply.

> I'll investigate this more, but we may have to prohibit views on partitioned
> table and partitions.

I think this restriction is too strict.
This feature is useful when the base table is large, and partitioning is also
useful in such cases.


I have several additional comments on the patch.


(1)
The following features are added to transition tables.
- Prolong the lifespan of the transition table
- If the table has row security policies, apply them to the transition table
- Calculate the pre-state of the table

Are these features only for IVM?
If there are other useful cases, they should be separated from the IVM patch
and become an independent patch for transition tables.


(2)
DEPENDENCY_IMMV (m) is added to the deptype column of pg_depend.
What is the difference compared with existing deptypes such as
DEPENDENCY_INTERNAL (i)?


(3)
Converting a normal materialized view to an IVM, or an IVM back to a normal
materialized view, is not implemented yet.
Is it difficult?

I think creating/dropping the triggers and the __ivm_ columns could achieve
this feature.


Regards,
Ryohei Takahashi




RE: Implementing Incremental View Maintenance

2021-08-03 Thread r.takahash...@fujitsu.com
Hi Nagata-san,


I am interested in this patch since it is a good feature.

I ran some simple tests and found the following problems.


(1)
"make world" failed.
I think there are extra "" in doc/src/sgml/ref/create_materialized_view.sgml
(lines 110 and 117).


(2)
In the case of partitions, it seems that IVM does not work well.
I ran the following.

postgres=# create table parent (c int) partition by range (c);
CREATE TABLE
postgres=# create table child partition of parent for values from (1) to (100);
CREATE TABLE
postgres=# create incremental materialized view ivm_parent as select c from parent;
NOTICE:  could not create an index on materialized view "ivm_parent" automatically
HINT:  Create an index on the materialized view for efficient incremental maintenance.
SELECT 0
postgres=# create incremental materialized view ivm_child as select c from child;
NOTICE:  could not create an index on materialized view "ivm_child" automatically
HINT:  Create an index on the materialized view for efficient incremental maintenance.
SELECT 0
postgres=# insert into parent values (1);
INSERT 0 1
postgres=# insert into child values (2);
INSERT 0 1
postgres=# select * from parent;
 c
---
 1
 2
(2 rows)

postgres=# select * from child;
 c
---
 1
 2
(2 rows)

postgres=# select * from ivm_parent;
 c
---
 1
(1 row)

postgres=# select * from ivm_child;
 c
---
 2
(1 row)


I think ivm_parent and ivm_child should each return 2 rows.


(3)
I think IVM does not support foreign tables, but I tried to create an IVM on
one.

postgres=# create incremental materialized view ivm_foreign as select c from foreign_table;
NOTICE:  could not create an index on materialized view "ivm_foreign" automatically
HINT:  Create an index on the materialized view for efficient incremental maintenance.
ERROR:  "foreign_table" is a foreign table
DETAIL:  Triggers on foreign tables cannot have transition tables.

It ultimately failed to create the IVM, but I think this should be checked
earlier.


Regards,
Ryohei Takahashi




RE: Speed up COMMIT PREPARED

2021-07-15 Thread r.takahash...@fujitsu.com
Hi,


I noticed that anti-virus software slows down the open() calls.
I stopped the anti-virus software and re-ran the test.
(Average of 10 runs)

master: 1924 tps
Hold_xlogreader.patch: 1993 tps (+3.5%)
Read_from_walbuffer.patch: 1954 tps (+1.5%)

Therefore, the effect of my patch is limited.

I'm sorry for the confusion.

Regards,
Ryohei Takahashi




RE: Speed up COMMIT PREPARED

2021-07-15 Thread r.takahash...@fujitsu.com
I noticed that the previous Read_from_walbuffer.patch has a mistake in 
xlogreader.c.
Could you please use the attached v2 patch?

The performance results in the previous mail are from the v2 patch.


Regards,
Ryohei Takahashi


v2-Read_from_walbuffer.patch
Description: v2-Read_from_walbuffer.patch


RE: Transactions involving multiple postgres foreign servers, take 2

2021-07-15 Thread r.takahash...@fujitsu.com
Hi Sawada-san,


Thank you for your reply.

> BTW did you test on the local? That is, the foreign servers are
> located on the same machine?

Yes, I tested locally, since I cannot prepare a good network environment
right now.


> I guess it would be better to start a new thread for this improvement.

Thank you for your advice.
I started a new thread [1].


> What if we successfully committed 'prep_1' but an error happened
> during committing another one for some reason (e.g., corrupted 2PC
> state file, OOM etc)? We might return an error to the client but have
> already committed 'prep_1'.

Sorry, I don't have a good idea right now.
I imagined the command returning a list of the transaction IDs that ended
with an error.


[1]
https://www.postgresql.org/message-id/OS0PR01MB56828019B25CD5190AB6093282129%40OS0PR01MB5682.jpnprd01.prod.outlook.com


Regards,
Ryohei Takahashi




Speed up COMMIT PREPARED

2021-07-15 Thread r.takahash...@fujitsu.com
Hi,


I noticed in the discussion [1] that the COMMIT PREPARED command is slow.


First, I made the following simple script for pgbench.

``` prepare.pgbench
\set id random(1, 100)

BEGIN;
UPDATE test_table SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
PREPARE TRANSACTION 'prep_:client_id';
COMMIT PREPARED 'prep_:client_id';
```

I run the pgbench as follows.

```
pgbench -f prepare.pgbench -c 1 -j 1 -T 60 -d postgres -r
```

The result is as follows.



tps: 287.259
Latency:
  UPDATE               0.207 ms
  PREPARE TRANSACTION  0.212 ms
  COMMIT PREPARED      2.982 ms


Next, I analyzed the bottleneck using pstack and strace.
I noticed that the open() during COMMIT PREPARED takes 2.7 ms.
Furthermore, I noticed that the backend process almost always opens the same
WAL segment file.


When running the COMMIT PREPARED command, there are two ways to find the 2PC
state data:
 - If it is still stored in a WAL segment file, open and read that WAL
   segment file.
 - If not, read the 2PC state file.

The above script runs the COMMIT PREPARED command just after the PREPARE
TRANSACTION command.
I think XA transactions also usually run the COMMIT PREPARED command shortly
after the PREPARE TRANSACTION command.
Therefore, I think the WAL segment file that is opened during COMMIT PREPARED
is probably the current WAL segment file.
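
This can be checked from SQL (a rough sanity check; pg_walfile_name() maps an
LSN to its segment name):

```
-- Right after PREPARE TRANSACTION, see which segment the insert position
-- is in; COMMIT PREPARED will usually read the PREPARE record back from
-- this same, still-current segment.
SELECT pg_walfile_name(pg_current_wal_insert_lsn());
```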


To speed up the COMMIT PREPARED command, I made two patches for testing.


(1) Hold_xlogreader.patch
Skip closing the WAL segment file after the COMMIT PREPARED command completes.
If the next COMMIT PREPARED command uses the same WAL segment file, it is
fast, since the process does not need to open the WAL segment file again.
However, I don't know when we should close the WAL segment file.
Moreover, it might not be useful when the COMMIT PREPARED command is run
infrequently and uses a different WAL segment file each time.


tps: 1750.81
Latency:
  UPDATE               0.156 ms
  PREPARE TRANSACTION  0.184 ms
  COMMIT PREPARED      0.179 ms
  

(2) Read_from_walbuffer.patch
Read the data from the WAL buffers if it is still there.
If the COMMIT PREPARED command is run just after the PREPARE TRANSACTION
command, the WAL record may still be in the WAL buffers.
However, the period during which the WAL is in the buffers is not so long,
since the WAL writer recycles the buffers soon.
Moreover, it may affect other performance, such as UPDATE, since it needs to
take a lock on the WAL buffers.


tps: 446.371
Latency:
  UPDATE               0.187 ms
  PREPARE TRANSACTION  0.196 ms
  COMMIT PREPARED      1.974 ms


Which approach do you think is better?


[1] 
https://www.postgresql.org/message-id/20191206.173215.1818665441859410805.horikyota.ntt%40gmail.com


Regards,
Ryohei Takahashi


Hold_xlogreader.patch
Description: Hold_xlogreader.patch


Read_from_walbuffer.patch
Description: Read_from_walbuffer.patch


RE: Transactions involving multiple postgres foreign servers, take 2

2021-07-13 Thread r.takahash...@fujitsu.com
Hi,


> Wouldn't it be better to explicitly initialize the pointer with NULL?

Thank you for your advice.
You are correct.

Anyway, I fixed it and re-ran the performance test; of course, it does not
affect the tps.

Regards,
Ryohei Takahashi


RE: Transactions involving multiple postgres foreign servers, take 2

2021-07-12 Thread r.takahash...@fujitsu.com
Hi Sawada-san,


Thank you for your reply.

> Not sure but it might be possible to keep holding an xlogreader for
> reading PREPARE WAL records even after the transaction commit. But I
> wonder how much open() for wal segment file accounts for the total
> execution time of 2PC. 2PC requires 2 network round trips for each
> participant. For example, if it took 500ms in total, we would not get
> benefits much from the point of view of 2PC performance even if we
> improved it from 14ms to 1ms.

I made a patch based on your advice and re-ran the test on a new machine.
(The attached patch is just for testing purposes.)


* foreign_twophase_commit = disabled
2686 tps

* foreign_twophase_commit = required (It is necessary to set -R ${RATE} as 
Ikeda-san said)
311 tps

* foreign_twophase_commit = required with attached patch (It is not necessary 
to set -R ${RATE})
2057 tps


This indicates that if we can reduce the number of open() calls on the WAL
segment file during "COMMIT PREPARED", the performance can be improved.

This patch can skip closing the WAL segment file, but I don't know when we
should close it.
One idea is to close it when the WAL segment file is recycled, but that seems
difficult for a backend process to do.

BTW, in a previous discussion, "Send COMMIT PREPARED to remote servers in
bulk" was proposed.
I imagined a new SQL interface like "COMMIT PREPARED 'prep_1', 'prep_2', ...,
'prep_n'".
If we can keep the WAL segment file open during a bulk COMMIT PREPARED, we
can reduce not only the number of communication round trips but also the
number of open() calls on the WAL segment file.


Regards,
Ryohei Takahashi


Hold_xlogreader.patch
Description: Hold_xlogreader.patch


RE: Transactions involving multiple postgres foreign servers, take 2

2021-07-06 Thread r.takahash...@fujitsu.com
Hi,


I'm interested in this patch, and I also ran the same test with Ikeda-san's
fxact_update.pgbench.
In my environment (a low-spec VM), the results are as follows.

* foreign_twophase_commit = disabled
363 tps

* foreign_twophase_commit = required (It is necessary to set -R ${RATE} as 
Ikeda-san said)
13 tps


I analyzed the bottleneck using pstack and strace.
I noticed that the open() during the "COMMIT PREPARED" command is very slow.

In my environment, the latency of "COMMIT PREPARED" is 16 ms.
(On the other hand, the latency of "COMMIT" and "PREPARE TRANSACTION" is 1 ms.)
In the "COMMIT PREPARED" command, open() on the WAL segment file takes 14 ms.
Therefore, open() is the bottleneck of "COMMIT PREPARED".
Furthermore, I noticed that the backend process almost always opens the same
WAL segment file.

In the current patch, the backend process on the foreign server that is
associated with the connection from the resolver process always runs the
"COMMIT PREPARED" command.
Therefore, the WAL segment file used by the current "COMMIT PREPARED" command
is probably the same as that used by the previous "COMMIT PREPARED" command.

In order to improve the performance of the resolver process, I think it
would be useful to skip closing the WAL segment file during "COMMIT PREPARED"
and reuse the file descriptor.
Is that possible?


Regards,
Ryohei Takahashi




RE: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-02 Thread r.takahash...@fujitsu.com
Hi Michael-san,


> Attached is a patch to do that, which should go down to v12 where
> tcp_user_timeout has been introduced.  Takahashi-san, what do you
> think?

Thank you for creating the patch.
This patch is what I expected.

I'm not sure whether this patch should be applied to Postgres 11 and earlier,
since I'm not sure whether the OS parameters (ex. tcp_retries2) cause the same
error there.

Regards,
Ryohei Takahashi





pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-01 Thread r.takahash...@fujitsu.com
Hi


pg_basebackup -F t fails when fsync() takes more time than tcp_user_timeout
in the following environment.

[Environment]
Postgres 13dev (master branch)
Red Hat Enterprise Linux 7.4

[Error]
$ pg_basebackup -F t --progress --verbose -h  -D 
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/5A60 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_15647"
pg_basebackup: error: could not read COPY data: server closed the connection 
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

[Analysis]
- pg_basebackup -F t creates a tar file and does fsync() for each tablespace.
  (In contrast, -F p does fsync() only once at the end.)
- While pg_basebackup is doing fsync() on the tar file for one tablespace,
  the WAL sender sends the content of the next tablespace.
  When fsync() takes a long time, the TCP socket of pg_basebackup returns
  "zero window" packets to the WAL sender.
  This means the TCP socket buffer of pg_basebackup is exhausted, since
  pg_basebackup cannot receive data during fsync().
- The WAL sender's socket retries sending the packet, but it resets the
  connection after tcp_user_timeout.
  After the WAL sender resets the connection, pg_basebackup cannot receive
  data and fails with the above error.

[Solution]
I think fsync() for each tablespace is not necessary.
As with pg_basebackup -F p, fsync() is needed only once, at the end.


Could you give me any comments?


Regards,
Ryohei Takahashi