Re: POC and rebased patch for CSN based snapshots

2020-07-15 Thread Andrey V. Lepikhov

On 7/13/20 11:46 AM, movead...@highgo.ca wrote:

I continued to review your patch. Some code improvements are attached.

Questions:
* Why is csnSnapshotActive the only member of the CSNshapshotShared struct?
* The WriteAssignCSNXlogRec() routine: I didn't understand why you add 20 
nanosec to the current CSN and write this into the WAL. To simplify our 
communication, I rewrote this routine according to my understanding (see 
the patch in the attachment).


In general, maybe we could add your WAL-writing CSN machinery + TAP tests 
to the patch from the thread [1] and work on it together?


[1] 
https://www.postgresql.org/message-id/flat/07b2c899-4ed0-4c87-1327-23c750311248%40postgrespro.ru


--
regards,
Andrey Lepikhov
Postgres Professional
From 9a1595507c83b5fde61a6a3cc30f6df9df410e76 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Wed, 15 Jul 2020 11:55:00 +0500
Subject: [PATCH] 1

---
 src/backend/access/transam/csn_log.c   | 35 --
 src/include/access/csn_log.h   |  8 +++---
 src/test/regress/expected/sysviews.out |  3 ++-
 3 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/csn_log.c b/src/backend/access/transam/csn_log.c
index 319e89c805..53d3877851 100644
--- a/src/backend/access/transam/csn_log.c
+++ b/src/backend/access/transam/csn_log.c
@@ -150,8 +150,8 @@ CSNLogSetCSN(TransactionId xid, int nsubxids,
  */
 static void
 CSNLogSetPageStatus(TransactionId xid, int nsubxids,
-		   TransactionId *subxids,
-		   XidCSN csn, int pageno)
+	TransactionId *subxids,
+	XidCSN csn, int pageno)
 {
 	int			slotno;
 	int			i;
@@ -187,8 +187,8 @@ CSNLogSetCSNInSlot(TransactionId xid, XidCSN csn, int slotno)
 
 	Assert(LWLockHeldByMe(CSNLogControlLock));
 
-	ptr = (XidCSN *) (CsnlogCtl->shared->page_buffer[slotno] + entryno * sizeof(XLogRecPtr));
-
+	ptr = (XidCSN *) (CsnlogCtl->shared->page_buffer[slotno] +
+	entryno * sizeof(XidCSN));
 	*ptr = csn;
 }
 
@@ -205,17 +205,16 @@ CSNLogGetCSNByXid(TransactionId xid)
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToPgIndex(xid);
 	int			slotno;
-	XidCSN *ptr;
-	XidCSN	xid_csn;
+	XidCSN		csn;
 
 	/* lock is acquired by SimpleLruReadPage_ReadOnly */
 	slotno = SimpleLruReadPage_ReadOnly(CsnlogCtl, pageno, xid);
-	ptr = (XidCSN *) (CsnlogCtl->shared->page_buffer[slotno] + entryno * sizeof(XLogRecPtr));
-	xid_csn = *ptr;
+	csn = *(XidCSN *) (CsnlogCtl->shared->page_buffer[slotno] +
+	entryno * sizeof(XidCSN));
 
 	LWLockRelease(CSNLogControlLock);
 
-	return xid_csn;
+	return csn;
 }
 
 /*
@@ -501,15 +500,15 @@ WriteAssignCSNXlogRec(XidCSN xidcsn)
 {
 	XidCSN log_csn = 0;
 
-	if(xidcsn > get_last_log_wal_csn())
-	{
-		log_csn = CSNAddByNanosec(xidcsn, 20);
-		set_last_log_wal_csn(log_csn);
-	}
-	else
-	{
+	if(xidcsn <= get_last_log_wal_csn())
+		/*
+		 * If a concurrent backend has already written a bigger CSN into
+		 * the WAL, there is no need to log this value again.
+		 */
 		return;
-	}
+
+	log_csn = CSNAddByNanosec(xidcsn, 0);
+	set_last_log_wal_csn(log_csn);
 
 	XLogBeginInsert();
 	XLogRegisterData((char *) (&log_csn), sizeof(XidCSN));
@@ -571,7 +570,6 @@ csnlog_redo(XLogReaderState *record)
 		LWLockAcquire(CSNLogControlLock, LW_EXCLUSIVE);
 		set_last_max_csn(csn);
 		LWLockRelease(CSNLogControlLock);
-
 	}
 	else if (info == XLOG_CSN_SETXIDCSN)
 	{
@@ -589,7 +587,6 @@ csnlog_redo(XLogReaderState *record)
 		SimpleLruWritePage(CsnlogCtl, slotno);
 		LWLockRelease(CSNLogControlLock);
 		Assert(!CsnlogCtl->shared->page_dirty[slotno]);
-
 	}
 	else if (info == XLOG_CSN_TRUNCATE)
 	{
diff --git a/src/include/access/csn_log.h b/src/include/access/csn_log.h
index 5838028a30..c23a71446a 100644
--- a/src/include/access/csn_log.h
+++ b/src/include/access/csn_log.h
@@ -15,10 +15,10 @@
 #include "utils/snapshot.h"
 
 /* XLOG stuff */
-#define XLOG_CSN_ASSIGNMENT 0x00
-#define XLOG_CSN_SETXIDCSN   0x10
-#define XLOG_CSN_ZEROPAGE   0x20
-#define XLOG_CSN_TRUNCATE   0x30
+#define XLOG_CSN_ASSIGNMENT	0x00
+#define XLOG_CSN_SETXIDCSN	0x10
+#define XLOG_CSN_ZEROPAGE	0x20
+#define XLOG_CSN_TRUNCATE	0x30
 
 typedef struct xl_xidcsn_set
 {
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..cc169a1999 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -73,6 +73,7 @@ select name, setting from pg_settings where name like 'enable%';
         name        | setting 
 --------------------+---------
  enable_bitmapscan  | on
+ enable_csn_snapshot| off
  enable_gathermerge | on
  enable_hashagg | on
  enable_hashjoin| on
@@ -90,7 +91,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan | on
  enable_sort| on
  enable_tidscan | on

Re: POC and rebased patch for CSN based snapshots

2020-07-13 Thread Fujii Masao




On 2020/07/14 11:02, movead...@highgo.ca wrote:



When checking each tuple's visibility, we always have to get the CSN
corresponding to XMIN or XMAX from the CSN SLRU. In the past discussion,
there was a suggestion that the CSN should be stored in the tuple header
or somewhere (like a hint bit) to avoid the overhead of very frequent
lookups of the CSN SLRU. I'm not sure about the conclusion of that
discussion, but this patch doesn't seem to adopt the idea. So did you
confirm that the performance overhead of looking up the CSN SLRU is
negligible?

This patch came from postgrespro's patch, which showed good performance.
I ran a simple test on the current patch and the result showed no
performance decline.


This is good news! When I read the past discussions about CSN, my impression
was that the performance overhead of CSN SLRU lookups might become one of
the show-stoppers for CSN. So I was worrying about this issue...



And not every tuple visibility check needs a lookup of the CSN SLRU; only
xids larger than 'TransactionXmin' need it. Maybe we have not yet hit the
cases that cause bad performance, so it shows good performance for now.


Yes, we would need more tests in several cases.



Of course, I know that idea has a big issue, i.e., there is not enough space
to store the CSN in a tuple header if the CSN is 64 bits. If the CSN is
32 bits, we may be able to replace XMIN or XMAX with the CSN corresponding
to them. But that means we would have to struggle with one more wraparound
issue (CSN wraparound). So it's not easy to adopt that idea...



Sorry if this was already discussed and concluded...

I think your point about CSN in the tuple header is an exciting approach,
but I have not seen the discussion; could you point me to it?


Probably you can find the discussion by searching with the keywords
"CSN" and "hint bit". For example,

https://www.postgresql.org/message-id/CAPpHfdv7BMwGv=ofug3s-jgvfkqhi79pr_zk1wsk-13oz+c...@mail.gmail.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: POC and rebased patch for CSN based snapshots

2020-07-13 Thread movead...@highgo.ca

>When checking each tuple's visibility, we always have to get the CSN
>corresponding to XMIN or XMAX from the CSN SLRU. In the past discussion,
>there was a suggestion that the CSN should be stored in the tuple header
>or somewhere (like a hint bit) to avoid the overhead of very frequent
>lookups of the CSN SLRU. I'm not sure about the conclusion of that
>discussion, but this patch doesn't seem to adopt the idea. So did you
>confirm that the performance overhead of looking up the CSN SLRU is
>negligible?
This patch came from postgrespro's patch, which showed good performance.
I ran a simple test on the current patch and the result showed no
performance decline.
And not every tuple visibility check needs a lookup of the CSN SLRU; only
xids larger than 'TransactionXmin' need it. Maybe we have not yet hit the
cases that cause bad performance, so it shows good performance for now.
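
A rough sketch of that fast path (XidVisibleInCSNSnapshot, CSNIsCommitted
and snapshot->snapshot_csn are illustrative names here, not the patch's
actual API):

/*
 * Only xids at or after TransactionXmin need a CSN SLRU lookup; older
 * xids are settled before every running snapshot's xmin, so the plain
 * clog/hint-bit checks done by the caller are enough.
 */
static bool
XidVisibleInCSNSnapshot(TransactionId xid, Snapshot snapshot)
{
	XidCSN		csn;

	if (TransactionIdPrecedes(xid, TransactionXmin))
		return true;			/* no CSN SLRU traffic on this path */

	csn = CSNLogGetCSNByXid(xid);	/* the SLRU lookup in question */

	return CSNIsCommitted(csn) && csn < snapshot->snapshot_csn;
}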

>Of course, I know that idea has a big issue, i.e., there is not enough space
>to store the CSN in a tuple header if the CSN is 64 bits. If the CSN is
>32 bits, we may be able to replace XMIN or XMAX with the CSN corresponding
>to them. But that means we would have to struggle with one more wraparound
>issue (CSN wraparound). So it's not easy to adopt that idea...

>Sorry if this was already discussed and concluded...
I think your point about CSN in the tuple header is an exciting approach,
but I have not seen the discussion; could you point me to it?




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: POC and rebased patch for CSN based snapshots

2020-07-13 Thread Fujii Masao




On 2020/06/19 14:54, Fujii Masao wrote:



On 2020/06/19 13:36, movead...@highgo.ca wrote:


>You mean that the last generated CSN needs to be WAL-logged because any
>smaller CSN than the last one should not be reused after crash recovery.
>Right?

Yes, that's it.


If right, that WAL-logging seems unnecessary because the CSN mechanism
assumes that CSN increases monotonically. IOW, even without that WAL-logging,
the CSN after crash recovery must be larger than the one before. No?


CSN is collected based on the system time in this patch, but time is not
reliable all the time. And it is designed for Global CSN (for sharding),
where it may rely on a CSN from another node, generated on another machine.

So monotonicity is not reliable, and it needs to keep its largest CSN in the
WAL in case of a crash.


Thanks for the explanation! Understood.


I have another question about this patch:

When checking each tuple's visibility, we always have to get the CSN
corresponding to XMIN or XMAX from the CSN SLRU. In the past discussion,
there was a suggestion that the CSN should be stored in the tuple header
or somewhere (like a hint bit) to avoid the overhead of very frequent
lookups of the CSN SLRU. I'm not sure about the conclusion of that
discussion, but this patch doesn't seem to adopt the idea. So did you
confirm that the performance overhead of looking up the CSN SLRU is
negligible?

Of course, I know that idea has a big issue, i.e., there is not enough space
to store the CSN in a tuple header if the CSN is 64 bits. If the CSN is
32 bits, we may be able to replace XMIN or XMAX with the CSN corresponding
to them. But that means we would have to struggle with one more wraparound
issue (CSN wraparound). So it's not easy to adopt that idea...

Sorry if this was already discussed and concluded...

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: POC and rebased patch for CSN based snapshots

2020-07-13 Thread movead...@highgo.ca

>I'm not sure that I fully understood your question, but still:
>What real problems do you see here? Transaction t1 doesn't get the state of
>shard2 until the time at the node with shard2 reaches the start time of t1.
>If the transaction that inserted B wants to know its position in time
>relative to t1, it will generate a CSN, attach to node1 and will see
>that t1 has not started yet.
 
>Maybe you are referring to the case where someone with a faster data
>channel can use the knowledge from node1 to change the state at node2?
>If so, I think it is not a problem; otherwise, please explain your idea.
Sorry, I think this was my misunderstanding of Clock-SI. At first I expected
we could get an absolute snapshot; for example, B should not be included in
the snapshot because it happened after time t1. However, Clock-SI cannot
guarantee that, and no design can guarantee that at all.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
 


Re: POC and rebased patch for CSN based snapshots

2020-07-12 Thread Andrey V. Lepikhov

On 7/4/20 7:56 PM, movead...@highgo.ca wrote:



As far as I know about Clock-SI, the left part of the blue line will be
set up as a snapshot if the master requires a snapshot at time t1. But in
fact data A should be in the snapshot but is not, and data B should be out
of the snapshot but is not.

Could this scenario appear in your original patch? Or is something in my
understanding of Clock-SI wrong?





Sorry for the late answer.

I'm not sure that I fully understood your question, but still:
What real problems do you see here? Transaction t1 doesn't get the state of 
shard2 until the time at the node with shard2 reaches the start time of t1.
If the transaction that inserted B wants to know its position in time 
relative to t1, it will generate a CSN, attach to node1 and will see 
that t1 has not started yet.


Maybe you are referring to the case where someone with a faster data 
channel can use the knowledge from node1 to change the state at node2?

If so, I think it is not a problem; otherwise, please explain your idea.
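
A small sketch of that rule (GetLocalCSN() is a stand-in for reading the
node's clock-based CSN, not a function from the patch):

/*
 * Clock-SI read rule: a snapshot taken at time t1 on node1 must not see
 * shard2's state until shard2's local clock has reached t1, so the
 * importing backend simply waits out the clock skew.
 */
static void
WaitForLocalClock(XidCSN snapshot_csn)
{
	while (GetLocalCSN() < snapshot_csn)
		pg_usleep(1000L);		/* 1 ms at a time, until we reach t1 */
}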

--
regards,
Andrey Lepikhov
Postgres Professional




Re: POC and rebased patch for CSN based snapshots

2020-07-04 Thread movead...@highgo.ca
Hello Andrey,

>> I have studied your patch, which is great. In the patch, only data
>> outside 'global_snapshot_defer_time' can be vacuumed, and it keeps dead
>> tuples even if no snapshot is imported at all, right?
>> 
>> I am thinking about a way to start retaining dead tuples just before
>> we import a csn snapshot.
>> 
>> Based on the Clock-SI paper, we should get the local CSN and then send it
>> to the shard nodes. Because we do not know whether a shard node's csn is
>> bigger or smaller than the master node's, we have to keep some dead
>> tuples all the time to support snapshot import at any time.
>> 
>> Then, what if we make a small change to the Clock-SI model: we do not use
>> the local csn when the transaction starts; instead we contact every shard
>> node to request their csn, the shard nodes start keeping dead tuples, and
>> the master node chooses the biggest csn to send to the shard nodes.
>> 
>> In this new way, we do not need to keep dead tuples all the time and do
>> not need to manage a ring buffer; we can hand the ball to the 'snapshot
>> too old' feature. But as a trade-off, almost all shard nodes need to wait.
>> I will send a more detailed explanation in a few days.
>I think, in the case of a distributed system with many servers, it can be a 
>bottleneck.
>The main idea of "deferred time" is to reduce interference between DML 
>queries under an intensive OLTP workload. This time can be reduced 
>if the bloating of the database prevails over the frequency of 
>transaction aborts.
OK, there may be a performance issue, and I have another question about Clock-SI.

For example, we have three nodes: shard1 (as master), shard2, shard3, where
(time of node1) > (time of node2) > (time of node3), and you can see a picture:
http://movead.gitee.io/picture/blog_img_bad/csn/clock_si_question.png 
As far as I know about Clock-SI, the left part of the blue line will be set
up as a snapshot if the master requires a snapshot at time t1. But in fact
data A should be in the snapshot but is not, and data B should be out of the
snapshot but is not.
Could this scenario appear in your original patch? Or is something in my
understanding of Clock-SI wrong?



Re: POC and rebased patch for CSN based snapshots

2020-07-02 Thread Andrey V. Lepikhov

On 7/2/20 7:31 PM, Movead Li wrote:

Thanks for the remarks,

 >Some remarks on your patch:
 >1. The variable last_max_csn can be an atomic variable.
Yes, I will consider it.

 >2. GenerateCSN() routine: in the case when csn < csnState->last_max_csn.
 >This is the case when someone has changed the value of the system clock. I
 >think it is needed to write a WARNING to the log file. (Maybe we can do
 >synchronization with a time server.)
Yes, good point. I will work out a way to report the warning; there should
be a reporting gap rather than a report every time a CSN is generated.
Do we really need correct time? What is the downside if one node generates
its CSNs by monotonic increase only?
Changes in clock values can lead to bad effects, such as stale snapshots. 
Adjusting the time can be a kind of defense.
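
Something like this sketch, maybe (hypothetical code; CSNState and the
one-minute interval are only for illustration):

static XidCSN
GenerateCSNWithWarning(CSNState *csnState)
{
	XidCSN		csn = (XidCSN) GetCurrentTimestamp();

	if (csn <= csnState->last_max_csn)
	{
		static TimestampTz last_warn = 0;
		TimestampTz	now = GetCurrentTimestamp();

		/* warn at most once per minute, not on every generated CSN */
		if (TimestampDifferenceExceeds(last_warn, now, 60000))
		{
			elog(WARNING, "system clock moved backwards, forcing CSN monotonicity");
			last_warn = now;
		}
		csn = csnState->last_max_csn + 1;	/* stay monotonic anyway */
	}
	csnState->last_max_csn = csn;
	return csn;
}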


 >3. What about the global snapshot xmin? In the pgpro version of the patch we 
 >had the GlobalSnapshotMapXmin() routine to maintain a circular buffer of 
 >oldestXmins for several seconds in the past. This buffer allows shifting 
 >oldestXmin into the past when a backend is importing a global transaction. 
 >Otherwise, old versions of tuples that were needed for this transaction 
 >can be recycled by other processes (vacuum, HOT, etc).
 >How do you implement protection from local pruning? I saw 
 >SNAP_DESYNC_COMPLAIN, but it is not used anywhere.
I have studied your patch, which is great. In the patch, only data outside
'global_snapshot_defer_time' can be vacuumed, and it keeps dead tuples even
if no snapshot is imported at all, right?

I am thinking about a way to start retaining dead tuples just before
we import a csn snapshot.

Based on the Clock-SI paper, we should get the local CSN and then send it to
the shard nodes. Because we do not know whether a shard node's csn is bigger
or smaller than the master node's, we have to keep some dead tuples all the
time to support snapshot import at any time.

Then, what if we make a small change to the Clock-SI model: we do not use the
local csn when the transaction starts; instead we contact every shard node to
request their csn, the shard nodes start keeping dead tuples, and the master
node chooses the biggest csn to send to the shard nodes.

In this new way, we do not need to keep dead tuples all the time and do not
need to manage a ring buffer; we can hand the ball to the 'snapshot too old'
feature. But as a trade-off, almost all shard nodes need to wait.
I will send a more detailed explanation in a few days.
I think, in the case of a distributed system with many servers, it can be a 
bottleneck.
The main idea of "deferred time" is to reduce interference between DML 
queries under an intensive OLTP workload. This time can be reduced 
if the bloating of the database prevails over the frequency of 
transaction aborts.



 >4. The current version of the patch does not apply cleanly to the current
 >master.
Maybe it's because the release of PG13 caused some conflicts; I will
rebase it.

Ok


---
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca





--
regards,
Andrey Lepikhov
Postgres Professional




Re: POC and rebased patch for CSN based snapshots

2020-07-02 Thread Movead Li
Thanks for the remarks,

>Some remarks on your patch: 
>1. The variable last_max_csn can be an atomic variable. 

Yes, I will consider it.

>2. GenerateCSN() routine: in the case when csn < csnState->last_max_csn. 
>This is the case when someone has changed the value of the system clock. I 
>think it is needed to write a WARNING to the log file. (Maybe we can do 
>synchronization with a time server.) 

Yes, good point. I will work out a way to report the warning; there should be
a reporting gap rather than a report every time a CSN is generated.

Do we really need correct time? What is the downside if one node generates
its CSNs by monotonic increase only?

>3. What about the global snapshot xmin? In the pgpro version of the patch we 
>had the GlobalSnapshotMapXmin() routine to maintain a circular buffer of 
>oldestXmins for several seconds in the past. This buffer allows shifting 
>oldestXmin into the past when a backend is importing a global transaction. 
>Otherwise, old versions of tuples that were needed for this transaction 
>can be recycled by other processes (vacuum, HOT, etc). 
>How do you implement protection from local pruning? I saw 
>SNAP_DESYNC_COMPLAIN, but it is not used anywhere.

I have studied your patch, which is great. In the patch, only data outside
'global_snapshot_defer_time' can be vacuumed, and it keeps dead tuples even
if no snapshot is imported at all, right?

I am thinking about a way to start retaining dead tuples just before
we import a csn snapshot.

Based on the Clock-SI paper, we should get the local CSN and then send it to
the shard nodes. Because we do not know whether a shard node's csn is bigger
or smaller than the master node's, we have to keep some dead tuples all the
time to support snapshot import at any time.

Then, what if we make a small change to the Clock-SI model: we do not use the
local csn when the transaction starts; instead we contact every shard node to
request their csn, the shard nodes start keeping dead tuples, and the master
node chooses the biggest csn to send to the shard nodes.

In this new way, we do not need to keep dead tuples all the time and do not
need to manage a ring buffer; we can hand the ball to the 'snapshot too old'
feature. But as a trade-off, almost all shard nodes need to wait.
I will send a more detailed explanation in a few days.
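
A sketch of the flow (ShardNode, ShardRequestCSN and ShardSetSnapshotCSN
are imaginary helpers, just to show the idea):

/*
 * The master polls every shard for its current CSN (from that moment the
 * shard starts retaining dead tuples), then distributes the maximum as
 * the snapshot CSN, so no shard's clock is ahead of the snapshot.
 */
static XidCSN
BeginGlobalSnapshot(ShardNode **nodes, int nnodes)
{
	XidCSN		max_csn = 0;
	int			i;

	for (i = 0; i < nnodes; i++)
	{
		XidCSN		csn = ShardRequestCSN(nodes[i]);

		if (csn > max_csn)
			max_csn = csn;
	}

	for (i = 0; i < nnodes; i++)
		ShardSetSnapshotCSN(nodes[i], max_csn); /* shards behind must wait */

	return max_csn;
}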





>4. The current version of the patch does not apply cleanly to the current 
>master. 

Maybe it's because the release of PG13 caused some conflicts; I will
rebase it.



---
Regards,
Highgo Software (Canada/China/Pakistan) 
URL : http://www.highgo.ca/ 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Re: POC and rebased patch for CSN based snapshots

2020-06-29 Thread Andrey V. Lepikhov

On 6/12/20 2:41 PM, movead...@highgo.ca wrote:

Hello hackers,

Currently, I have made some changes based on the last version:
1. Catch up to the current commit (c2bd1fec32ab54).
2. Add regression tests and documentation.
3. Add support to switch from an xid-based snapshot to a csn-based snapshot,
and the same on the standby side.


Some remarks on your patch:
1. The variable last_max_csn can be an atomic variable.
2. GenerateCSN() routine: in the case when csn < csnState->last_max_csn. 
This is the case when someone has changed the value of the system clock. I 
think it is needed to write a WARNING to the log file. (Maybe we can do 
synchronization with a time server.)
3. What about the global snapshot xmin? In the pgpro version of the patch we 
had the GlobalSnapshotMapXmin() routine to maintain a circular buffer of 
oldestXmins for several seconds in the past. This buffer allows shifting 
oldestXmin into the past when a backend is importing a global transaction; 
otherwise, old versions of tuples that were needed for this transaction 
can be recycled by other processes (vacuum, HOT, etc). See the sketch 
after this list.
How do you implement protection from local pruning? I saw 
SNAP_DESYNC_COMPLAIN, but it is not used anywhere.
4. The current version of the patch does not apply cleanly to the current 
master.
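
To illustrate remark 3, a sketch of the circular-buffer idea (the layout
and the one-second granularity are simplified, not the pgpro code):

#define XMIN_MAP_SIZE	64		/* seconds of oldestXmin history kept */

typedef struct GlobalXminMap
{
	TransactionId	xmins[XMIN_MAP_SIZE];
	int				head;		/* slot of the current second */
} GlobalXminMap;

/* advance once per second with a freshly computed oldestXmin */
static void
XminMapAdvance(GlobalXminMap *map, TransactionId oldestXmin)
{
	map->head = (map->head + 1) % XMIN_MAP_SIZE;
	map->xmins[map->head] = oldestXmin;
}

/* oldestXmin as it was seconds_ago seconds in the past */
static TransactionId
XminMapLookup(GlobalXminMap *map, int seconds_ago)
{
	int		slot;

	Assert(seconds_ago >= 0 && seconds_ago < XMIN_MAP_SIZE);
	slot = (map->head - seconds_ago + XMIN_MAP_SIZE) % XMIN_MAP_SIZE;
	return map->xmins[slot];
}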


--
regards,
Andrey Lepikhov
Postgres Professional




Re: POC and rebased patch for CSN based snapshots

2020-06-18 Thread Fujii Masao




On 2020/06/19 13:36, movead...@highgo.ca wrote:


>You mean that the last generated CSN needs to be WAL-logged because any
>smaller CSN than the last one should not be reused after crash recovery.
>Right?

Yes, that's it.


If right, that WAL-logging seems unnecessary because the CSN mechanism
assumes that CSN increases monotonically. IOW, even without that WAL-logging,
the CSN after crash recovery must be larger than the one before. No?


CSN is collected based on the system time in this patch, but time is not
reliable all the time. And it is designed for Global CSN (for sharding),
where it may rely on a CSN from another node, generated on another machine.

So monotonicity is not reliable, and it needs to keep its largest CSN in the
WAL in case of a crash.


Thanks for the explanation! Understood.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: POC and rebased patch for CSN based snapshots

2020-06-18 Thread movead...@highgo.ca

>You mean that the last generated CSN needs to be WAL-logged because any smaller
>CSN than the last one should not be reused after crash recovery. Right?
Yes, that's it. 

>If right, that WAL-logging seems unnecessary because the CSN mechanism assumes
>that CSN increases monotonically. IOW, even without that WAL-logging, the CSN
>after crash recovery must be larger than the one before. No?

CSN is collected based on the system time in this patch, but time is not
reliable all the time. And it is designed for Global CSN (for sharding),
where it may rely on a CSN from another node, generated on another machine.

So monotonicity is not reliable, and it needs to keep its largest CSN in the
WAL in case of a crash.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: POC and rebased patch for CSN based snapshots

2020-06-18 Thread Fujii Masao




On 2020/06/19 12:12, movead...@highgo.ca wrote:


Thanks for the reply.

>Probably it's not time to do the code review yet, but when I glanced at the
>patch, I came up with one question.
>0002 patch changes GenerateCSN() so that it generates CSN-related WAL records
>(and inserts them into WAL buffers). Which means that a new WAL record is
>generated whenever a CSN is assigned, e.g., in GetSnapshotData(). Is this
>WAL generation really necessary for CSN?

This is designed for crash recovery: here we record our newest CSN in the WAL
so that we will not use a historical CSN after a restart. It will not write
into the WAL every time, but at an interval, as you can see in the
CSNAddByNanosec() function.


You mean that the last generated CSN needs to be WAL-logged because any smaller
CSN than the last one should not be reused after crash recovery. Right?

If right, that WAL-logging seems unnecessary because the CSN mechanism assumes
that CSN increases monotonically. IOW, even without that WAL-logging, the CSN
after crash recovery must be larger than the one before. No?



BTW, GenerateCSN() is called while holding ProcArrayLock. Also, it inserts a
new WAL record in WriteXidCsnXlogRec() while holding a spinlock. Firstly, this
is not acceptable because spinlocks are intended for *very* short-term locks.
Secondly, I don't think that WAL generation while holding ProcArrayLock is
good design, because ProcArrayLock is likely to be a bottleneck and should be
held only briefly for performance.

Thanks for pointing that out; it may help me a lot. I will reconsider it.


Thanks for working on this!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: POC and rebased patch for CSN based snapshots

2020-06-18 Thread movead...@highgo.ca

Thanks for the reply.

>AFAIU, this patch is meant to improve scalability and also to be helpful
>for the Global Snapshots stuff, is that right?  If so, how much
>performance/scalability benefit will this patch have after Andres's
>recent work on scalability [1]?
The patch is focused on being infrastructure for the sharding feature;
according to my tests, it has almost the same performance with and without
the patch.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: POC and rebased patch for CSN based snapshots

2020-06-18 Thread movead...@highgo.ca

Thanks for the reply.

>Probably it's not time to do the code review yet, but when I glanced at the patch,
>I came up with one question.
>0002 patch changes GenerateCSN() so that it generates CSN-related WAL records
>(and inserts them into WAL buffers). Which means that a new WAL record is
>generated whenever a CSN is assigned, e.g., in GetSnapshotData(). Is this WAL
>generation really necessary for CSN?
This is designed for crash recovery: here we record our newest CSN in the WAL
so that we will not use a historical CSN after a restart. It will not write
into the WAL every time, but at an interval, as you can see in the
CSNAddByNanosec() function.
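
Roughly like this simplified sketch (CSN_WAL_GAP and RM_CSNLOG_ID are
illustrative here; in the patch the gap comes from CSNAddByNanosec()):

#define CSN_WAL_GAP		INT64CONST(1000000000)	/* illustrative window */

static void
MaybeLogCSN(XidCSN csn)
{
	XidCSN		log_csn;

	if (csn <= get_last_log_wal_csn())
		return;					/* a future-enough CSN is already logged */

	/* reserve a window ahead so the next many calls take the early exit */
	log_csn = csn + CSN_WAL_GAP;
	set_last_log_wal_csn(log_csn);

	XLogBeginInsert();
	XLogRegisterData((char *) &log_csn, sizeof(XidCSN));
	(void) XLogInsert(RM_CSNLOG_ID, XLOG_CSN_ASSIGNMENT);
}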

>BTW, GenerateCSN() is called while holding ProcArrayLock. Also, it inserts a
>new WAL record in WriteXidCsnXlogRec() while holding a spinlock. Firstly, this
>is not acceptable because spinlocks are intended for *very* short-term locks.
>Secondly, I don't think that WAL generation while holding ProcArrayLock is
>good design, because ProcArrayLock is likely to be a bottleneck and should be
>held only briefly for performance.
Thanks for pointing that out; it may help me a lot. I will reconsider it.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: POC and rebased patch for CSN based snapshots

2020-06-18 Thread Fujii Masao




On 2020/06/15 16:48, Fujii Masao wrote:



On 2020/06/12 18:41, movead...@highgo.ca wrote:

Hello hackers,

Currently, I have made some changes based on the last version:
1. Catch up to the current commit (c2bd1fec32ab54).
2. Add regression tests and documentation.
3. Add support to switch from an xid-based snapshot to a csn-based snapshot,
and the same on the standby side.


Probably it's not time to do the code review yet, but when I glanced at the patch,
I came up with one question.

0002 patch changes GenerateCSN() so that it generates CSN-related WAL records
(and inserts them into WAL buffers). Which means that a new WAL record is
generated whenever a CSN is assigned, e.g., in GetSnapshotData(). Is this WAL
generation really necessary for CSN?

BTW, GenerateCSN() is called while holding ProcArrayLock. Also, it inserts a
new WAL record in WriteXidCsnXlogRec() while holding a spinlock. Firstly, this
is not acceptable because spinlocks are intended for *very* short-term locks.
Secondly, I don't think that WAL generation while holding ProcArrayLock is
good design, because ProcArrayLock is likely to be a bottleneck and should be
held only briefly for performance.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: POC and rebased patch for CSN based snapshots

2020-06-15 Thread Fujii Masao




On 2020/06/12 18:41, movead...@highgo.ca wrote:

Hello hackers,

Currently, I have made some changes based on the last version:
1. Catch up to the current commit (c2bd1fec32ab54).
2. Add regression tests and documentation.
3. Add support to switch from an xid-based snapshot to a csn-based snapshot,
and the same on the standby side.


Andrey also seems to be proposing a similar patch [1] that introduces CSN
into core. Could you tell me the difference between his patch and yours?
If they are almost the same, shouldn't we focus on one together rather than
working separately?

Regards,

[1]
https://www.postgresql.org/message-id/9964cf46-9294-34b9-4858-971e9029f...@postgrespro.ru

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: POC and rebased patch for CSN based snapshots

2020-06-13 Thread Amit Kapila
On Fri, Jun 12, 2020 at 3:11 PM movead...@highgo.ca  wrote:
>
> Hello hackers,
>
> Currently, I have made some changes based on the last version:
> 1. Catch up to the current commit (c2bd1fec32ab54).
> 2. Add regression tests and documentation.
> 3. Add support to switch from an xid-based snapshot to a csn-based snapshot,
> and the same on the standby side.
>

AFAIU, this patch is meant to improve scalability and also to be helpful
for the Global Snapshots stuff, is that right?  If so, how much
performance/scalability benefit will this patch have after Andres's
recent work on scalability [1]?

[1] - 
https://www.postgresql.org/message-id/20200301083601.ews6hz5dduc3w2se%40alap3.anarazel.de

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: POC and rebased patch for CSN based snapshots

2020-06-12 Thread movead...@highgo.ca
Hello hackers,

Currently, I have made some changes based on the last version:
1. Catch up to the current commit (c2bd1fec32ab54).
2. Add regression tests and documentation.
3. Add support to switch from an xid-based snapshot to a csn-based snapshot,
and the same on the standby side.



Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca





0001-CSN-base-snapshot.patch
Description: Binary data


0002-Wal-for-csn.patch
Description: Binary data


0003-snapshot-switch.patch
Description: Binary data


POC and rebased patch for CSN based snapshots

2020-05-08 Thread Movead Li
Hello hackers,


I have read the community mail from 'postgrespro' at the link below ①.
To summarize the patch: it generates a CSN from a timestamp when a
transaction commits, assigns a special CSN value to aborted transactions,
and records them in a CSN SLRU file. Now we can judge whether an xid is
visible in a snapshot by a CSN value instead of by the xmin, xmax and xip
array, so we can hold a CSN as a snapshot, which can be exported and
imported.
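
In other words (a sketch with made-up names, just to show the shape of
the idea; aborted xids get a special CSN that never compares as visible):

/*
 * A CSN snapshot is essentially one number, so exporting and importing
 * it is cheap compared with shipping xmin, xmax and the whole xip array.
 */
typedef struct CSNSnapshotData
{
	XidCSN		snapshot_csn;	/* xids committed before this are visible */
} CSNSnapshotData;

static bool
XidVisibleInSnapshot(TransactionId xid, CSNSnapshotData *snap)
{
	/* one CSN SLRU lookup and one comparison replace the xip array scan */
	return CSNLogGetCSNByXid(xid) < snap->snapshot_csn;
}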





CSN may be the right direction and an important part of implementing
distributed PostgreSQL, because it transfers little data across nodes
for a snapshot, so the patch is meant as research in that direction.

We want to implement Clock-SI based on the patch. However, the patch
is too old, so I rebased the infrastructure part of the patch onto a recent
commit (7dc37ccea85).



The original patch does not keep CSNs alive across a database restart,
because it cleans the csnlog every time the database restarts. This works
well until a prepared transaction occurs, because the CSN of the prepared
transaction is cleaned by a database restart. So I added WAL support for
the csnlog, so that CSNs stay alive all the time, and moved the csnlog
cleanup work to autovacuum.

This leads to another issue: it can't switch from an xid-based snapshot
to a csn-based snapshot if a prepared transaction exists, because it cannot
find a CSN for a prepared transaction produced under the xid-based snapshot.
To solve it, if the database restarts with the snapshot switched to
csn-based, I record an 'xmin_for_csn' from which to start checking with the
CSN snapshot.



Some known issues with the current patch:

1. The CSN snapshot supports the repeatable read isolation level only; we
should try to support other isolation levels.

2. We cannot switch smoothly from xid-based to csn-based if there are
prepared transactions in the database.

What do you think about it? I want to try to test and improve the patch
step by step.



①https://www.postgresql.org/message-id/21BC916B-80A1-43BF-8650-3363CCDAE09C%40postgrespro.ru
 



---
Regards,
Highgo Software (Canada/China/Pakistan) 
URL : http://www.highgo.ca/ 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

0001-CSN-base-snapshot.patch
Description: Binary data


0002-Wal-for-csn.patch
Description: Binary data