Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-30 Thread Ivan Kartyshov

Ants Aasma писал 2017-10-26 17:29:

On Mon, Oct 23, 2017 at 12:29 PM, Ivan Kartyshov
<i.kartys...@postgrespro.ru> wrote:

Ants Aasma писал 2017-09-26 13:00:


Exposing this interface as WAITLSN will encode that visibility order
matches LSN order. This removes any chance of fixing for example
visibility order of async/vs/sync transactions. Just renaming it so
the token is an opaque commit visibility token that just happens to 
be

a LSN would still allow for progress in transaction management. For
example, making PostgreSQL distributed will likely want timestamp
and/or vector clock based visibility rules.



I'm sorry I did not understand exactly what you meant.
Please explain this in more detail.


Currently transactions on the master become visible when xid is
removed from proc array. This depends on the order of acquiring
ProcArrayLock, which can happen in a different order from inserting
the commit record to wal. Whereas on the standby the transactions will
become visible in the same order that commit records appear in wal.
The difference can be quite large when transactions are using
different values for synchronous_commit. Fixing this is not easy, but
we may want to do it someday. IIRC CSN threads contained more
discussion on this topic. If we do fix it, it seems likely that what
we need to wait on is not LSN, but some other kind of value. If the UI
were something like "WAITVISIBILITY token", then we can later change
the token to something other than LSN.

Regards,
Ants Aasma


It sounds reasonable. I can offer the following version.

WAIT  LSN  lsn_number;
WAIT  LSN  lsn_number  TIMEOUT delay;
WAIT  LSN  lsn_number  INFINITE;
WAIT  LSN  lsn_number  NOWAIT;


WAIT  [token]  wait_value [option];

token - [LSN, TIME | TIMESTAMP | CSN | XID]
option - [TIMEOUT | INFINITE | NOWAIT]

Ready to listen to your suggestions.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-24 Thread Ivan Kartyshov

Hello. I made some bugfixes and rewrite the patch.

Simon Riggs писал 2017-09-05 14:44:

As Alexander says, simply skipping truncation if standby is busy isn't
a great plan.

If we defer an action on standby replay, when and who will we apply
it? What happens if the standby is shutdown or crashes while an action
is pending.


After crash standby server will go to the nearest checkout and will 
replay
all his wal`s to reach consistent state and then open for read-only 
load.

(all collected wal`s will be replayed in right way, I meant wal`s that
truncates table and wal`s that make table grow)

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 9a5fde0..68decab 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -31,6 +31,9 @@
 #include "storage/smgr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
+#include "storage/lmgr.h"
+#include "storage/procarray.h"
+#include "access/transam.h"
 
 /*
  * We keep a list of all relations (represented as RelFileNode values)
@@ -498,50 +501,56 @@ smgr_redo(XLogReaderState *record)
 
 		reln = smgropen(xlrec->rnode, InvalidBackendId);
 
-		/*
-		 * Forcibly create relation if it doesn't exist (which suggests that
-		 * it was dropped somewhere later in the WAL sequence).  As in
-		 * XLogReadBufferForRedo, we prefer to recreate the rel and replay the
-		 * log as best we can until the drop is seen.
-		 */
-		smgrcreate(reln, MAIN_FORKNUM, true);
-
-		/*
-		 * Before we perform the truncation, update minimum recovery point to
-		 * cover this WAL record. Once the relation is truncated, there's no
-		 * going back. The buffer manager enforces the WAL-first rule for
-		 * normal updates to relation files, so that the minimum recovery
-		 * point is always updated before the corresponding change in the data
-		 * file is flushed to disk. We have to do the same manually here.
-		 *
-		 * Doing this before the truncation means that if the truncation fails
-		 * for some reason, you cannot start up the system even after restart,
-		 * until you fix the underlying situation so that the truncation will
-		 * succeed. Alternatively, we could update the minimum recovery point
-		 * after truncation, but that would leave a small window where the
-		 * WAL-first rule could be violated.
-		 */
-		XLogFlush(lsn);
-
-		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
+		if (!ConditionalLockRelationOid(reln->smgr_rnode.node.relNode, AccessExclusiveLock))
 		{
-			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
+			/*
+			 * Forcibly create relation if it doesn't exist (which suggests that
+			 * it was dropped somewhere later in the WAL sequence).  As in
+			 * XLogReadBufferForRedo, we prefer to recreate the rel and replay the
+			 * log as best we can until the drop is seen.
+			 */
+			smgrcreate(reln, MAIN_FORKNUM, true);
+
+			/*
+			 * Before we perform the truncation, update minimum recovery point to
+			 * cover this WAL record. Once the relation is truncated, there's no
+			 * going back. The buffer manager enforces the WAL-first rule for
+			 * normal updates to relation files, so that the minimum recovery
+			 * point is always updated before the corresponding change in the data
+			 * file is flushed to disk. We have to do the same manually here.
+			 *
+			 * Doing this before the truncation means that if the truncation fails
+			 * for some reason, you cannot start up the system even after restart,
+			 * until you fix the underlying situation so that the truncation will
+			 * succeed. Alternatively, we could update the minimum recovery point
+			 * after truncation, but that would leave a small window where the
+			 * WAL-first rule could be violated.
+			 */
+			XLogFlush(lsn);
 
-			/* Also tell xlogutils.c about it */
-			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
-		}
+			if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
+			{
+smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
+
+/* Also tell xlogutils.c about it */
+XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			}
 
-		/* Truncate FSM and VM too */
-		rel = CreateFakeRelcacheEntry(xlrec->rnode);
+			/* Truncate FSM and VM too */
+			rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
-		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
-			smgrexists(reln, FSM_FORKNUM))
-			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
-		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
-			smgrexists(reln, VISIBILITYMAP_FORKNUM))
-			visibilitymap_truncate(rel, xlrec->blkno);
+			if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
+smgrexists(reln, FSM_FORKNUM))
+FreeSpaceMapTruncateRel(rel, xlrec->blkno);
+			if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&

Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-23 Thread Ivan Kartyshov

Alexander Korotkov писал 2017-10-23 13:19:

Despite code cleanup, you still have some random empty lines removals
in your patch.


I reconfigured my IDE to avoid this in the future.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 000..6f389ca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,144 @@
+
+
+
+ 
+  WAITLSN
+ 
+
+ 
+  WAITLSN
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAITLSN
+  wait for the target LSN to be replayed
+ 
+
+ 
+
+WAITLSN 'LSN' [ INFINITELY ]
+WAITLSN 'LSN' TIMEOUT wait_time
+WAITLSN 'LSN' NOWAIT
+
+ 
+
+ 
+  Description
+
+  
+   interprocess communication mechanism to wait for the target log sequence
+   number (LSN) on standby in 
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAITLSN command
+   waits for the specified LSN to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server. You can also limit the wait
+   time using the TIMEOUT option, or check the target LSN
+   status immediately using the NOWAIT option.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+	  Specify the target log sequence number to wait for.
+ 
+
+   
+   
+	INFINITELY
+
+ 
+	  Wait until the target LSN is replayed on standby.
+	  This is an optional parameter reinforcing the default behavior.
+ 
+
+   
+  
+ 
+
+   
+	TIMEOUT wait_time
+	
+	 
+	  Limit the time to wait for the LSN to be replayed.
+	  The specified wait_time must be an integer
+	  and is measured in milliseconds.
+	 
+	
+   
+
+   
+	NOWAIT
+	
+	 
+	  Report whether the target LSN has been replayed already,
+	  without any waiting.
+	 
+	
+   
+  
+
+ 
+  Examples
+
+  
+   Run WAITLSN from psql,
+   limiting wait time to 1 milliseconds:
+
+
+WAITLSN '0/3F07A6B1' TIMEOUT 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+WAITLSN '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds, and then cancel the command:
+
+WAITLSN '0/3F0FF791' TIMEOUT 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAITLSN statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..117cc9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -7312,6 +7312,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * After update lastReplayedEndRecPtr set Latches in SHMEM array
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+{
+
+	WaitLSNSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..0d10117 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o vacuumlazy.o variable.o view.o
+	vacuum.o vacuumlazy.o variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
new file mode 100644
index 000..db2f549
--- /dev/null
+++ b/src/backend/commands/waitlsn.c
@@ -0,0 +1,273 @@
+/*-
+ *
+ * waitlsn.c
+ *	  WaitLSN statment: 

Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-23 Thread Ivan Kartyshov

New little cleanup code changes

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit 217f842726531edb1b0056a5c5727ab01bab7f9b
Author: i.kartyshov <i.kartys...@postgrespro.com>
Date:   Mon Oct 23 12:08:59 2017 +0300

Cherry picked and ported 11dev

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 000..6f389ca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,144 @@
+
+
+
+ 
+  WAITLSN
+ 
+
+ 
+  WAITLSN
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAITLSN
+  wait for the target LSN to be replayed
+ 
+
+ 
+
+WAITLSN 'LSN' [ INFINITELY ]
+WAITLSN 'LSN' TIMEOUT wait_time
+WAITLSN 'LSN' NOWAIT
+
+ 
+
+ 
+  Description
+
+  
+   interprocess communication mechanism to wait for the target log sequence
+   number (LSN) on standby in 
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAITLSN command
+   waits for the specified LSN to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server. You can also limit the wait
+   time using the TIMEOUT option, or check the target LSN
+   status immediately using the NOWAIT option.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+	  Specify the target log sequence number to wait for.
+ 
+
+   
+   
+	INFINITELY
+
+ 
+	  Wait until the target LSN is replayed on standby.
+	  This is an optional parameter reinforcing the default behavior.
+ 
+
+   
+  
+ 
+
+   
+	TIMEOUT wait_time
+	
+	 
+	  Limit the time to wait for the LSN to be replayed.
+	  The specified wait_time must be an integer
+	  and is measured in milliseconds.
+	 
+	
+   
+
+   
+	NOWAIT
+	
+	 
+	  Report whether the target LSN has been replayed already,
+	  without any waiting.
+	 
+	
+   
+  
+
+ 
+  Examples
+
+  
+   Run WAITLSN from psql,
+   limiting wait time to 1 milliseconds:
+
+
+WAITLSN '0/3F07A6B1' TIMEOUT 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+WAITLSN '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds, and then cancel the command:
+
+WAITLSN '0/3F0FF791' TIMEOUT 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAITLSN statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..117cc9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -149,7 +150,6 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
-
 /*
  * Although only "on", "off", and "always" are documented,
  * we accept all the likely variants of "on" and "off".
@@ -7312,6 +7312,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * After update lastReplayedEndRecPtr set Latches in SHMEM array
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+{
+
+	WaitLSNSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..0d10117 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o vacuumlazy.o variable.o view.o
+	vacuum.o vacuumlazy.o variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --g

Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-23 Thread Ivan Kartyshov

Alexander Korotkov писал 2017-09-26 12:07:

I propose following syntax options.

WAITLSN lsn;
WAITLSN lsn TIMEOUT delay;
WAITLSN lsn INFINITE;
WAITLSN lsn NOWAIT;

For me that looks rather better.  What do you think?


I agree with you, now syntax looks better.
New patch attached to tha mail.

Ants Aasma писал 2017-09-26 13:00:

Exposing this interface as WAITLSN will encode that visibility order
matches LSN order. This removes any chance of fixing for example
visibility order of async/vs/sync transactions. Just renaming it so
the token is an opaque commit visibility token that just happens to be
a LSN would still allow for progress in transaction management. For
example, making PostgreSQL distributed will likely want timestamp
and/or vector clock based visibility rules.


I'm sorry I did not understand exactly what you meant.
Please explain this in more detail.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit 217f842726531edb1b0056a5c5727ab01bab7f9b
Author: i.kartyshov <i.kartys...@postgrespro.com>
Date:   Mon Oct 23 12:08:59 2017 +0300

Cherry picked and ported 11dev

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 000..6f389ca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,144 @@
+
+
+
+ 
+  WAITLSN
+ 
+
+ 
+  WAITLSN
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAITLSN
+  wait for the target LSN to be replayed
+ 
+
+ 
+
+WAITLSN 'LSN' [ INFINITELY ]
+WAITLSN 'LSN' TIMEOUT wait_time
+WAITLSN 'LSN' NOWAIT
+
+ 
+
+ 
+  Description
+
+  
+   interprocess communication mechanism to wait for the target log sequence
+   number (LSN) on standby in 
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAITLSN command
+   waits for the specified LSN to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server. You can also limit the wait
+   time using the TIMEOUT option, or check the target LSN
+   status immediately using the NOWAIT option.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+	  Specify the target log sequence number to wait for.
+ 
+
+   
+   
+	INFINITELY
+
+ 
+	  Wait until the target LSN is replayed on standby.
+	  This is an optional parameter reinforcing the default behavior.
+ 
+
+   
+  
+ 
+
+   
+	TIMEOUT wait_time
+	
+	 
+	  Limit the time to wait for the LSN to be replayed.
+	  The specified wait_time must be an integer
+	  and is measured in milliseconds.
+	 
+	
+   
+
+   
+	NOWAIT
+	
+	 
+	  Report whether the target LSN has been replayed already,
+	  without any waiting.
+	 
+	
+   
+  
+
+ 
+  Examples
+
+  
+   Run WAITLSN from psql,
+   limiting wait time to 1 milliseconds:
+
+
+WAITLSN '0/3F07A6B1' TIMEOUT 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+WAITLSN '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds, and then cancel the command:
+
+WAITLSN '0/3F0FF791' TIMEOUT 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAITLSN statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..117cc9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -149,7 +150,6 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
-
 /*
  * Although only "on", "off", and "always" are documented,
  * we accept all the likely variants of "on" and "off".
@@ -7312,6 +7312,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * After update lastReplayedEndRecPtr set Latches in SHMEM array
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+	

Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-03-09 Thread Ivan Kartyshov

Hello

On 07.03.2017 15:58, Masahiko Sawada wrote:

I've read the discussion so far but I didn't see the reason why you've
changed it to as a contrib module. Could you tell me about that?


I did it on the initiative of our customer, who preferred the 
implementation in the form of contrib. Contrib realization of WAITLSN 
adds to core the only hook.


On 07.03.2017 15:58, Masahiko Sawada wrote:
> I guess this feature would be more useful if provided as a core
> feature and we need to discuss about syntax as Thomas mentioned.

Here I attached rebased patch waitlsn_10dev_v3 (core feature)
I will leave the choice of implementation (core/contrib) to the 
discretion of the community.


Will be glad to hear your suggestion about syntax, code and patch.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 2bc4d9f..6d5a81e 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -178,6 +178,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 000..338187b
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,134 @@
+
+
+
+ 
+  WAITLSN
+ 
+
+ 
+  WAITLSN
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAITLSN
+  wait until target LSN has been replayed
+ 
+
+ 
+
+WAITLSN 'LSN' [ , delay ]
+WAITLSN_INFINITE 'LSN'
+WAITLSN_NO_WAIT 'LSN'
+
+ 
+
+ 
+  Description
+
+  
+   The WAITLSN wait till target LSN will
+   be replayed with an optional delay (milliseconds by default
+   infinity) to be wait for LSN to replayed.
+  
+
+  
+   WAITLSN provides a simple
+   interprocess LSN wait mechanism for a backends on slave
+   in master-slave replication scheme on PostgreSQL database.
+  
+
+  
+   The WAITLSN_INFINITE wait till target LSN will
+   be replayed on slave .
+  
+
+  
+   The WAITLSN_NO_WAIT will tell if target LSN was replayed without any wait.
+  
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+  Target log sequence number to be wait for.
+ 
+
+   
+   
+delay
+
+ 
+  Time in miliseconds to waiting for LSN to be replayed.
+ 
+
+   
+  
+ 
+
+ 
+  Notes
+
+  
+   Delay time for waiting till LSN to be replayed must be integer. For
+   default it is infinity. Waiting can be interupped using Ctl+C, or
+   by Postmaster death.
+  
+ 
+
+ 
+  GUCs
+
+  
+ Add two GUCs which help tuning influence on StartupXLOG:
+ count_waitlsn (denominator to check not each LSN)
+ int	count_waitlsn= 10;
+  
+
+  
+ interval_waitlsn (Interval in milliseconds to additional LSN check)
+ int	interval_waitlsn = 100;
+  
+ 
+
+ 
+  Examples
+
+  
+   Configure and execute a waitlsn from
+   psql:
+
+
+WAITLSN '0/3F07A6B1', 1;
+NOTICE:  LSN is not reached. Try to make bigger delay.
+WAITLSN
+
+WAITLSN '0/3F07A611';
+WAITLSN
+
+WAITLSN '0/3F0FF791', 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to make bigger delay.
+ERROR:  canceling statement due to user request
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAITLSN statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index c8191de..a5100a2 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -206,6 +206,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2dcff7f..9c2def1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -145,6 +146,9 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
+/* GUC variable */
+intcount_waitlsn = 10;
+intinterval_waitlsn = 100;
 
 /*
  * Although only "on", "off", and "always" are documented,
@@ -6948,6 +6952,8 @@ StartupXLOG(void)
 		{
 			ErrorContextCallback errcallback;
 			TimestampTz xtime;
+			TimestampTz			time_waitlsn = GetCurrentTimestamp();
+			int	counter_waitlsn = 0;
 
 			InRedo = true;
 
@@ -7174,6 +7180,17 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * After update lastReplayedEndRecPtr set Latches in SHMEM array
+ */
+if (counter_waitlsn % count_waitlsn == 0
+	|| TimestampDifferenceExceeds(time_waitlsn,GetCurrentTimestamp(),interval_waitlsn))
+{
+	WaitLSNSetLatch();
+	time_waitlsn = GetCurrentTimestamp();
+}
+counter_waitlsn++;
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL)

Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-03-07 Thread Ivan Kartyshov

Rebase done.

Meanwhile I made some more changes.

Changes
===
1) WAITLSN is now implemented as an extension called "pg_waitlsn"

2) Call new hook "lsn_updated_hook" right after xact_redo_commit (xlog.c)

3) Corresponding functions:
pg_waitlsn('0/693FF800', 1) - wait 10 seconds
pg_waitlsn_infinite('0/693FF800') - for infinite wait
pg_waitlsn_no_wait('0/693FF800') - once check if LSN was replayed or not.

4) Add two GUCs which help tuning influence on StartupXLOG:
count_waitlsn (denominator to check not each LSN)
int count_waitlsn= 10;

interval_waitlsn (Interval in milliseconds to additional LSN check)
int interval_waitlsn = 100;

5) Optimize loop that set latches.

How to use it
==
Master:
1) Make "wal_level = replica"
Slave:
2) Add  shared_preload_libraries = 'pg_waitlsn'
hot_standby = on (in postgresql.conf)
3) Create extension pg_waitlsn;
4) And in hot_standby you can wait for LSN (pgsleep), when LSN will 
replayed on slave pg_waitlsn will release


select pg_waitlsn(‘LSN’ [, timeout in ms]);
select pg_waitlsn_infinite(‘LSN’);
select pg_waitlsn_no_wait(‘LSN’);

#Wait until LSN 0/303EC60 will be replayed, or 10 second passed.
select pg_waitlsn(‘0/303EC60’, 1);

#Or same without timeout.
select pg_waitlsn(‘0/303EC60’);
select pg_waitlsn_infinite('0/693FF800');

#To check if LSN is replayed can be used.
select pg_waitlsn_no_wait('0/693FF800');

Notice: select pg_waitlsn will release on PostmasterDeath or 
Interruption events if they come earlier then target LSN or timeout.


--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/contrib/pg_waitlsn/Makefile b/contrib/pg_waitlsn/Makefile
new file mode 100644
index 000..49a326c
--- /dev/null
+++ b/contrib/pg_waitlsn/Makefile
@@ -0,0 +1,21 @@
+# pg_waitlsn/Makefile
+
+MODULE_big = pg_waitlsn
+OBJS = pg_waitlsn.o
+EXTENSION = pg_waitlsn
+DATA = pg_waitlsn--1.0.sql
+
+
+
+ifdef USE_PGXS
+
+	PG_CONFIG = pg_config
+	PGXS := $( shell $( PG_CONFIG ) --pgxs )
+	include $(PGXS)
+else
+
+	subdir = contrib/pg_waitlsn
+	top_builddir = ../..
+	include $(top_builddir)/src/Makefile.global
+	include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_waitlsn/pg_waitlsn--1.0.sql b/contrib/pg_waitlsn/pg_waitlsn--1.0.sql
new file mode 100644
index 000..8b251f3
--- /dev/null
+++ b/contrib/pg_waitlsn/pg_waitlsn--1.0.sql
@@ -0,0 +1,19 @@
+/* contrib/pg_waitlsn/pg_waitlsn--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_waitlsn" to wait target LSN to been replayed, delay for waiting in miliseconds (default infinity) \quit
+
+CREATE FUNCTION pg_waitlsn(lsn pg_lsn, delay int default 0)
+RETURNS bool
+AS 'MODULE_PATHNAME', 'pg_waitlsn'
+LANGUAGE C IMMUTABLE STRICT ;
+
+CREATE FUNCTION pg_waitlsn_infinite(lsn pg_lsn)
+RETURNS bool
+AS 'MODULE_PATHNAME', 'pg_waitlsn_infinite'
+LANGUAGE C IMMUTABLE STRICT ;
+
+CREATE FUNCTION pg_waitlsn_no_wait(lsn pg_lsn)
+RETURNS bool
+AS 'MODULE_PATHNAME', 'pg_waitlsn_no_wait'
+LANGUAGE C IMMUTABLE STRICT ;
diff --git a/contrib/pg_waitlsn/pg_waitlsn.c b/contrib/pg_waitlsn/pg_waitlsn.c
new file mode 100644
index 000..d210678
--- /dev/null
+++ b/contrib/pg_waitlsn/pg_waitlsn.c
@@ -0,0 +1,299 @@
+/*-
+ *
+ * pg_waitlsn
+ *
+ * Portions Copyright (c) 2012-2017, PostgresPro Global Development Group
+ *
+ * IDENTIFICATION
+ *		  contrib/pg_waitlsn/pg_waitlsn.c
+ *
+ *-
+ */
+
+#include "postgres.h"
+#include "fmgr.h"
+#include "pgstat.h"
+#include "access/xlog.h"
+#include "utils/pg_lsn.h"
+#include "storage/latch.h"
+#include "miscadmin.h"
+#include "storage/spin.h"
+#include "storage/backendid.h"
+#include "access/xact.h"
+#include "storage/shmem.h"
+#include "storage/ipc.h"
+#include "utils/timestamp.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "access/transam.h"
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+static bool pg_waitlsn_internal(XLogRecPtr lsn, uint64_t delay);
+
+/* Hooks values */
+static lsn_updated_hook_type prev_lsn_updated_hook = NULL;
+static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+static void wl_lsn_updated_hook(void);
+static uint32 estimate_shmem_size(void);
+
+/* Latches Own-DisownLatch and AbortCаllBack */
+static void disown_latches_on_abort(XactEvent event, void *arg);
+static void wl_own_latch(void);
+static void wl_disown_latch(void);
+
+/* GUC variable */
+intcount_waitlsn = 10;
+intinterval_waitlsn = 100;
+
+/* Globals */
+TimestampTz		time_waitlsn = 0;
+intcounter_waitlsn = 0;
+
+void			_PG_init(void);
+
+/* Shared memory structu

Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-01-23 Thread Ivan Kartyshov

Thank you for reviews and suggested improvements.
I rewrote patch to make it more stable.

Changes
===
I've made a few changes:
1) WAITLSN now doesn`t depend on snapshot
2) Check current replayed LSN rather than in xact_redo_commit
3) Add syntax WAITLSN_INFINITE '0/693FF800' - for infinite wait and
WAITLSN_NO_WAIT '0/693FF800' for check if LSN was replayed as you
advised.
4) Reduce the count of loops with GUCs (WalRcvForceReply() which in 9.5
doesn`t exist).
5) Optimize loop that set latches.
6) Add two GUCs that helps us to configure influence on StartupXLOG:
count_waitlsn (denominator to check not each LSN)
interval_waitlsn (Interval in milliseconds to additional LSN check)

Feedback

On 09/15/2016 05:41 AM, Thomas Munro wrote:

You hold a spinlock in one arbitrary slot, but that
doesn't seem sufficient: another backend may also read it, compute a
new value and then write it, while holding a different spin lock.  Or
am I missing something?


We acquire an individual spinlock on each member of array, so you cannot
compute new value and write it concurrently.

Tested
==
We have been tested it on different servers and OS`s, in different cases 
and workloads. New version is nearly as fast as vanilla on primary and 
bring tiny influence on standby performance.


Hardware:
144 Intel Cores with HT
3TB RAM
all data on ramdisk
primary + hotstandby  on the same node.

A dataset was created with "pgbench -i -s 1000" command. For each round 
of test we pause replay on standby, make 100 transaction on primary 
with pgbench, start replay on standby and measure replication gap 
disappearing time under different standby workload. The workload was 
"WAITLSN ('Very/FarLSN', 1000ms timeout)" followed by "select abalance 
from pgbench_accounts there aid = random_aid;"

For vanilla 1000ms timeout was enforced on pgbench side by -R option.
GUC waitlsn parameters was adopted for 1000ms timeout on standby with 
35000 tps rate on primary.

interval_waitlsn = 500 (ms)
count_waitlsn = 3

On 200 clients, slave caching up master as vanilla without significant 
delay.

On 500 clients, slave caching up master 3% slower then vanilla.
On 1000 clients, 12% slower.
On 5000 clients, 3 time slower because it far above our hardware ability.

How to use it
==
WAITLSN ‘LSN’ [, timeout in ms];
WAITLSN_INFINITE ‘LSN’;
WAITLSN_NO_WAIT ‘LSN’;

#Wait until LSN 0/303EC60 will be replayed, or 10 second passed.
WAITLSN ‘0/303EC60’, 1;

#Or same without timeout.
WAITLSN ‘0/303EC60’;
orfile:///home/vis/Downloads/waitlsn_10dev_v2.patch
WAITLSN_INFINITE '0/693FF800';

#To check if LSN is replayed can be used.
WAITLSN_NO_WAIT '0/693FF800';

Notice: WAITLSN will release on PostmasterDeath or Interruption events
if they come earlier then target LSN or timeout.


Thank you for reading, will be glad to get your feedback.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 77667bdebd..72c5390695 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -172,6 +172,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 8acdff1393..3733ad960b 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -200,6 +200,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aa9ee5a0dd..9696b5dbb5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -143,6 +144,9 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
+/* GUC variable */
+intcount_waitlsn = 10;
+intinterval_waitlsn = 100;
 
 /*
  * Although only "on", "off", and "always" are documented,
@@ -6781,6 +6785,8 @@ StartupXLOG(void)
 		{
 			ErrorContextCallback errcallback;
 			TimestampTz xtime;
+			TimestampTz			time_waitlsn = GetCurrentTimestamp();
+			int	counter_waitlsn = 0;
 
 			InRedo = true;
 
@@ -6998,6 +7004,17 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * After update lastReplayedEndRecPtr set Latches in SHMEM array
+ */
+if (counter_waitlsn % count_waitlsn == 0
+	|| TimestampDifferenceExceeds(time_waitlsn,GetCurrentTimestamp(),interval_waitlsn))
+{
+	WaitLSNSetLatch();
+	time_waitlsn = GetCurrentTimestamp();
+}
+counter_waitlsn++;
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} whil

Re: [HACKERS] less expensive pg_buffercache on big shmem

2016-09-28 Thread Ivan Kartyshov

Hello everyone, patch was rebased.

Thank you Tomas for your reviewing this patch and for your valuable 
comments.


From the very beginning we had the misunderstanding with the naming of 
meethods.


> It'd be really useful if you could provide actual numbers, explain what
> metrics you compare and how. I'm sure it makes sense to you but it's
> utterly confusing for everyone else, and it also makes it impossible to
> reproduce the benchmark.

I test it as I wrote earlier, I run it a several times collecting TPS in 
each series, and calculate average value.


> Secondly, I see this bit added to the loop over buffers:
>
> if (bufHdr->tag.forkNum == -1)
> {
> fctx->record[i].blocknum = InvalidBlockNumber;
> continue;
> }
>
> and I have no idea why this is needed (when it was not needed before).

This helps to skip not used bufferpages. It is valuable on big and  not 
warmup shared memory.


On 09/02/2016 12:01 PM, Robert Haas wrote:

I think we certainly want to lock the buffer header, because otherwise
we might get a torn read of the buffer tag, which doesn't seem good.
But it's not obvious to me that there's any point in taking the lock
on the buffer mapping partition; I'm thinking that doesn't really do
anything unless we lock them all, and we all seem to agree that's
going too far.


Replace consistent method with semiconsistent (that lock buffer headers 
without partition lock). Made some additional fixes thanks to reviewers.



--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile
index 497dbeb..9668d84 100644
--- a/contrib/pg_buffercache/Makefile
+++ b/contrib/pg_buffercache/Makefile
@@ -4,8 +4,9 @@ MODULE_big = pg_buffercache
 OBJS = pg_buffercache_pages.o $(WIN32RES)
 
 EXTENSION = pg_buffercache
-DATA = pg_buffercache--1.2.sql pg_buffercache--1.1--1.2.sql \
-	pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql
+DATA = pg_buffercache--1.3.sql pg_buffercache--1.2--1.3.sql \
+	pg_buffercache--1.1--1.2.sql pg_buffercache--1.0--1.1.sql \
+	pg_buffercache--unpackaged--1.0.sql
 PGFILEDESC = "pg_buffercache - monitoring of shared buffer cache in real-time"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_buffercache/pg_buffercache--1.2--1.3.sql b/contrib/pg_buffercache/pg_buffercache--1.2--1.3.sql
new file mode 100644
index 000..5935326
--- /dev/null
+++ b/contrib/pg_buffercache/pg_buffercache--1.2--1.3.sql
@@ -0,0 +1,6 @@
+/* contrib/pg_buffercache/pg_buffercache--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_buffercache(text) UPDATE TO '1.3'" to load this file. \quit
+
+ALTER FUNCTION pg_buffercache_pages() PARALLEL SAFE;
diff --git a/contrib/pg_buffercache/pg_buffercache--1.2.sql b/contrib/pg_buffercache/pg_buffercache--1.3.sql
similarity index 88%
rename from contrib/pg_buffercache/pg_buffercache--1.2.sql
rename to contrib/pg_buffercache/pg_buffercache--1.3.sql
index 6ee5d84..6b3f69e 100644
--- a/contrib/pg_buffercache/pg_buffercache--1.2.sql
+++ b/contrib/pg_buffercache/pg_buffercache--1.3.sql
@@ -1,4 +1,4 @@
-/* contrib/pg_buffercache/pg_buffercache--1.2.sql */
+/* contrib/pg_buffercache/pg_buffercache--1.3.sql */
 
 -- complain if script is sourced in psql, rather than via CREATE EXTENSION
 \echo Use "CREATE EXTENSION pg_buffercache" to load this file. \quit
@@ -18,4 +18,4 @@ CREATE VIEW pg_buffercache AS
 
 -- Don't want these to be available to public.
 REVOKE ALL ON FUNCTION pg_buffercache_pages() FROM PUBLIC;
-REVOKE ALL ON pg_buffercache FROM PUBLIC;
+REVOKE ALL ON pg_buffercache FROM PUBLIC;
\ No newline at end of file
diff --git a/contrib/pg_buffercache/pg_buffercache.control b/contrib/pg_buffercache/pg_buffercache.control
index a4d664f..8c060ae 100644
--- a/contrib/pg_buffercache/pg_buffercache.control
+++ b/contrib/pg_buffercache/pg_buffercache.control
@@ -1,5 +1,5 @@
 # pg_buffercache extension
 comment = 'examine the shared buffer cache'
-default_version = '1.2'
+default_version = '1.3'
 module_pathname = '$libdir/pg_buffercache'
 relocatable = true
diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index da13bde..b6d2efc 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -136,15 +136,6 @@ pg_buffercache_pages(PG_FUNCTION_ARGS)
 		MemoryContextSwitchTo(oldcontext);
 
 		/*
-		 * To get a consistent picture of the buffer state, we must lock all
-		 * partitions of the buffer map.  Needless to say, this is horrible
-		 * for concurrency.  Must grab locks in increasing order to avoid
-		 * possible deadlocks.
-		 */
-		for (i = 0; i < NUM_BUFFER_PARTITIONS; i++)
-			LWLockAcquire(BufMappingPartitionLockByIndex(i), LW_SHARED);
-
-		/*
 		 * Scan through a

Re: [HACKERS] make async slave to wait for lsn to be replayed

2016-09-06 Thread Ivan Kartyshov

On 08/31/2016 05:54 PM, Craig Ringer wrote:

How do you get the commit LSN to watch for? Grab
pg_current_xlog_insert_location() just after the commit and figure
that replaying to that point guarantees you get the commit?


That's the point, it was created in order to provide the cosistent view 
of data between master and replica. You almost guessed, I used 
GetXLogReplayRecPtr() right after LSN was physically replayed on downstream.



Some time ago[1] I raised the idea of reporting commit LSN on the wire
to clients. That didn't go anywhere due to compatibility and security
concerns. I think those were resolvable, but it wasn't enough of a
priority to push hard on at the time. A truly "right" solution has to
wait for a protocol bump, but I think good-enough solutions are
possible now. So you might want to read that thread.


Thank you for pointing to your thread, it was very informative!
It seems that you have solved the very similar problem.


It also mentions hesitations about exposing LSN to clients even more.
I think we're *way* past that now - we have replication origins and
replication slots relying on it, it's exposed in a pg_lsn datatype, a
bunch of views expose it, etc. But it might be reasonable to ask
"should the client instead be expected to wait for the confirmed
commit of a 64-bit epoch-extended xid, like that returned by
txid_current()?" . One advantage of using xid is that you can get it
while you're still in the xact, so there's no race between commit and
checking the lsn after commit.


That sounds reasonable, but I dont think it will give us any
considerable benefits. But I`ll work out this variant.


Are you specifically trying to ensure "this commit has replayed on the
replica before we run queries on it" ? Or something else?


Yes you are right, I want to ensure data consistency on downstream 
before running queries on it. Our clients would use it as a part of 
background worker and maybe directly in apps too.



---
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] less expensive pg_buffercache on big shmem

2016-09-05 Thread Ivan Kartyshov

On 09/03/2016 05:04 AM, Tomas Vondra wrote:
> This patch needs a rebase, as 06d7fd6e bumped the version to 1.2.
Thank you for a valuable hint.

> > If we will replace consistent method, then we should replace it 
with the

> > partially consistent method (called "nonconsistent") because:
> > 1) it's based on fast spinlocks (it's not fully up to its name, though)
>
> In other words, it does exactly the thing proposed up-thread, i.e.
> locking only buffer headers.
>
> What do you mean by fast spinlocks? And why aren't they up to the name?

Not they (spinlocks), but the name “nonconsistent” is somewhat 
misleading. At the moment we can't implement concurrent shared memory 
access without locks in general, so most inconsistent method that has 
been proposed was the "nonconsistent" one. But roughly speaking 
*nonconsistent* is not as such by the name, because it contains a 
locking mechanism (spinlocks).


> > 2) it's *almost* the fastest one (the less time needed for execution of
> > method, the less data core will change and as a consequence the more
> > consistent snapshot will be)
>
> I'm not sure I understand what you're saying here? What do you mean by
> "almost the fastest one"?

I mean the fastest one that has been proposed ("consistent"| 
"semiconsistent"| "nonconsistent").


> I'm a bit confused by the results you reported before, i.e. that
>
> 1) nonconsistent is 10% faster than consistent method
> 2) semiconsistent is 5-times slower than nonconsistent method
>
> What does that mean? Are you refering to duration of the queries reading
> data from pg_buffercache, or to impact on the other queries?

Here I mean "working duration time".

> How can be semiconsistent 5x slower than nonconsistent? Wouldn't that
> make it significantly slower than the consistent method?

Firstly, when we want to measure the quality of pg_buffercache, we must 
measure several values:

1) Execution time (duration of the queries reading data from pg_buffercache)
2) How it influences the system (and other queries) during its work

Secondly, the semiconsistent is slower than nonconsistent and consistent 
method, but it makes less influence on other queries then consistent.


Thirdly, it is slower because it locks the partitions of shared memory 
in a different way than in consistent or nonconsistent methods.
The semi-consistent strategy implies that only one buffer is locked at a 
time. Therefor has no significant effect on the system, but it takes 
more time.



---
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] less expensive pg_buffercache on big shmem

2016-09-02 Thread Ivan Kartyshov

On 09/02/2016 06:01 AM, Robert Haas wrote:

I wonder whether we ought to just switch from the consistent method to
the semiconsistent method and call it good.  I agree with you that
taking every buffer partition lock simultaneously seems like too much
locking.  And in the future if we replace the buffer mapping hash with
something that is lock-free or close to it, then we wouldn't even have
buffer partition locks any more, and probably no way at all to get an
entirely consistent snapshot.
What do you think of this?


I fully agree with you that it would be preferred in the future to 
replace the buffer mapping hash with some of lock-free algorithms.


In the question of replacing the consistent method I agree with you, 
Andres Freund and Peter Geoghegan: the consistent method does not bring 
any considerable benefits.


You might be right regarding the three different modes, but our DBAs 
asked if we could preserve a legacy mode too, thus the choice.


On 09/02/2016 06:19 AM, Andres Freund wrote:
 > +1. I think, before long, we're going to have to switch away from having
 > locks & partitions in the first place. So I don't see a problem relaxing
 > this. It's not like that consistency really buys you anything...  I'd
 > even consider not using any locks.

If we will replace consistent method, then we should replace it with the 
partially consistent method (called "nonconsistent") because:

1) it's based on fast spinlocks (it's not fully up to its name, though)
2) it's *almost* the fastest one (the less time needed for execution of 
method, the less data core will change and as a consequence the more 
consistent snapshot will be)

3) and it has less influence on the entire system during query processing.

On 09/02/2016 06:30 AM, Peter Geoghegan wrote:
 > I would like to be able to run pg_buffercache in production from time
 > to time.

Yes, in our experience the usage of fully consistent pg_buffercache in
production is quite a courageous decision.


---
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] less expensive pg_buffercache on big shmem

2016-08-31 Thread Ivan Kartyshov

Hi hackers,

Recently I have finished my work on a patch for pg_buffercache contrib, 
I think it's time to share my results.



Introduction


I want to offer you the implementation that allows to decrease system 
workload by
partially sacrificing (fully snapshot consistency) data consistency. 
Sometimes we do not need full data consistency, for example on 
quantitative rather than qualitative analysis of memory contents, or 
when we want to catch insufficient memory resources or how often 
relation is used.



Problem description
===

Currently, the pg_buffercache v1.1 and prior takes an exclusive lock on 
all shared buffers, which greatly affects system performance.
Usually we use pg_buffercache to find out why DB is working slower than 
expected or examine what occupies the entire memory. So when we run 
pg_buffercache on such system, we make it even slower.



Implementation
==

Vanilla implementation contains loop which collecting statistic from 
whole shared memory acquire, read and release Spinlocks one by one, page 
by page  while holding LWLock.


V1.2 implementation contains flexible loop which can collect shared 
memory statistic using three different methods:
1) with holding LWLock only on one partition of shared memory 
(semiconsistent method)

2) without LWLocks (nonconsistent method),
3) or in vanilia way (consistent method)

The aforementioned allow us to launch pg_buffercache in the three 
different ways.

Each of them have some advantages and some disadvantages:

Consistent:
+ 100% consistency of shared memory snapshot
- Slowdown the system with whole shared memory exclusive lock

Semiconsistent:
+ Faster than consistent method
+ Mostly doesn`t affect on the system load
- Speed of taking that snapshot is low
Nonconsistent:
The fastest
+ Doesn`t noticeably affects on the systems
- <3% lost of snapshot consistency

What works
==

Actually, it work well even on big load, but of course there might be 
things I've

overlooked.
VIEW pg_buffercache_cons
VIEW pg_buffercache_noncons
VIEW pg_buffercache_semicons

Examples from docs in new realization:
SELECT c.relname, count(*) AS buffers FROM pg_buffercache_noncons b 
INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) AND 
b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = 
current_database())) GROUP BY c.relname ORDER BY 2 DESC LIMIT 10;


SELECT c.relname, count(*) AS buffers FROM pg_buffercache_semicons b 
INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) AND 
b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = 
current_database())) GROUP BY c.relname ORDER BY 2 DESC LIMIT 10;



Testing the implementation
==

How implementation tested:
1) Start server
2) Make pgbench tps
pgbench -c 250 -s 1000  -T 200 -P1
3) Compare how tps sags under load if:
SELECT count(*) FROM pg_buffercache_cons;
SELECT count(*) FROM pg_buffercache_semicons;
SELECT count(*) FROM pg_buffercache_noncons;

This test was made on server (server parameters)
Model name:Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz
CPU(s):144
Socket(s): 4
Shared_buffers:200GB


Results of testing
==

Our DBA team obtained the following results:
Nonconsistent:
* 10% faster then consistent method
* doesn`t noticeably affects on the systems
	* the maximum loss of accuracy was less then 3%* ( in most situation it 
is permissible accuracy loss )


Semiconsistent:
* 5 time slower then nonconsistent
* made less affects on system compared to consistent

Overall results:
Our clients was pleased with this implementation.
Implementation is made with backward compatibility, as a conclusion old 
pg_buffercache v1.1 queries will work well.
Semiconsistent show results approaching to nonconsistent on SELECTONLY 
queries.


* this values were obtained from our DBA tests.

What can be better
===

It is unclear how to optimize the semiconsistent method to make it 
faster, and reduce temporary effect that appears from time to time.



I will be glad to see your feedback!

---
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile
index 065d3d6..8813c50 100644
--- a/contrib/pg_buffercache/Makefile
+++ b/contrib/pg_buffercache/Makefile
@@ -4,7 +4,7 @@ MODULE_big = pg_buffercache
 OBJS = pg_buffercache_pages.o $(WIN32RES)
 
 EXTENSION = pg_buffercache
-DATA = pg_buffercache--1.1.sql pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql
+DATA = pg_buffercache--1.2.sql pg_buffercache--1.1--1.2.sql pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql
 PGFILEDESC = "pg_buffercache - monitoring of shared buffer cache in real-time"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_buffercache/README b/contrib/p

[HACKERS] make async slave to wait for lsn to be replayed

2016-08-31 Thread Ivan Kartyshov

Hi hackers,

Few days earlier I've finished my work on WAITLSN statement utility, so 
I’d like to share it.



Introduction


Our clients who deal with 9.5 and use asynchronous master-slave 
replication, asked to make the wait-mechanism on the slave side to 
prevent the situation when slave handles query which needs data (LSN) 
that was received, flushed, but still not replayed.



Problem description
===

The implementation:
Must handle the wait-mechanism using pg_sleep() in order not to load system
Must avoid race conditions if different backend want to wait for 
different LSN

Must not take snapshot of DB, to avoid troubles with sudden minXID change
Must have optional timeout parameter if LSN traffic has stalled.
Must release on postmaster’s death or interrupts.


Implementation
==

To avoid troubles with snapshots, WAITLSN was implemented as a utility 
statement, this allows us to circumvent the snapshot-taking mechanism.

We tried different variants and the most effective way was to use Latches.
To handle interprocess interaction all Latches are stored in shared 
memory and to cope with race conditions, each Latch is protected by a 
Spinlock.

Timeout was made optional parameter, it is set in milliseconds.


What works
==

Actually, it works well even with significant timeout or wait period 
values, but of course there might be things I've overlooked.


How to use it
==

WAITLSN ‘LSN’ [, timeout in ms];

#Wait until LSN 0/303EC60 will be replayed, or 10 second passed.
WAITLSN ‘0/303EC60’, 1;

#Or same without timeout.
WAITLSN ‘0/303EC60’;

Notice: WAITLSN will release on PostmasterDeath or Interruption events 
if they come earlier then LSN or timeout.


Testing the implementation
==

The implementation was tested with testgres and unittest python modules.

How to test this implementation:
Start master server
Make table test, insert tuple 1
Make asynchronous slave replication (9.5 wal_level = standby, 9.6 or 
higher wal_level =  replica)

Slave: START TRANSACTION ISOLATION LEVEL REPEATABLE READ ;
SELECT * FROM test;
Master: delete tuple + make vacuum + get new LSN
Slave: WAITLSN ‘newLSN’, 6;
Waitlsn finished with FALSE “LSN doesn`t reached”
Slave: COMMIT;
WAITLSN ‘newLSN’, 6;
Waitlsn finished with success (without NOTICE message)

The WAITLSN as expected wait LSN, and interrupts on PostmasterDeath, 
interrupts or timeout.


Your feedback is welcome!


---
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 77667bd..72c5390 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -172,6 +172,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 000..6a8bdca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,108 @@
+
+
+
+ 
+  WAITLSN
+ 
+
+ 
+  WAITLSN
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAITLSN
+  wait when target LSN been replayed
+ 
+
+ 
+
+WAITLSN 'LSN' [ , delay ]
+
+ 
+
+ 
+  Description
+
+  
+   The WAITLSN wait till target LSN will
+   be replayed with an optional delay (milliseconds by default
+   infinity) to be wait for LSN to replayed.
+  
+
+  
+   WAITLSN provides a simple
+   interprocess LSN wait mechanism for a backends on slave
+   in master-slave replication scheme on PostgreSQL database.
+  
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+  Target log sequence number to be wait for.
+ 
+
+   
+   
+delay
+
+ 
+  Time in miliseconds to waiting for LSN to be replayed.
+ 
+
+   
+  
+ 
+
+ 
+  Notes
+
+  
+   Delay time to waiting for LSN to be replayed must be integer. For
+   default it is infinity. Waiting can be interupped using Ctl+C, or
+   by Postmaster death.
+  
+ 
+
+ 
+  Examples
+
+  
+   Configure and execute a waitlsn from
+   psql:
+
+
+WAITLSN '0/3F07A6B1', 1;
+NOTICE:  LSN is not reached. Try to make bigger delay.
+WAITLSN
+
+WAITLSN '0/3F07A611';
+WAITLSN
+
+WAITLSN '0/3F0FF791', 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to make bigger delay.
+ERROR:  canceling statement due to user request
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAITLSN statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 8acdff1..3733ad9 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -200,6 +200,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f13f9c1..609c83e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_cont

Re: [HACKERS] [PATH] Correct negative/zero year in to_date/to_timestamp

2016-02-26 Thread Ivan Kartyshov
> Why does it say "tested, failed" for all points above there? ;-)

Hi,  I just used Web reviewer form on https://commitfest.postgresql.org to make 
review on patch, but form doesn't work properly unlike the patch.))  

Re: [HACKERS] Sanity checking for ./configure options?

2016-02-26 Thread Ivan Kartyshov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Tested, I think it`s rather important to make cleanup work on that project.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATH] Correct negative/zero year in to_date/to_timestamp

2016-02-26 Thread Ivan Kartyshov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Applied this patch, it works well, make what it expected correctly, code style 
is maintained. Regression tests passed OK. No documentation or comments.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers