Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Alexander Korotkov wrote 2017-09-05 00:33:
> I have the following comments on this patch:
> 1) You shouldn't use ">=" to compare xids. You should use the
> TransactionIdFollowsOrEquals() function, which handles transaction id
> wraparound correctly.

I fixed it, and in addition I added a GUC parameter that can turn this
behavior off. The parameter can be set in the postgresql.conf
configuration file or with the ALTER SYSTEM command; for the change to
take effect, call the pg_reload_conf() function or reload the database
server. These changes are in vacuum_lazy_truncate_v3.patch.

> 2) As I understand it, this patch makes heap truncation happen only
> when no concurrent transactions are running on either the master or
> replicas with hot_standby_feedback enabled. For a busy system, that
> would mean literally "never truncate the heap after vacuum".

Yes, the difficulty is that if we have many replicas and requests to
them overlap each other, truncation will never occur on a loaded
system.

> Perhaps it's a bad idea to disable replaying AccessExclusiveLock
> *every time*; we should skip taking AccessExclusiveLock only while
> writing the XLOG_SMGR_TRUNCATE record. Besides that, this code
> violates our coding style.

The patch (standby_truncate_skip_v2.patch) fixes this behaviour: it now
skips writing XLOG_STANDBY_LOCK records (with AccessExclusiveLock) only
while autovacuum truncates the table.

Amit Kapila wrote 2017-09-05 12:52:
> I think one deficiency of this solution is that it will keep on
> generating extra WAL even if the standby doesn't need it (as the
> standby has successfully truncated the relation). I don't know if we
> can just get away by saying that an additional WAL record per vacuum
> cycle is harmless, especially when it doesn't serve any purpose (as
> in the cases where the standby is always able to truncate the heap on
> the first WAL record). Am I missing some part of the solution which
> avoids it?

You are right: this implementation generates extra WAL, and there are
cases where that WAL serves no purpose.
If you have any solutions to avoid it, I'll be glad to hear them.

Simon Riggs wrote 2017-09-05 14:44:
> If we defer an action on standby replay, when and who will apply it?
> What happens if the standby is shut down or crashes while an action
> is pending?
>
> Perhaps altering the way truncation requires an AccessExclusiveLock
> would help workloads on both master and standby? If a Relation knew
> it had a pending truncation, then scans wouldn't need to go past
> newsize. Once older lockers have gone we could simply truncate
> without the lock. It would need a few pushups but seems less scary
> than trying to make pending standby actions work well enough to
> commit.

Yes, it is the first idea that came to my mind, and the most difficult
one. I think that in a few days I'll finish a patch implementing these
ideas.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 9a5fde0..a6dc369 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -31,6 +31,9 @@
 #include "storage/smgr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
+#include "storage/lmgr.h"
+#include "storage/procarray.h"
+#include "access/transam.h"

 /*
  * We keep a list of all relations (represented as RelFileNode values)
@@ -495,9 +498,21 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		LOCKTAG		locktag;
+		VirtualTransactionId *backends;
+		VirtualTransactionId *backends2;

 		reln = smgropen(xlrec->rnode, InvalidBackendId);

+		SET_LOCKTAG_RELATION(locktag, reln->smgr_rnode.node.dbNode,
+							 reln->smgr_rnode.node.relNode);
+
+		backends = GetLockConflicts(&locktag, AccessExclusiveLock);
+		backends2 = GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid);
+
+		if (!VirtualTransactionIdIsValid(*backends) && !VirtualTransactionIdIsValid(*backends2))
+		{
+
 		/*
 		 * Forcibly create relation if it doesn't exist (which suggests that
 		 * it was dropped somewhere later in the WAL sequence).  As in
@@ -542,6 +557,10 @@ smgr_redo(XLogReaderState *record)
 		visibilitymap_truncate(rel, xlrec->blkno);

 		FreeFakeRelcacheEntry(rel);
+		} else
+		{
+			XLogFlush(lsn);
+		}
 	}
 	else
 		elog(PANIC, "smgr_redo: unknown op code %u", info);
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 45b1859..3c25907 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -61,6 +61,8 @@
 #include "utils/pg_rusage.h"
 #include "utils/timestamp.h"
 #include "utils/tqual.h"
+#include "catalog/storage_xlog.h"
+#include "access/rmgr.h"

 /*
@@ -317,6 +319,32 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 		new_rel_pages = vacrelstats->old_rel_pages;
 		new_rel_tuples = vacrelstats->old_rel_tuples;
 	}
+	else
+	{
+		if
Re: [HACKERS] make async slave to wait for lsn to be replayed
I forgot to include the patch in my last letter.

Craig Ringer wrote 2017-08-15 05:00:
> That ship sailed a long time ago unfortunately, they're all over
> pg_stat_replication and pg_replication_slots and so on. They're
> already routinely used for monitoring replication lag in bytes,
> waiting for a peer to catch up, etc.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

pg_replication_slots is available only on the master, whereas waitlsn
is a function for asynchronous hot standby replication: it allows us to
wait until an insert made on the master has been replayed on the
replica.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
+
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 000..077e869
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,119 @@

WAITLSN 7 — SQL Language Statements
WAITLSN — wait until target LSN has been replayed

Synopsis:

    WAITLSN 'LSN' [ , delay ]
    WAITLSN_INFINITE 'LSN'
    WAITLSN_NO_WAIT 'LSN'

Description

    WAITLSN waits until the target LSN has been replayed, with an
    optional delay in milliseconds (by default it waits indefinitely).

    WAITLSN provides a simple interprocess LSN wait mechanism for
    backends on a slave in a master-slave replication scheme on a
    PostgreSQL database.

    WAITLSN_INFINITE waits until the target LSN has been replayed on
    the slave, with no timeout.

    WAITLSN_NO_WAIT reports whether the target LSN has been replayed,
    without waiting.

Parameters

    LSN
        Target log sequence number to wait for.

    delay
        Time in milliseconds to wait for the LSN to be replayed.

Notes

    The delay for waiting until the LSN is replayed must be an integer;
    by default it is infinite. Waiting can be interrupted with Ctrl+C,
    or by postmaster death.

Examples

    Execute WAITLSN from psql:

    WAITLSN '0/3F07A6B1', 1;
    NOTICE: LSN is not reached. Try to make bigger delay.
    WAITLSN

    WAITLSN '0/3F07A611';
    WAITLSN

    WAITLSN '0/3F0FF791', 50;
    ^CCancel request sent
    NOTICE: LSN is not reached. Try to make bigger delay.
    ERROR: canceling statement due to user request

Compatibility

    There is no WAITLSN statement in the SQL standard.

diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@
+
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index df4843f..9dfb59d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -147,7 +148,6 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
-
 /*
  * Although only "on", "off", and "always" are documented,
  * we accept all the likely variants of "on" and "off".
@@ -7237,6 +7237,15 @@ StartupXLOG(void)
 				break;
 			}

+			/*
+			 * After updating lastReplayedEndRecPtr, set latches in the
+			 * shared-memory array.
+			 */
+			if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+			{
+				WaitLSNSetLatch(XLogCtl->lastReplayedEndRecPtr);
+			}
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 		} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..0d10117 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o vacuumlazy.o variable.o view.o
+	vacuum.o vacuumlazy.o variable.o view.o waitlsn.o

 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index bacc08e..7d6011f 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -139,7 +139,6 @@
 #include "utils/ps_status.h"
 #include "utils/timestamp.h"
-
 /*
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hello, thank you for your comments on the main idea and the code.

On 13.03.2017 05:20, Thomas Munro wrote:
> 1) First, I'll restate my view of the syntax-vs-function question: I
> think an fmgr function is the wrong place to do this, because it
> doesn't work for our 2 higher isolation levels as mentioned. Most
> people probably use READ COMMITTED most of the time, so the extension
> version you've proposed is certainly useful for many people and I
> like it, but I will vote against inclusion in core of any feature
> that doesn't consider all three of our isolation levels, especially
> if there is no way to extend support to other levels later. I don't
> want PostgreSQL to keep adding features that eventually force
> everyone to use READ COMMITTED because they want to use feature X, Y
> or Z.

Waiting for an LSN is expected to be used on a read-only hot standby
replica, and only READ COMMITTED and REPEATABLE READ are allowed on hot
standby.

> Maybe someone can think of a clever way for an extension to insert a
> wait for a user-supplied LSN *before* acquiring a snapshot so it can
> work for the higher levels, or maybe the hooks should go into core
> PostgreSQL so that the extension can exist as an external project not
> requiring a patched PostgreSQL installation, or maybe this should be
> done with new core syntax that extends transaction commands. Do
> other people have views on this?

I think it is a good idea, but it is not clear to me how to do it
properly.

> 2)
>
> +wl_lsn_updated_hook(void)
> +{
> +	uint	i;
> +	/*
> +	 * After update lastReplayedEndRecPtr set Latches in SHMEM array
> +	 */
> +	if (counter_waitlsn % count_waitlsn == 0
> +		|| TimestampDifferenceExceeds(time_waitlsn, GetCurrentTimestamp(), interval_waitlsn))
> +	{
>
> Doesn't this mean that if you are waiting for LSN 1234, and the
> primary sends that LSN and then doesn't send anything for another
> hour, a standby waiting in pg_waitlsn is quite likely to skip that
> update (either because of count_waitlsn or interval_waitlsn), and
> then finish up waiting for a very long time?
>
> I'm not sure if this is a good idea, but it's an idea: you could keep
> your update-skipping logic, but make sure you always catch the
> important case where recovery hits the end of WAL, by invoking your
> callback from WaitForWALToBecomeAvailable. It could have a boolean
> parameter that means 'don't skip this one!'. In other words, it's OK
> to skip updates, but not if you know there is no more WAL available
> to apply (since you have no idea how long it will be before more
> arrives).
>
> Calling GetCurrentTimestamp() at high frequency (after every WAL
> record is replayed) may be a bad idea. It's a slow system call on
> some operating systems. Can we use an existing timestamp for free,
> like recoveryLastXTime? Remembering that the motivation for using
> this feature is to wait for *whole transactions* to be replayed and
> become visible, there is no sensible reason to wait for random WAL
> positions inside a transaction, so if you used that then you would
> effectively skip all non-COMMIT records and also skip some COMMIT
> records that are coming down the pipe too fast.

Yes, I applied this idea and prepared a new patch.

> 3)
>
> +static void
> +wl_own_latch(void)
> +{
> +	SpinLockAcquire(&state->l_arr[MyBackendId].slock);
> +	OwnLatch(&state->l_arr[MyBackendId].latch);
> +	is_latch_owned = true;
> +
> +	if (state->backend_maxid < MyBackendId)
> +		state->backend_maxid = MyBackendId;
> +
> +	state->l_arr[MyBackendId].pid = MyProcPid;
> +	SpinLockRelease(&state->l_arr[MyBackendId].slock);
> +}
>
> What is the point of using extra latches for this? Why not just use
> the standard latch? Syncrep and many other things do that. I'm not
> actually sure if there is ever a reason to create more latches in
> regular backends. SIGUSR1 will be delivered and set the main latch
> anyway.
>
> There are cases of specialised latches in the system, like the wal
> receiver latch, and I'm reliably informed that that solves problems
> like delivering a wakeup message without having to know which backend
> is currently the wal receiver (a problem we might consider solving
> today with a condition variable?). I don't think anything like that
> applies here.

In my case I create a bunch of shared latches, one per backend; I'll
think about how to use the standard latches in an efficient way. As for
specialized latches on the standby, they are already busy with WAL
replay functions.

> 4)
>
> +	for (i = 0; i <= state->backend_maxid; i++)
> +	{
> +		SpinLockAcquire(&state->l_arr[i].slock);
> +		if (state->l_arr[i].pid != 0)
> +			SetLatch(&state->l_arr[i].latch);
> +		SpinLockRelease(&state->l_arr[i].slock);
> +	}
>
> Once we get through the update-skipping logic above, we hit this loop
> which acquires spinlocks for every backend one after another and sets
> the latches of every backend, no matter whether they are waiting for
> the LSN that has been applied. Assuming we go with this
> scan-the-whole-array approach, why not include the LSN waited for in
> the array slots,
[HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Our clients complain about this issue, so I want to raise the
discussion and suggest several solutions to this problem:

I. Why does PG use FATAL when ERROR is enough to release the lock that
causes the lock conflict? ("if (RecoveryConflictPending &&
DoingCommandRead)")

II. Do we really need to truncate the table on the hot standby at
exactly the same time the truncate happens on the master?

In my case the conflict happens when autovacuum truncates table tbl1
on the master while a backend on the replica is performing a long
transaction involving the same table tbl1. This happens because the
truncate takes an AccessExclusiveLock. To tackle this issue we have
several options:

1. We can postpone the truncate on the master until all the replicas
have finished their transactions (in this case, feedback requests to
the master should be sent frequently).
Patch 1: vacuum_lazy_truncate.patch

2. Maybe there is a way not to send the AccessExclusiveLock and not to
truncate the table on the replica right away; we could wait a little
and try to truncate tbl1 on the replica again. Here is a patch that
makes the replica skip a truncate WAL record if some transaction using
the same table is already running on the replica. It also forces the
master to send a truncate WAL record with the actual table length to
the replica every time autovacuum runs.
Patch 2: standby_truncate_skip_v1.patch

In this case a transaction that has been running for several hours
won't be interrupted because of the truncate. Even if for some reason
we haven't truncated table tbl1 right away, nothing terrible will
happen: the next truncate WAL record will reduce the file to its actual
length (if some inserts or updates have been performed on the master).

If you have any ideas or concerns about the suggested solution, I'll be
glad to hear them. Thanks!
--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 45b1859..bf73a1b 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -280,7 +280,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	 * Optionally truncate the relation.
 	 */
 	if (should_attempt_truncation(vacrelstats))
-		lazy_truncate_heap(onerel, vacrelstats);
+		if (OldestXmin >= ShmemVariableCache->latestCompletedXid)
+			lazy_truncate_heap(onerel, vacrelstats);

 	/* Report that we are now doing final cleanup */
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 9a5fde0..a6dc369 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -31,6 +31,9 @@
 #include "storage/smgr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
+#include "storage/lmgr.h"
+#include "storage/procarray.h"
+#include "access/transam.h"

 /*
  * We keep a list of all relations (represented as RelFileNode values)
@@ -495,9 +498,21 @@ smgr_redo(XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
 		SMgrRelation reln;
 		Relation	rel;
+		LOCKTAG		locktag;
+		VirtualTransactionId *backends;
+		VirtualTransactionId *backends2;

 		reln = smgropen(xlrec->rnode, InvalidBackendId);

+		SET_LOCKTAG_RELATION(locktag, reln->smgr_rnode.node.dbNode,
+							 reln->smgr_rnode.node.relNode);
+
+		backends = GetLockConflicts(&locktag, AccessExclusiveLock);
+		backends2 = GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid);
+
+		if (!VirtualTransactionIdIsValid(*backends) && !VirtualTransactionIdIsValid(*backends2))
+		{
+
 		/*
 		 * Forcibly create relation if it doesn't exist (which suggests that
 		 * it was dropped somewhere later in the WAL sequence).  As in
@@ -542,6 +557,10 @@ smgr_redo(XLogReaderState *record)
 		visibilitymap_truncate(rel, xlrec->blkno);

 		FreeFakeRelcacheEntry(rel);
+		} else
+		{
+			XLogFlush(lsn);
+		}
 	}
 	else
 		elog(PANIC, "smgr_redo: unknown op code %u", info);
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 45b1859..fd91db0 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -61,6 +61,8 @@
 #include "utils/pg_rusage.h"
 #include "utils/timestamp.h"
 #include "utils/tqual.h"
+#include "catalog/storage_xlog.h"
+#include "access/rmgr.h"

 /*
@@ -317,6 +319,32 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 		new_rel_pages = vacrelstats->old_rel_pages;
 		new_rel_tuples = vacrelstats->old_rel_tuples;
 	}
+	else
+	{
+		if (RelationNeedsWAL(onerel))
+		{
+			/* Get the current relation length */
+			LockRelationForExtension(onerel, ExclusiveLock);
+
+			/*
+			 * Make an XLOG entry reporting the file truncation.
+			 */
+			XLogRecPtr	lsn;
+			xl_smgr_truncate xlrec;
+
+			xlrec.blkno = vacrelstats->rel_pages;
+			xlrec.rnode = onerel->rd_node;
+
+			XLogBeginInsert();
+			XLogRegisterData((char *) &xlrec, sizeof(xlrec));
+