Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Wed, Aug 13, 2014 at 12:41:24PM +0900, Fujii Masao wrote: On Mon, Aug 11, 2014 at 8:27 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 11, 2014 at 4:46 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2011-10-04 20:52:59 +0900, Fujii Masao wrote: *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *** *** 1066,1071 RecordTransactionCommit(void) --- 1066,1074 (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata); } + + /* Save timestamp of latest transaction commit record */ + pgstat_report_xact_end_timestamp(xactStopTimestamp); } Perhaps that pgstat_report() should instead be combined with the pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number of changecount increases and cacheline references would stay the same. The only thing that'd change would be a single additional assignment. Sounds good suggestion. I attached the updated version of the patch. I changed pgstat_report_xx functions like Andres suggested. While reading the patch again, I found it didn't handle the COMMIT/ABORT PREPARED case properly. According to the commit e74e090, now pg_last_xact_replay_timestamp() returns the timestamp of COMMIT/ABORT PREPARED. pg_last_xact_insert_timestamp() is mainly expected to be used to calculate the replication delay, so it also needs to return that timestam. But the patch didn't change 2PC code at all. We need to add pgstat_report_xact_end_timestamp() into FinishPreparedTransaction(), RecordTransactionCommitPrepared() or RecordTransactionAbortPrepared(). Done. Is this going to be applied? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Aug 11, 2014 at 8:27 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 11, 2014 at 4:46 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2011-10-04 20:52:59 +0900, Fujii Masao wrote: *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *** *** 1066,1071 RecordTransactionCommit(void) --- 1066,1074 (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata); } + + /* Save timestamp of latest transaction commit record */ + pgstat_report_xact_end_timestamp(xactStopTimestamp); } Perhaps that pgstat_report() should instead be combined with the pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number of changecount increases and cacheline references would stay the same. The only thing that'd change would be a single additional assignment. Sounds good suggestion. I attached the updated version of the patch. I changed pgstat_report_xx functions like Andres suggested. While reading the patch again, I found it didn't handle the COMMIT/ABORT PREPARED case properly. According to the commit e74e090, now pg_last_xact_replay_timestamp() returns the timestamp of COMMIT/ABORT PREPARED. pg_last_xact_insert_timestamp() is mainly expected to be used to calculate the replication delay, so it also needs to return that timestam. But the patch didn't change 2PC code at all. We need to add pgstat_report_xact_end_timestamp() into FinishPreparedTransaction(), RecordTransactionCommitPrepared() or RecordTransactionAbortPrepared(). Done. Regards, -- Fujii Masao *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 16116,16121 SELECT set_config('log_statement_stats', 'off', false); --- 16116,16124 primarypg_current_xlog_location/primary /indexterm indexterm + primarypg_last_xact_insert_timestamp/primary +/indexterm +indexterm primarypg_start_backup/primary /indexterm indexterm *** *** 16180,16185 SELECT set_config('log_statement_stats', 'off', false); --- 16183,16195 /row row entry +literalfunctionpg_last_xact_insert_timestamp()/function/literal +/entry +entrytypetimestamp with time zone/type/entry +entryGet last transaction log insert time stamp/entry + /row + row + entry literalfunctionpg_start_backup(parameterlabel/ typetext/ optional, parameterfast/ typeboolean/ /optional)/function/literal /entry entrytypepg_lsn/type/entry *** *** 16334,16339 postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); --- 16344,16356 /para para + functionpg_last_xact_insert_timestamp/ displays the time stamp of last inserted + transaction. This is the time at which the commit or abort WAL record for that transaction. + If there has been no transaction committed or aborted yet since the server has started, + this function returns NULL. +/para + +para For details about proper usage of these functions, see xref linkend=continuous-archiving. /para *** a/doc/src/sgml/high-availability.sgml --- b/doc/src/sgml/high-availability.sgml *** *** 862,867 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' --- 862,876 commandps/ command (see xref linkend=monitoring-ps for details). /para para + You can also calculate the lag in time stamp by comparing the last + WAL insert time stamp on the primary with the last WAL replay + time stamp on the standby. They can be retrieved using + functionpg_last_xact_insert_timestamp/ on the primary and + the functionpg_last_xact_replay_timestamp/ on the standby, + respectively (see xref linkend=functions-admin-backup-table and + xref linkend=functions-recovery-info-table for details). + /para + para You can retrieve a list of WAL sender processes via the link linkend=monitoring-stats-views-table literalpg_stat_replication//link view. Large differences between *** a/src/backend/access/transam/twophase.c --- b/src/backend/access/transam/twophase.c *** *** 156,167 static void RecordTransactionCommitPrepared(TransactionId xid, RelFileNode *rels, int ninvalmsgs, SharedInvalidationMessage *invalmsgs, ! bool initfileinval); static void RecordTransactionAbortPrepared(TransactionId xid, int nchildren, TransactionId *children, int nrels, ! RelFileNode *rels); static void ProcessRecords(char *bufptr, TransactionId xid, const TwoPhaseCallback callbacks[]); static void RemoveGXact(GlobalTransaction gxact); --- 156,169 RelFileNode *rels, int ninvalmsgs, SharedInvalidationMessage
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
Hi, On 2011-10-04 20:52:59 +0900, Fujii Masao wrote: *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *** *** 1066,1071 RecordTransactionCommit(void) --- 1066,1074 (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata); } + + /* Save timestamp of latest transaction commit record */ + pgstat_report_xact_end_timestamp(xactStopTimestamp); } Perhaps that pgstat_report() should instead be combined with the pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number of changecount increases and cacheline references would stay the same. The only thing that'd change would be a single additional assignment. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Aug 11, 2014 at 4:46 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2011-10-04 20:52:59 +0900, Fujii Masao wrote: *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *** *** 1066,1071 RecordTransactionCommit(void) --- 1066,1074 (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata); } + + /* Save timestamp of latest transaction commit record */ + pgstat_report_xact_end_timestamp(xactStopTimestamp); } Perhaps that pgstat_report() should instead be combined with the pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number of changecount increases and cacheline references would stay the same. The only thing that'd change would be a single additional assignment. Sounds good suggestion. While reading the patch again, I found it didn't handle the COMMIT/ABORT PREPARED case properly. According to the commit e74e090, now pg_last_xact_replay_timestamp() returns the timestamp of COMMIT/ABORT PREPARED. pg_last_xact_insert_timestamp() is mainly expected to be used to calculate the replication delay, so it also needs to return that timestam. But the patch didn't change 2PC code at all. We need to add pgstat_report_xact_end_timestamp() into FinishPreparedTransaction(), RecordTransactionCommitPrepared() or RecordTransactionAbortPrepared(). Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Dec 12, 2011 at 12:17 PM, Simon Riggs si...@2ndquadrant.com wrote: On Sat, Dec 10, 2011 at 12:29 PM, Greg Smith g...@2ndquadrant.com wrote: We can send regular special messages from WALSender to WALReceiver that do not form part of the WAL stream, so we don't bulk up WAL archives. (i.e. don't use w messages). Here's my understanding of how this would work. Let me explain a little more and provide a very partial patch. We define a new replication protocol message 'k' which sends a keepalive from primary to standby when there is no WAL to send. The message does not form part of the WAL stream so does not bloat WAL files, nor cause them to fill when unattended. Keepalives contain current end of WAL and a current timestamp. Keepalive processing is all done on the standby and there is no overhead on a primary which does not use replication. There is a slight overhead on primary for keepalives but this happens only when there are no writes. On the standby we already update shared state when we receive some data, so not much else to do there. When the standby has applied up to the end of WAL the replication delay is receipt time - send time of keepalive. Patch introduces regular keepalives from WALsender to WALreceiver, using a new protocol message 'k'. These are sent at intervals of a fraction of replication_delay or 10s if not set, whenever no WAL records have been sent recently. Patch exposes info used for standby_delay calculation as used in 9.0+ In addition introduces direct calculations of replication apply delay and replication transfer latency, both in ms. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index d6332e5..71c40cc 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1467,6 +1467,54 @@ The commands accepted in walsender mode are: variablelist varlistentry term + Primary keepalive message (B) + /term + listitem + para + variablelist + varlistentry + term + Byte1('k') + /term + listitem + para + Identifies the message as a sender keepalive. + /para + /listitem + /varlistentry + varlistentry + term + Byte8 + /term + listitem + para + The current end of WAL on the server, given in + XLogRecPtr format. + /para + /listitem + /varlistentry + varlistentry + term + Byte8 + /term + listitem + para + The server's system clock at the time of transmission, + given in TimestampTz format. + /para + /listitem + /varlistentry + /variablelist + /para + /listitem + /varlistentry + /variablelist + /para + + para + variablelist + varlistentry + term Standby status update (F) /term listitem diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 9d96044..77e2760 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -452,6 +452,9 @@ typedef struct XLogCtlData XLogRecPtr recoveryLastRecPtr; /* timestamp of last COMMIT/ABORT record replayed (or being replayed) */ TimestampTz recoveryLastXTime; + /* timestamp of when we started replaying the current chunk of WAL data, + * only relevant for replication or archive recovery */ + TimestampTz currentChunkStartTime; /* end of the last record restored from the archive */ XLogRecPtr restoreLastRecPtr; /* Are we requested to pause recovery? */ @@ -606,6 +609,7 @@ static void exitArchiveRecovery(TimeLineID endTLI, static bool recoveryStopsHere(XLogRecord *record, bool *includeThis); static void recoveryPausesHere(void); static void SetLatestXTime(TimestampTz xtime); +static void SetCurrentChunkStartTime(TimestampTz xtime); static void CheckRequiredParameterValues(void); static void XLogReportParameters(void); static void LocalSetXLogInsertAllowed(void); @@ -5846,6 +5850,41 @@ GetLatestXTime(void) } /* + * Save timestamp of the next chunk of WAL records to apply. + * + * We keep this in XLogCtl, not a simple static variable, so that it can be + * seen by all backends. + */ +static void +SetCurrentChunkStartTime(TimestampTz xtime) +{ + /* use volatile pointer to prevent code rearrangement */ + volatile XLogCtlData *xlogctl = XLogCtl; + + SpinLockAcquire(xlogctl-info_lck); + xlogctl-currentChunkStartTime = xtime; + SpinLockRelease(xlogctl-info_lck); +} + +/* + * Fetch timestamp of latest processed commit/abort record. + * Startup process maintains an accurate local copy in XLogReceiptTime + */ +TimestampTz +GetCurrentChunkReplayStartTime(void) +{ + /* use volatile pointer to prevent code rearrangement */ + volatile XLogCtlData *xlogctl = XLogCtl; + TimestampTz xtime; + +
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Sat, Dec 10, 2011 at 12:29 PM, Greg Smith g...@2ndquadrant.com wrote: We can send regular special messages from WALSender to WALReceiver that do not form part of the WAL stream, so we don't bulk up WAL archives. (i.e. don't use w messages). Here's my understanding of how this would work. Let me explain a little more and provide a very partial patch. We define a new replication protocol message 'k' which sends a keepalive from primary to standby when there is no WAL to send. The message does not form part of the WAL stream so does not bloat WAL files, nor cause them to fill when unattended. Keepalives contain current end of WAL and a current timestamp. Keepalive processing is all done on the standby and there is no overhead on a primary which does not use replication. There is a slight overhead on primary for keepalives but this happens only when there are no writes. On the standby we already update shared state when we receive some data, so not much else to do there. When the standby has applied up to the end of WAL the replication delay is receipt time - send time of keepalive. When standby receives a data packet it records WAL ptr and time. As standby applies each chunk it removes the record for each data packet and sets the last applied timestamp. If standby falls behind the number of data packet records will build up, so we begin to keep record every 2 packets, then every 4 packets etc. So the further the standby falls behind the less accurately we record the replication delay - though the accuracy remains proportional to the delay. To complete the patch I need to * send the keepalive messages when no WAL outstanding * receive the messages * store timestamp info for data and keepalives * progressively filter the messages if we get too many I will be working on this patch some more this week. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index d6332e5..71c40cc 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1467,6 +1467,54 @@ The commands accepted in walsender mode are: variablelist varlistentry term + Primary keepalive message (B) + /term + listitem + para + variablelist + varlistentry + term + Byte1('k') + /term + listitem + para + Identifies the message as a sender keepalive. + /para + /listitem + /varlistentry + varlistentry + term + Byte8 + /term + listitem + para + The current end of WAL on the server, given in + XLogRecPtr format. + /para + /listitem + /varlistentry + varlistentry + term + Byte8 + /term + listitem + para + The server's system clock at the time of transmission, + given in TimestampTz format. + /para + /listitem + /varlistentry + /variablelist + /para + /listitem + /varlistentry + /variablelist + /para + + para + variablelist + varlistentry + term Standby status update (F) /term listitem diff --git a/src/include/replication/walprotocol.h b/src/include/replication/walprotocol.h index 656c8fc..1c73d35 100644 --- a/src/include/replication/walprotocol.h +++ b/src/include/replication/walprotocol.h @@ -40,6 +40,21 @@ typedef struct } WalDataMessageHeader; /* + * Keepalive message from primary (message type 'k'). (lowercase k) + * This is wrapped within a CopyData message at the FE/BE protocol level. + * + * Note that the data length is not specified here. + */ +typedef struct +{ + /* Current end of WAL on the sender */ + XLogRecPtr walEnd; + + /* Sender's system clock at the time of transmission */ + TimestampTz sendTime; +} PrimaryKeepaliveMessage; + +/* * Reply message from standby (message type 'r'). This is wrapped within * a CopyData message at the FE/BE protocol level. * -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Sat, Dec 10, 2011 at 7:29 AM, Greg Smith g...@2ndquadrant.com wrote: -It adds overhead at every commit, even for people who aren't using it. Probably not enough to matter, but it's yet another thing going through the often maligned as too heavy pgstat system, often. The bit about the pgstat system being too heavy is a red herring; this could equally well be stored in the PGPROC. So, the overhead is basically one additional store to shared memory per commit, and we can arrange for that store to involve a cache line that has to be touched anyway (by ProcArrayEndTransaction). I suppose you can make the argument that that still isn't free, but there aren't very many places in the code where we worry about effects this small. Operations like AcceptInvalidationMessages() or LockRelationOid() happen more often than transaction commit, and yet we were happy for years with a system where AcceptInvalidationMessages() did three spinlock acquire-and-release cycles that were unnecessary in 99% of cases. That cost was vastly greater than what we're talking about here, and it affected an operation that is more frequent than this one. [ usability complaints ] Fair enough. I'm normally a fan of building the simplest thing that will do something useful, and this patch succeeds there. But the best data to collect needs to have a timestamp that keeps moving forward in a way that correlates reasonably well to the system WAL load. Using the transaction ID doesn't do that. Simon did some hand waving around sending a timestamp every checkpoint. That would allow the standby to compute its own lag, limit overhead to something much lighter than per-transaction, and better track write volume. There could still be a bigger than normal discontinuity after server restart, if the server was down a while, but at least there wouldn't ever be a point where the value was returned by the master but was NULL. But as Simon mentioned in passing, it will bloat the WAL streams for everyone. But, as with this proposal, the overhead seems largely irrelevant; the question is whether it actually solves the problem. Unless I am misunderstanding, we are talking about 4 bytes of WAL per checkpoint cycle, which strikes me as even less worth worrying about than one store to shared memory per transaction commit. So, personally, I see no reason to fret about the overhead. But I'm skeptical that anything that we only update once per checkpoint cycle will help much in calculating an accurate lag value. It also strikes me that anything that is based on augmenting the walsender/walreceiver protocol leaves anyone who is using WAL shipping out in the cold. I'm not clear from the comments you or Simon have made how important you think that use case still is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Dec 12, 2011 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote: It also strikes me that anything that is based on augmenting the walsender/walreceiver protocol leaves anyone who is using WAL shipping out in the cold. I'm not clear from the comments you or Simon have made how important you think that use case still is. archive_timeout 0 works just fine at generating files even when quiet, or if it does not, it is a bug. So I don't understand your comments, please explain. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Dec 12, 2011 at 9:24 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Dec 12, 2011 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote: It also strikes me that anything that is based on augmenting the walsender/walreceiver protocol leaves anyone who is using WAL shipping out in the cold. I'm not clear from the comments you or Simon have made how important you think that use case still is. archive_timeout 0 works just fine at generating files even when quiet, or if it does not, it is a bug. So I don't understand your comments, please explain. If the standby has restore_command set but not primary_conninfo, then it will never make a direct connection to the master. So anything that's based on extending that protocol won't get used in that case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Dec 12, 2011 at 2:47 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Dec 12, 2011 at 9:24 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Dec 12, 2011 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote: It also strikes me that anything that is based on augmenting the walsender/walreceiver protocol leaves anyone who is using WAL shipping out in the cold. I'm not clear from the comments you or Simon have made how important you think that use case still is. archive_timeout 0 works just fine at generating files even when quiet, or if it does not, it is a bug. So I don't understand your comments, please explain. If the standby has restore_command set but not primary_conninfo, then it will never make a direct connection to the master. So anything that's based on extending that protocol won't get used in that case. Got that, but now explain the reason for saying such people are out in the cold. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Dec 12, 2011 at 9:51 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Dec 12, 2011 at 2:47 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Dec 12, 2011 at 9:24 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Dec 12, 2011 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote: It also strikes me that anything that is based on augmenting the walsender/walreceiver protocol leaves anyone who is using WAL shipping out in the cold. I'm not clear from the comments you or Simon have made how important you think that use case still is. archive_timeout 0 works just fine at generating files even when quiet, or if it does not, it is a bug. So I don't understand your comments, please explain. If the standby has restore_command set but not primary_conninfo, then it will never make a direct connection to the master. So anything that's based on extending that protocol won't get used in that case. Got that, but now explain the reason for saying such people are out in the cold. By that I only meant that if we add a lag-monitoring feature that relies on the streaming replication protocol, then people who are not using the streaming replication replication protocol will be unable to monitor lag (or at least, the will be unable to do it any better than they can do it today). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On 12/12/2011 08:45 AM, Robert Haas wrote: But I'm skeptical that anything that we only update once per checkpoint cycle will help much in calculating an accurate lag value. I'm sure there is no upper bound on how much WAL lag you can build up between commit/abort records either; they can be far less frequent than checkpoints. All it takes is a multi-hour COPY with no other commits to completely hose lag measured by that advance, and that is not an unusual situation at all. Overnight daily ETL or reporting MV-ish roll-ups, scheduled specifically for when no one is normally at the office, are the first thing that spring to mind. Anyway, I wasn't suggesting checkpoints as anything other than a worst case behavior. We can always thump out more frequent updates to reduce lag, and in what I expect to the most common case the WAL send/receive stuff will usually do much better. I see the XID vs. WAL position UI issues as being fundamentally unsolvable, which really bothers me. I'd have preferred to run screaming away from this thread if it hadn't. It also strikes me that anything that is based on augmenting the walsender/walreceiver protocol leaves anyone who is using WAL shipping out in the cold. I'm not clear from the comments you or Simon have made how important you think that use case still is. There's a number of reasons why we might want more timestamps streamed into the WAL; this might be one. We'd just need one to pop out one as part of the archive_timeout switch to in theory make it possible for these people to be happy. I think Simon was hoping to avoid WAL timestamps, I wouldn't bet too much on that myself. The obvious implementation problem here is that the logical place to put the timestamps is right at the end of the WAL file, just before it's closed for archiving. But that position isn't known until you've at least started processing it, which you clearly are not doing fast enough if lag exists. As far as who's still important here, two observations. Note that the pg_last_xact_insert_timestamp approach can fail to satisfy WAL shipping people who are going to a separate network, where it's impractical to connect to both servers with libpq. I have some customers who like putting a one-way WAL wall (sorry) between production and the standby server, with the log shipping being the only route between them; that's one reason why they might still be doing this instead of using streaming. There's really no good way to make these people happy and provide time lag monitoring inside the database. I was actually the last person I recall who suggested some extra monitoring mainly aimed at WAL shipping environments: http://archives.postgresql.org/pgsql-hackers/2010-01/msg01522.php Had some pg_standby changes I was also working on back then, almost two years ago. I never circled back to any of it due to having zero demand since 9.0 shipped, the requests I had been regularly getting about this all dried up. While I'm all for keeping new features working for everyone when it doesn't hold progress back, it's not unreasonable to recognize we can't support every monitoring option through all of the weird ways WAL files can move around. pg_stat_replication isn't very helpful for 9.0+ WAL shippers either, yet they still go on doing their thing. In the other direction, people who will immediately adopt the latest hotness, cascading is a whole new layer of use case concerns on top of the ones considered so far. Now you're talking two layers of connections users have to navigate though to compute master-cascaded standby lag. Cascade the WALSender timestamps instead, which seems pretty simple to do, and then people can just ask their local standby. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On 10/02/2011 07:10 AM, Robert Haas wrote: Your proposals involve sending additional information from the master to the slave, but the slave already knows both its WAL position and the timestamp of the transaction it has most recently replayed, because the startup process on the slave tracks that information and publishes it in shared memory. On the master, however, only the WAL position is centrally tracked; the transaction timestamps are not. This seems to be the question that was never really answered well enough to satisfy anyone, so let's rewind to here for a bit. I wasn't following this closely until now, so I just did my own review from scratch against the latest patch. I found a few issues, and I don't think all of them have been vented here yet: -It adds overhead at every commit, even for people who aren't using it. Probably not enough to matter, but it's yet another thing going through the often maligned as too heavy pgstat system, often. -In order to measure lag this way, you need access to both the master and the standby. Yuck, dblink or application code doing timestamp math, either idea makes me shudder. It would be nice to answer how many seconds of lag do have? directly from the standby. Ideally I would like both the master and standby to know those numbers. -After the server is restarted, you get a hole in the monitoring data until the first transaction is committed or aborted. The way the existing pg_last_xact_replay_timestamp side of this computation goes NULL for some unpredictable period after restart is going to drive monitoring systems batty. Building this new facility similarly will force everyone who writes a lag monitor to special case for that condition on both sides. Sure, that's less user hostile than the status quo, but it isn't going to help PostgreSQL's battered reputation in the area of monitoring either. -The transaction ID advancing is not a very good proxy for real-world lag. You can have a very large amount of writing in between commits. The existing lag monitoring possibilities focus on WAL position instead, which is better correlated with the sort of activity that causes lag. Making one measurement of lag WAL position based (the existing ones) and another based on transaction advance (this proposal) is bound to raise some which of these is the correct lag? questions, when the two diverge. Large data loading operations come to mind as a not unusual at all situation where this would happen. I'm normally a fan of building the simplest thing that will do something useful, and this patch succeeds there. But the best data to collect needs to have a timestamp that keeps moving forward in a way that correlates reasonably well to the system WAL load. Using the transaction ID doesn't do that. Simon did some hand waving around sending a timestamp every checkpoint. That would allow the standby to compute its own lag, limit overhead to something much lighter than per-transaction, and better track write volume. There could still be a bigger than normal discontinuity after server restart, if the server was down a while, but at least there wouldn't ever be a point where the value was returned by the master but was NULL. But as Simon mentioned in passing, it will bloat the WAL streams for everyone. Here's the as yet uncoded mini-proposal that seems to have slid by uncommented upon: We can send regular special messages from WALSender to WALReceiver that do not form part of the WAL stream, so we don't bulk up WAL archives. (i.e. don't use w messages). Here's my understanding of how this would work. Each time a commit/abort record appears in the WAL, that updates XLogCtl with the associated timestamp. If WALReceiver received regular updates containing the master's clock timestamp and stored them similarly--either via regular streaming or the heartbeat--it could compute lag with the same resolution as this patch aims to do, for the price of two spinlocks: just subtract the two timestamps. No overhead on the master, and lag can be directly computed and queried from each standby. If you want to feed that data back to the master so it can appear in pg_stat_replication in both places, send it periodically via the same channel sync rep and standby feedback use. I believe that will be cheap in many cases, since it can piggyback on messages that will still be quite small relative to minimum packet size on most networks. (Exception for compressed/encrypted networks where packets aren't discrete in this way doesn't seem that relevant, presuming that if you're doing one of those I would think this overhead is the least of your worries) Now, there's still one problem here. This doesn't address the lots of write volume but no commit problem any better than the proposed patch does. Maybe there's some fancy way to inject it along with the received WAL on the standby, I'm out of brain power
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Thu, Oct 13, 2011 at 14:25, Robert Haas robertmh...@gmail.com wrote: On Tue, Oct 4, 2011 at 9:15 AM, Simon Riggs si...@2ndquadrant.com wrote: Simon, could you? If your proposal turns out to be better than mine, I'd be happy to agree to drop my patch and adopt yours. Yes, will do. Simon, I believe that we are still waiting for this. Are we going to hear anything back on this one for the current CF? If not, I suggest we go with Fujiis version for now - we can always change it for a potentially better version later. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Tue, Oct 4, 2011 at 9:15 AM, Simon Riggs si...@2ndquadrant.com wrote: Simon, could you? If your proposal turns out to be better than mine, I'd be happy to agree to drop my patch and adopt yours. Yes, will do. Simon, I believe that we are still waiting for this. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Oct 3, 2011 at 6:31 PM, Fujii Masao masao.fu...@gmail.com wrote: Also, in pg_last_xact_insert_timestamp, the errhint() seems a little strange - this is not exactly a WAL *control* function, is it? Not only control but also WAL might be confusing. What about transaction information functions? Attached is the updated version of the patch. In the patch, I used the function name itself in the HINT message, i.e., the HINT message is the following. pg_last_xact_insert_timestamp() cannot be executed during recovery. In the documentation, for the short description of pg_last_xact_insert_timestamp(), how about something like returns the time at which a transaction commit or transaction about record was last inserted into the transaction log? Or maybe that's too long. But the current description doesn't seem to do much other than recapitulate the function name, so I'm wondering if we can do any better than that. Agreed. I will change the description per your suggestion. Done. I think that instead of hacking up the backend-status copying code to have a mode where it copies everything, you should just have a special-purpose function that computes the value you need directly off the backend status entries themselves. This approach seems like it both clutters the code and adds lots of extra data copying. Agreed. Will change. Done. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 13996,14001 SELECT set_config('log_statement_stats', 'off', false); --- 13996,14004 primarypg_current_xlog_location/primary /indexterm indexterm + primarypg_last_xact_insert_timestamp/primary +/indexterm +indexterm primarypg_start_backup/primary /indexterm indexterm *** *** 14049,14054 SELECT set_config('log_statement_stats', 'off', false); --- 14052,14067 /row row entry + literalfunctionpg_last_xact_insert_timestamp()/function/literal + /entry +entrytypetimestamp with time zone/type/entry +entry + Get the time at which a transaction commit or transaction abort record + was last inserted into the transaction log + /entry + /row + row +entry literalfunctionpg_start_backup(parameterlabel/ typetext/ optional, parameterfast/ typeboolean/ /optional)/function/literal /entry entrytypetext/type/entry *** *** 14175,14180 postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); --- 14188,14200 /para para + functionpg_last_xact_insert_timestamp/ displays the time stamp of last inserted + transaction. This is the time at which the commit or abort WAL record for that transaction. + If there has been no transaction committed or aborted yet since the server has started, + this function returns NULL. +/para + +para For details about proper usage of these functions, see xref linkend=continuous-archiving. /para *** a/doc/src/sgml/high-availability.sgml --- b/doc/src/sgml/high-availability.sgml *** *** 867,872 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' --- 867,881 commandps/ command (see xref linkend=monitoring-ps for details). /para para + You can also calculate the lag in time stamp by comparing the last + WAL insert time stamp on the primary with the last WAL replay + time stamp on the standby. They can be retrieved using + functionpg_last_xact_insert_timestamp/ on the primary and + the functionpg_last_xact_replay_timestamp/ on the standby, + respectively (see xref linkend=functions-admin-backup-table and + xref linkend=functions-recovery-info-table for details). + /para + para You can retrieve a list of WAL sender processes via the link linkend=monitoring-stats-views-table literalpg_stat_replication//link view. Large differences between *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *** *** 1066,1071 RecordTransactionCommit(void) --- 1066,1074 (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata); } + + /* Save timestamp of latest transaction commit record */ + pgstat_report_xact_end_timestamp(xactStopTimestamp); } /* *** *** 1434,1439 RecordTransactionAbort(bool isSubXact) --- 1437,1445 (void) XLogInsert(RM_XACT_ID, XLOG_XACT_ABORT, rdata); + /* Save timestamp of latest transaction abort record */ + pgstat_report_xact_end_timestamp(xlrec.xact_time); + /* * Report the latest async abort LSN, so that the WAL writer knows to * flush this abort. There's nothing to be gained by delaying this, since *** *** 4968,4970
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Tue, Oct 4, 2011 at 5:33 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Oct 3, 2011 at 4:25 PM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Oct 3, 2011 at 8:07 PM, Robert Haas robertmh...@gmail.com wrote: Sorry, but I still don't really think it's fair to say that you've proposed a solution to this problem. Or if you have, neither I nor Fujii Masao understand that proposal well enough to decide whether we like it. Arguing between trenches doesn't really get us anywhere. As ever, when someone claims to have a better solution then it is up to them to prove that is the case. So... are you going to do that? Simon, could you? If your proposal turns out to be better than mine, I'd be happy to agree to drop my patch and adopt yours. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Tue, Oct 4, 2011 at 1:05 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Oct 4, 2011 at 5:33 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Oct 3, 2011 at 4:25 PM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Oct 3, 2011 at 8:07 PM, Robert Haas robertmh...@gmail.com wrote: Sorry, but I still don't really think it's fair to say that you've proposed a solution to this problem. Or if you have, neither I nor Fujii Masao understand that proposal well enough to decide whether we like it. Arguing between trenches doesn't really get us anywhere. As ever, when someone claims to have a better solution then it is up to them to prove that is the case. So... are you going to do that? Simon, could you? If your proposal turns out to be better than mine, I'd be happy to agree to drop my patch and adopt yours. Yes, will do. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Sun, Oct 2, 2011 at 8:19 PM, Robert Haas robertmh...@gmail.com wrote: It occurs to me that pgstat_report_xact_end_timestamp doesn't really need to follow the protocol of bumping the change count before and after bumping the timestamp. We elsewhere assume that four-byte reads and writes are atomic, so there's no harm in assuming the same thing here (and if they're not... then the change-count thing is pretty dubious anyway). I think it's sufficient to just set the value, full stop. I agree with Tom here. It seems to be safer to follow the protocol even if that's not required for now. Also, in pg_last_xact_insert_timestamp, the errhint() seems a little strange - this is not exactly a WAL *control* function, is it? Not only control but also WAL might be confusing. What about transaction information functions? BTW, pg_current_xlog_location() and pg_current_xlog_insert_location() use the same HINT message as I used for pg_last_xact_insert_timestamp(), but they are also not WAL *control* function. And, in the document, they are categorized as Backup Control Functions, but which sounds also strange. We should call them WAL information functions in both HINT message and the document? In the documentation, for the short description of pg_last_xact_insert_timestamp(), how about something like returns the time at which a transaction commit or transaction about record was last inserted into the transaction log? Or maybe that's too long. But the current description doesn't seem to do much other than recapitulate the function name, so I'm wondering if we can do any better than that. Agreed. I will change the description per your suggestion. I think that instead of hacking up the backend-status copying code to have a mode where it copies everything, you should just have a special-purpose function that computes the value you need directly off the backend status entries themselves. This approach seems like it both clutters the code and adds lots of extra data copying. Agreed. Will change. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Fri, Sep 30, 2011 at 4:18 PM, Simon Riggs si...@2ndquadrant.com wrote: If we want to measure times, we can easily send regular messages into WAL to provide this function. Using checkpoint records would seem frequent enough to me. We don't always send checkpoint records but we can send an info message instead if we are streaming. If archive_timeout is not set and max_wal_senders 0 then we can send an info WAL message with timestamp on a regular cycle. What timestamp are you thinking the walsender should send? What we need is the timestamp which is comparable with the result of pg_last_xact_replay_timestamp() which returns the timestamp of the transaction commit or abort record. So, even if we adopt your proposal, ISTM that we still need to collect the timestamp for each commit. No? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Sun, Oct 2, 2011 at 8:21 AM, Simon Riggs si...@2ndquadrant.com wrote: The problem is to find the replication delay, even when the system is quiet. What I have proposed finds the replication delay more accurately even than looking at the last commit, since often there are writes but no commits. If we focus on the problem, rather than the first suggested solution to that problem, we'll come out on top. Sorry, but I still don't really think it's fair to say that you've proposed a solution to this problem. Or if you have, neither I nor Fujii Masao understand that proposal well enough to decide whether we like it. You said maybe we could WAL log something once per checkpoint cycle or maybe we could add a new protocol message. We've both replied with various emails saying we don't understand how that would solve the problem. If you want to add some detail to your proposal, then we can weigh the pros and cons as compared with what the patch does - but right now all you've provided is a theory that there might be a better solution to this problem out there, not any details about how it would work. Or if you have, then please post a link to the message where those details are written out, because I cannot find them on the thread. I do, however, agree that that the case where the system is quiet is the problem case for computing replication delay. It seems to me that, even without this patch, if the system has a continuous stream of commits, you can easily find the replication delay by differencing the current time on the master with the value returned by pg_last_xact_replay_timestamp(). However, if the master goes quiet, then the slave will appear to be progressively farther behind. With the addition of this patch, that problem goes away: you can now difference the return value of pg_last_xact_insert_timestamp() on the master with the return value of pg_last_xact_replay_timestamp() on the slave. If the master goes quiet, then pg_last_xact_insert_timestamp() will stop advancing, and so the two values you are comparing will be equal once the slave has caught up, and remain equal until activity resumes on the master. Now, there is a more subtle remaining problem, which is that when activity *does* resume on the master, there will be a (probably quite short) window of time during which the slave will have a much earlier timestamp than the one on the master. When the master has a commit after a long idle period but the slave has not yet replayed the commit record, the replication delay will appear to be equal to the length of the idle period. But that doesn't seem like a sufficient reason to reject the whole approach, because there are several ways around it. First, you could simply decide that the large computed lag value, although counterintuitive, is accurate under some definition, because, well, that really is the lag between the last transaction committed on the master and the last transaction committed on the standby, and if you don't like the fact that timestamps behave that way, you should compare using WAL positions instead. If you don't like that approach, then a second, also viable approach is to teach your monitoring software that the replication delay can never increase faster than the rate at which clock time is passing. So if you were caught up a minute ago, then you can't be more than a minute behind now. Another point I want to make here is that there's probably more than one useful definition of replication delay. The previous question presupposes that you're trying to answer the question if I have a transaction that committed N seconds ago on the master, will it be visible on the standby?. It's also a reasonable time-based substitute for measuring the difference in master and standby WAL positions, although certainly it's going to work better if the rate of WAL generation is relatively even. But for a lot of people, it may be that what they really want to know is what is the expected time for the standby to replay all generated but not yet applied WAL? - or maybe some third thing that I'm not thinking of - and this function won't provide that. I think we can ultimately afford to provide more than one mechanism here, so I don't see doing this as foreclosing any other also-useful calculation that someone may wish to add in the future. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Oct 3, 2011 at 8:07 PM, Robert Haas robertmh...@gmail.com wrote: Sorry, but I still don't really think it's fair to say that you've proposed a solution to this problem. Or if you have, neither I nor Fujii Masao understand that proposal well enough to decide whether we like it. Arguing between trenches doesn't really get us anywhere. As ever, when someone claims to have a better solution then it is up to them to prove that is the case. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Oct 3, 2011 at 4:25 PM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, Oct 3, 2011 at 8:07 PM, Robert Haas robertmh...@gmail.com wrote: Sorry, but I still don't really think it's fair to say that you've proposed a solution to this problem. Or if you have, neither I nor Fujii Masao understand that proposal well enough to decide whether we like it. Arguing between trenches doesn't really get us anywhere. As ever, when someone claims to have a better solution then it is up to them to prove that is the case. So... are you going to do that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Fri, Sep 30, 2011 at 4:07 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Sep 30, 2011 at 8:57 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 30, 2011 at 3:22 PM, Simon Riggs si...@2ndquadrant.com wrote: If the feature could not be done another way, easily, I might agree. I don't see that you've offered a reasonable alternative. The alternative proposals that you proposed don't appear to me to be solving the same problem. AIUI, the requested feature is to be able to get, on the master, the timestamp of the last commit or abort, just as we can already get the timestamp of the last commit or abort replayed on the standby. Nothing you WAL log and no messages you send to the standby are going to accomplish that goal. The goal of the OP was to find out the replication delay. This function was suggested, but its not the only way. My alternative proposals solve the original problem in a better way. As far as I can see, they don't solve the problem at all. Your proposals involve sending additional information from the master to the slave, but the slave already knows both its WAL position and the timestamp of the transaction it has most recently replayed, because the startup process on the slave tracks that information and publishes it in shared memory. On the master, however, only the WAL position is centrally tracked; the transaction timestamps are not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Thu, Sep 15, 2011 at 4:52 AM, Fujii Masao masao.fu...@gmail.com wrote: Okay, I've changed the patch in that way. It occurs to me that pgstat_report_xact_end_timestamp doesn't really need to follow the protocol of bumping the change count before and after bumping the timestamp. We elsewhere assume that four-byte reads and writes are atomic, so there's no harm in assuming the same thing here (and if they're not... then the change-count thing is pretty dubious anyway). I think it's sufficient to just set the value, full stop. Also, in pg_last_xact_insert_timestamp, the errhint() seems a little strange - this is not exactly a WAL *control* function, is it? In the documentation, for the short description of pg_last_xact_insert_timestamp(), how about something like returns the time at which a transaction commit or transaction about record was last inserted into the transaction log? Or maybe that's too long. But the current description doesn't seem to do much other than recapitulate the function name, so I'm wondering if we can do any better than that. I think that instead of hacking up the backend-status copying code to have a mode where it copies everything, you should just have a special-purpose function that computes the value you need directly off the backend status entries themselves. This approach seems like it both clutters the code and adds lots of extra data copying. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Sun, Oct 2, 2011 at 12:10 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 30, 2011 at 4:07 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Sep 30, 2011 at 8:57 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 30, 2011 at 3:22 PM, Simon Riggs si...@2ndquadrant.com wrote: If the feature could not be done another way, easily, I might agree. I don't see that you've offered a reasonable alternative. The alternative proposals that you proposed don't appear to me to be solving the same problem. AIUI, the requested feature is to be able to get, on the master, the timestamp of the last commit or abort, just as we can already get the timestamp of the last commit or abort replayed on the standby. Nothing you WAL log and no messages you send to the standby are going to accomplish that goal. The goal of the OP was to find out the replication delay. This function was suggested, but its not the only way. My alternative proposals solve the original problem in a better way. As far as I can see, they don't solve the problem at all. Your proposals involve sending additional information from the master to the slave, but the slave already knows both its WAL position and the timestamp of the transaction it has most recently replayed, because the startup process on the slave tracks that information and publishes it in shared memory. On the master, however, only the WAL position is centrally tracked; the transaction timestamps are not. The problem is to find the replication delay, even when the system is quiet. What I have proposed finds the replication delay more accurately even than looking at the last commit, since often there are writes but no commits. If we focus on the problem, rather than the first suggested solution to that problem, we'll come out on top. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
Robert Haas robertmh...@gmail.com writes: It occurs to me that pgstat_report_xact_end_timestamp doesn't really need to follow the protocol of bumping the change count before and after bumping the timestamp. We elsewhere assume that four-byte reads and writes are atomic, so there's no harm in assuming the same thing here (and if they're not... then the change-count thing is pretty dubious anyway). I think it's sufficient to just set the value, full stop. I agree you can read the value without that, but I think that setting it should still bump the change count. Otherwise there's no way for another process to collect the whole struct and be sure that it's self-consistent. We may not have a critical need for that right now, but it's silly to foreclose the possibility to save a cycle or so. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Fri, Sep 30, 2011 at 3:24 AM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: Ok, I send this patch to comitters. I repeat my objection to this patch. I'm very sorry I haven't been around much in last few weeks to keep up a dialogue about this and to make it clear how wrong I think this is. Adding something onto the main path of transaction commit is not good, especially when the only purpose of it is to run an occasional monitoring query for those people that use replication. Not everybody uses replication and even people that do use it don't need to run it so frequently that every single commit needs this. This is just bloat that we do not need and can also easily avoid. The calculation itself would be problematic since it differs from the way standby_delay is calculated on the standby, which will cause much confusion. Some thought or comment should be made about that also. If we want to measure times, we can easily send regular messages into WAL to provide this function. Using checkpoint records would seem frequent enough to me. We don't always send checkpoint records but we can send an info message instead if we are streaming. If archive_timeout is not set and max_wal_senders 0 then we can send an info WAL message with timestamp on a regular cycle. Alternatively, we can send regular special messages from WALSender to WALReceiver that do not form part of the WAL stream, so we don't bulk up WAL archives. (i.e. don't use w messages). That seems like the most viable approach to this problem. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Fri, Sep 30, 2011 at 3:18 AM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Sep 30, 2011 at 3:24 AM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: Ok, I send this patch to comitters. I repeat my objection to this patch. I'm very sorry I haven't been around much in last few weeks to keep up a dialogue about this and to make it clear how wrong I think this is. Adding something onto the main path of transaction commit is not good, especially when the only purpose of it is to run an occasional monitoring query for those people that use replication. Not everybody uses replication and even people that do use it don't need to run it so frequently that every single commit needs this. This is just bloat that we do not need and can also easily avoid. I think the overhead of this is so completely trivial that we shouldn't be concerned about it. It's writing 12 bytes to shared memory for each commit, without taking a lock, in a cache line that should be minimally contended. We write plenty of other data to shared memory on each commit, and that's nowhere close to being the expensive part of committing a transaction. What's expensive is fighting over WALInsertLock and ProcArrayLock and CLogControlLock, and possibly waiting for the commit to be durably flushed to disk, and maybe (at the margin) wrenching the cache lines in our PGPROC away from whatever processor last stole them to do GetSnapshotData(). I don't think that a couple of stores to uncontended shared memory are going to make any difference. The calculation itself would be problematic since it differs from the way standby_delay is calculated on the standby, which will cause much confusion. Some thought or comment should be made about that also. The string standby_delay doesn't appear in our source tree anywhere, so I'm not sure what this is referring to. In any case, I'm in favor of this feature. Currently, the only way to measure the lag on the standby is to measure it in WAL bytes - and you get to write your own script to parse the WAL positions. This will allow people to measure it in minutes. That seems like a significant usability improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Fri, Sep 30, 2011 at 8:11 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 30, 2011 at 3:18 AM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Sep 30, 2011 at 3:24 AM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: Ok, I send this patch to comitters. I repeat my objection to this patch. I'm very sorry I haven't been around much in last few weeks to keep up a dialogue about this and to make it clear how wrong I think this is. Adding something onto the main path of transaction commit is not good, especially when the only purpose of it is to run an occasional monitoring query for those people that use replication. Not everybody uses replication and even people that do use it don't need to run it so frequently that every single commit needs this. This is just bloat that we do not need and can also easily avoid. I think the overhead of this is so completely trivial that we shouldn't be concerned about it. It's writing 12 bytes to shared memory for each commit, without taking a lock, in a cache line that should be minimally contended. We write plenty of other data to shared memory on each commit, and that's nowhere close to being the expensive part of committing a transaction. What's expensive is fighting over WALInsertLock and ProcArrayLock and CLogControlLock, and possibly waiting for the commit to be durably flushed to disk, and maybe (at the margin) wrenching the cache lines in our PGPROC away from whatever processor last stole them to do GetSnapshotData(). I don't think that a couple of stores to uncontended shared memory are going to make any difference. If the feature could not be done another way, easily, I might agree. The point is it isn't necessary, useful or elegant to do it this way and *any* cycles added to mainline transactions need to have careful justification. And there really isn't any justification for doing things this way other than its the first way somebody thought of. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Fri, Sep 30, 2011 at 3:22 PM, Simon Riggs si...@2ndquadrant.com wrote: If the feature could not be done another way, easily, I might agree. I don't see that you've offered a reasonable alternative. The alternative proposals that you proposed don't appear to me to be solving the same problem. AIUI, the requested feature is to be able to get, on the master, the timestamp of the last commit or abort, just as we can already get the timestamp of the last commit or abort replayed on the standby. Nothing you WAL log and no messages you send to the standby are going to accomplish that goal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Fri, Sep 30, 2011 at 8:57 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 30, 2011 at 3:22 PM, Simon Riggs si...@2ndquadrant.com wrote: If the feature could not be done another way, easily, I might agree. I don't see that you've offered a reasonable alternative. The alternative proposals that you proposed don't appear to me to be solving the same problem. AIUI, the requested feature is to be able to get, on the master, the timestamp of the last commit or abort, just as we can already get the timestamp of the last commit or abort replayed on the standby. Nothing you WAL log and no messages you send to the standby are going to accomplish that goal. The goal of the OP was to find out the replication delay. This function was suggested, but its not the only way. My alternative proposals solve the original problem in a better way. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
Sorry for late to re-review. One question is remaind, Q1: The shmem entry for timestamp is not initialized on allocating. Is this OK? (I don't know that for OSs other than Linux) And zeroing double field is OK for all OSs? CreateSharedBackendStatus() initializes that shmem entries by doing MemSet(BackendStatusArray, 0, size). You think this is not enough? Sorry for missing it. That's enough. Nevertheless this is ok for all OSs, I don't know whether initializing TimestampTz(double, int64 is ok) field with 8 bytes zeros is OK or not, for all platforms. (It is ok for IEEE754-binary64). Which code are you concerned about? xlog.c: 5889 beentry = pgstat_fetch_all_beentry(); for (i = 0; i MaxBackends; i++, beentry++) { xtime = beentry-st_xact_end_timestamp; I think the last line in quoted code above reads possibly zero-initialized double (or int64) value, then the doubted will be compared and copied to another double. if (result xtime) result = xtime; No, st_changecount is used to read st_xact_end_timestamp. st_xact_end_timestamp is read from the shmem to the local memory in pgstat_read_current_status(), and this function always checks st_changecount when reading the shmem value. Yes, I confirmed that pg_lats_xact_insert_timestamp looks local copy of BackendStatus. I've found it not unnecessary. -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Thu, Sep 29, 2011 at 5:20 PM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: Sorry for late to re-review. Thanks! Nevertheless this is ok for all OSs, I don't know whether initializing TimestampTz(double, int64 is ok) field with 8 bytes zeros is OK or not, for all platforms. (It is ok for IEEE754-binary64). Which code are you concerned about? xlog.c: 5889 beentry = pgstat_fetch_all_beentry(); for (i = 0; i MaxBackends; i++, beentry++) { xtime = beentry-st_xact_end_timestamp; I think the last line in quoted code above reads possibly zero-initialized double (or int64) value, then the doubted will be compared and copied to another double. if (result xtime) result = xtime; I believe it's safe. Such a code is placed elsewhere in the source, too. If it's unsafe, we should have received lots of bug reports related to that. But we've not. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
Hello, I understand that it has been at least practically no problem. Ok, I send this patch to comitters. Thanks for your dealing with nuisance questions. At Thu, 29 Sep 2011 21:21:32 +0900, Fujii Masao masao.fu...@gmail.com wrote in cahgqgwg0c21f0czy5exx-49dxdx7hjuneibbj0tzvh+7vmx...@mail.gmail.com Nevertheless this is ok for all OSs, I don't know whether initializing TimestampTz(double, int64 is ok) field with 8 bytes ... I believe it's safe. Such a code is placed elsewhere in the source, too. If it's unsafe, we should have received lots of bug reports related to that. But we've not. Regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Thu, Sep 15, 2011 at 4:52 AM, Fujii Masao masao.fu...@gmail.com wrote: Thanks for the review! Koyotaro Horiguchi - Are you going to re-review the latest version of this patch? Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Wed, Sep 14, 2011 at 6:21 PM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: Hi, This is a review for pg_last_xact_insert_timestamp patch. (https://commitfest.postgresql.org/action/patch_view?id=634) Thanks for the review! Q1: The shmem entry for timestamp is not initialized on allocating. Is this OK? (I don't know that for OSs other than Linux) And zeroing double field is OK for all OSs? CreateSharedBackendStatus() initializes that shmem entries by doing MemSet(BackendStatusArray, 0, size). You think this is not enough? Nevertheless this is ok for all OSs, I don't know whether initializing TimestampTz(double, int64 is ok) field with 8 bytes zeros is OK or not, for all platforms. (It is ok for IEEE754-binary64). Which code are you concerned about? == Modification detection protocol in pgstat.c In pgstat_report_xact_end_timestamp, `beentry-st_changecount protocol' is used. It is for avoiding reading halfway-updated beentry as described in pgstat.h. Meanwhile, beentry-st_xact_end_timestamp is not read or (re-)initialized in pgstat.c and xlog.c reads only this field of whole beentry and st_changecount is not get cared here.. No, st_changecount is used to read st_xact_end_timestamp. st_xact_end_timestamp is read from the shmem to the local memory in pgstat_read_current_status(), and this function always checks st_changecount when reading the shmem value. == Code duplication in xact.c in xact.c, same lines inserted into the end of both IF and ELSE blocks. xact.c:1047 pgstat_report_xact_end_timestamp(xlrec.xact_time); xact.c:1073 pgstat_report_xact_end_timestamp(xlrec.xact_time); These two lines refer to xlrec.xact_time, both of which comes from xactStopTimestamp freezed at xact.c:986 xact.c:986 SetCurrentTransactionStopTimestamp(); xact.c:1008 xlrec.xact_time = xactStopTimestamp; xact.c:1051 xlrec.xact_time = xactStopTimestamp; I think it is better to move this line to just after this ELSE block using xactStopTimestamp instead of xlrec.xact_time. Okay, I've changed the patch in that way. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 13996,14001 SELECT set_config('log_statement_stats', 'off', false); --- 13996,14004 primarypg_current_xlog_location/primary /indexterm indexterm + primarypg_last_xact_insert_timestamp/primary +/indexterm +indexterm primarypg_start_backup/primary /indexterm indexterm *** *** 14049,14054 SELECT set_config('log_statement_stats', 'off', false); --- 14052,14064 /row row entry + literalfunctionpg_last_xact_insert_timestamp()/function/literal + /entry +entrytypetimestamp with time zone/type/entry +entryGet last transaction log insert time stamp/entry + /row + row +entry literalfunctionpg_start_backup(parameterlabel/ typetext/ optional, parameterfast/ typeboolean/ /optional)/function/literal /entry entrytypetext/type/entry *** *** 14175,14180 postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); --- 14185,14197 /para para + functionpg_last_xact_insert_timestamp/ displays the time stamp of last inserted + transaction. This is the time at which the commit or abort WAL record for that transaction. + If there has been no transaction committed or aborted yet since the server has started, + this function returns NULL. +/para + +para For details about proper usage of these functions, see xref linkend=continuous-archiving. /para *** a/doc/src/sgml/high-availability.sgml --- b/doc/src/sgml/high-availability.sgml *** *** 867,872 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' --- 867,881 commandps/ command (see xref linkend=monitoring-ps for details). /para para + You can also calculate the lag in time stamp by comparing the last + WAL insert time stamp on the primary with the last WAL replay + time stamp on the standby. They can be retrieved using + functionpg_last_xact_insert_timestamp/ on the primary and + the functionpg_last_xact_replay_timestamp/ on the standby, + respectively (see xref linkend=functions-admin-backup-table and + xref linkend=functions-recovery-info-table for details). + /para + para You can retrieve a list of WAL sender processes via the link linkend=monitoring-stats-views-table literalpg_stat_replication//link view. Large differences between *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *** *** 1066,1071 RecordTransactionCommit(void) --- 1066,1074 (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata);