Re: [HACKERS] Streaming replication, some small issues
On Tue, Dec 8, 2009 at 5:30 PM, Heikki Linnakangas wrote: > - If a WAL file is not found in the master for some reason, standby goes > into an infinite loop retrying it: > > ERROR: could not read xlog records: FATAL: could not open file > "pg_xlog/0001" (log file 0, segment 0): No such file > or directory I also fixed this problem. git://git.postgresql.org/git/users/fujii/postgres.git branch: replication Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication, some small issues
On Wed, Dec 9, 2009 at 10:51 AM, Fujii Masao wrote: > On Wed, Dec 9, 2009 at 10:12 AM, Tom Lane wrote: >> Fujii Masao writes: >>> Thought? Am I missing something? >> >> This seems terribly overdesigned. Just emit a warning when you see >> the "unlogged op" record and have done. > > Sounds quite simple. OK, I'll do so. Here is the patch: - Write an XLOG UNLOGGED record in WAL if WAL-logging is skipped for only the reason that WAL archiving is not enabled and such record has not been written yet. - Cause archive recovery to end if an XLOG UNLOGGED record is found during it. I add this patch to the CommitFest 2010-01. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/src/backend/access/heap/heapam.c --- b/src/backend/access/heap/heapam.c *** *** 1972,1977 heap_insert(Relation relation, HeapTuple tup, CommandId cid, --- 1972,1984 PageSetTLI(page, ThisTimeLineID); } + /* + * Write an XLOG UNLOGGED record if WAL-logging is skipped for the reason + * that WAL archiving is not enabled. + */ + if (options & HEAP_INSERT_SKIP_WAL && !relation->rd_istemp) + XLogSkipLogging(); + END_CRIT_SECTION(); UnlockReleaseBuffer(buffer); *** a/src/backend/access/nbtree/nbtsort.c --- b/src/backend/access/nbtree/nbtsort.c *** *** 215,220 _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2) --- 215,227 */ wstate.btws_use_wal = XLogArchivingActive() && !wstate.index->rd_istemp; + /* + * Write an XLOG UNLOGGED record if WAL-logging is skipped for the reason + * that WAL archiving is not enabled. + */ + if (!XLogArchivingActive() && !wstate.index->rd_istemp) + XLogSkipLogging(); + /* reserve the metapage */ wstate.btws_pages_alloced = BTREE_METAPAGE + 1; wstate.btws_pages_written = 0; *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** *** 550,555 XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata) --- 550,556 bool updrqst; bool doPageWrites; bool isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH); + bool isLogUnlogged = (rmid == RM_XLOG_ID && info == XLOG_UNLOGGED); /* cross-check on whether we should be here or not */ if (!XLogInsertAllowed()) *** *** 699,707 begin:; * error checking in ReadRecord. This means that all callers of * XLogInsert must supply at least some not-in-a-buffer data. However, we * make an exception for XLOG SWITCH records because we don't want them to ! * ever cross a segment boundary. */ ! if (len == 0 && !isLogSwitch) elog(PANIC, "invalid xlog record length %u", len); START_CRIT_SECTION(); --- 700,708 * error checking in ReadRecord. This means that all callers of * XLogInsert must supply at least some not-in-a-buffer data. However, we * make an exception for XLOG SWITCH records because we don't want them to ! * ever cross a segment boundary. Also XLOG UNLOGGED records are exception. */ ! if (len == 0 && !isLogSwitch && !isLogUnlogged) elog(PANIC, "invalid xlog record length %u", len); START_CRIT_SECTION(); *** *** 3551,3558 ReadRecord(XLogRecPtr *RecPtr, int emode) got_record:; /* ! * xl_len == 0 is bad data for everything except XLOG SWITCH, where it is ! * required. */ if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH) { --- 3552,3559 got_record:; /* ! * xl_len == 0 is bad data for everything except XLOG SWITCH and XLOG UNLOGGED, ! * where it is required. */ if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH) { *** *** 3564,3569 got_record:; --- 3565,3580 goto next_record_is_invalid; } } + else if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_UNLOGGED) + { + if (record->xl_len != 0) + { + ereport(emode, + (errmsg("invalid xlog unlogged operation record at %X/%X", + RecPtr->xlogid, RecPtr->xrecoff))); + goto next_record_is_invalid; + } + } else if (record->xl_len == 0) { ereport(emode, *** *** 3759,3764 got_record:; --- 3770,3788 */ readOff = XLogSegSize - XLOG_BLCKSZ; } + + /* + * Special processing if it's an XLOG UNLOGGED record and we are doing + * an archive recovery. + */ + if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_UNLOGGED && + InArchiveRecovery) + { + ereport(emode, + (errmsg("unlogged operation record is found at %X/%X", + RecPtr->xlogid, RecPtr->xrecoff))); + goto next_record_is_invalid; + } return (XLogRecord *) buffer; next_record_is_invalid:; *** *** 6998,7003 RequestXLogSwitch(void) --- 7022,7057 } /* + * Write an XLOG UNLOGGED record. + */ + void + XLogSkipLogging(void) + { + XLogRecData rdata; + static bool skipped = false; + + /* + * If an XLOG UNLO
Re: [HACKERS] Streaming replication, some small issues
On Wed, Dec 9, 2009 at 10:12 AM, Tom Lane wrote: > Fujii Masao writes: >> Thought? Am I missing something? > > This seems terribly overdesigned. Just emit a warning when you see > the "unlogged op" record and have done. Sounds quite simple. OK, I'll do so. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication, some small issues
Fujii Masao writes: > Thought? Am I missing something? This seems terribly overdesigned. Just emit a warning when you see the "unlogged op" record and have done. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication, some small issues
On Tue, Dec 8, 2009 at 9:05 PM, Heikki Linnakangas wrote: >> I suspect we should have a WAL record to say "unlogged operation >> performed here" which a standby database would recognize and throw a >> large warning up. > > +1. Seems like a very simple solution. Sounds good. This is not just a problem of SR, so I'll implement it as self-contained feature later. Design: - If relation is not temp and archiving (and streaming replication) is enabled, we log the "unlogged OP" record including relfilenode of the relation. - If "unlogged OP" record is found during archive recovery, we register its relfilenode to the hashtable which tracks maybe corrupted relations. If the registered relfilenode is brandnew, we emit warning. Also, the log record indicating "DROP TABLE" etc is found, we remove its relfilenode from the hashtable. - When restartpoint occurs, we write all the registered relfilenodes to the flat file. - At the end of archive recovery, if there is relfilenode in the hashtable, we emit FATAL error to prevent the server from being brought up. XXX: But this might be too conservative. I believe that some people want to complete archive recovery even if a relation is corrupted, and drop that relation after the server has been activated. So I'm going to provide new recovery.conf parameter specifying whether to let archive recovery fail when some relations might be corrupted. Thought? Am I missing something? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication, some small issues
Greg Stark wrote: > On Tue, Dec 8, 2009 at 8:30 AM, Heikki Linnakangas > wrote: >> - It's possible to shut down master, change max_wal_senders to 0, >> restart and do an operation like CLUSTER which then skips WAL-logging. >> Then shutdown, change max_wal_senders back to non-zero. All this while >> the standby is running. Leads to a corrupt standby. > > The same thing is possible with archived logs as well, no? Yeah, I think you're right. > I suspect we should have a WAL record to say "unlogged operation > performed here" which a standby database would recognize and throw a > large warning up. +1. Seems like a very simple solution. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication, some small issues
On Tue, Dec 8, 2009 at 8:30 AM, Heikki Linnakangas wrote: > - It's possible to shut down master, change max_wal_senders to 0, > restart and do an operation like CLUSTER which then skips WAL-logging. > Then shutdown, change max_wal_senders back to non-zero. All this while > the standby is running. Leads to a corrupt standby. The same thing is possible with archived logs as well, no? I suspect we should have a WAL record to say "unlogged operation performed here" which a standby database would recognize and throw a large warning up. The only reason I say warning is because it might be reasonable if the relation is some temporary ETL table which isn't needed in the standby. Perhaps if we note the relation affected we could throw an error iff the standby is activated with the relation still existing. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication, some small issues
On Tue, Dec 8, 2009 at 5:30 PM, Heikki Linnakangas wrote: > A couple of small issues spotted while reviewing the streaming > replication patch: Thanks for the review! > - Because sentPtr is initialized to zeros, GetOldestWALSendPointer will > return zero before a just-launched WAL sender has sent its first > message. That can lead to WAL files that are still needed by another > standby to be deleted prematurely. Oops! I fixed that (in my git repository, see the bottom of this mail). > - If a WAL file is not found in the master for some reason, standby goes > into an infinite loop retrying it: > > ERROR: could not read xlog records: FATAL: could not open file > "pg_xlog/0001" (log file 0, segment 0): No such file > or directory http://archives.postgresql.org/pgsql-hackers/2009-09/msg01393.php >> walreceiver shouldn't die on connection error, just to be restarted by >> startup process. Can we add error handling a la bgwriter and have a >> retry loop within walreceiver? As the result of your current and previous comment, you mean that walreceiver should always retry connecting to the primary after a connection error occurs in PQgetXLogData/PQputXLogRecPtr, and exit after the other errors occur? Though I'm not sure whether we can determine the error type precisely. > - It's possible to shut down master, change max_wal_senders to 0, > restart and do an operation like CLUSTER which then skips WAL-logging. > Then shutdown, change max_wal_senders back to non-zero. All this while > the standby is running. Leads to a corrupt standby. I've regarded this case as a restriction. But, how do you think we should cope with it? 1. Restriction: only documentation is required? 2. Needs safe guard: - forbid the primary to perform such operations while the standby is running? - emit PANIC error on the standby if the primary which lost sync restarts? 3. Full solution: automatic resync mechanism is required? > I've also pushed a couple of small cosmetic changes to replication > branch at git://git.postgresql.org/git/users/heikki/postgres.git Your changes seem good. I pulled and merged your changes into my repository: git://git.postgresql.org/git/users/fujii/postgres.git branch: replication And, I pushed the capability of replication of a backup history file into the repository. > I'll continue reviewing... Thanks a lot! Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Streaming replication, some small issues
A couple of small issues spotted while reviewing the streaming replication patch: - Because sentPtr is initialized to zeros, GetOldestWALSendPointer will return zero before a just-launched WAL sender has sent its first message. That can lead to WAL files that are still needed by another standby to be deleted prematurely. - If a WAL file is not found in the master for some reason, standby goes into an infinite loop retrying it: ERROR: could not read xlog records: FATAL: could not open file "pg_xlog/0001" (log file 0, segment 0): No such file or directory ERROR: could not read xlog records: FATAL: could not open file "pg_xlog/0001" (log file 0, segment 0): No such file or directory ERROR: could not read xlog records: FATAL: could not open file "pg_xlog/0001" (log file 0, segment 0): No such file or directory ... - It's possible to shut down master, change max_wal_senders to 0, restart and do an operation like CLUSTER which then skips WAL-logging. Then shutdown, change max_wal_senders back to non-zero. All this while the standby is running. Leads to a corrupt standby. I've also pushed a couple of small cosmetic changes to replication branch at git://git.postgresql.org/git/users/heikki/postgres.git I'll continue reviewing... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers