Re: [HACKERS] Streaming replication, some small issues

2009-12-11 Thread Fujii Masao
On Tue, Dec 8, 2009 at 5:30 PM, Heikki Linnakangas
 wrote:
> - If a WAL file is not found in the master for some reason, standby goes
> into an infinite loop retrying it:
>
> ERROR:  could not read xlog records: FATAL:  could not open file
> "pg_xlog/0001" (log file 0, segment 0): No such file
> or directory

I also fixed this problem.

  git://git.postgresql.org/git/users/fujii/postgres.git
  branch: replication

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication, some small issues

2009-12-09 Thread Fujii Masao
On Wed, Dec 9, 2009 at 10:51 AM, Fujii Masao  wrote:
> On Wed, Dec 9, 2009 at 10:12 AM, Tom Lane  wrote:
>> Fujii Masao  writes:
>>> Thought? Am I missing something?
>>
>> This seems terribly overdesigned.  Just emit a warning when you see
>> the "unlogged op" record and have done.
>
> Sounds quite simple. OK, I'll do so.

Here is the patch:

- Write an XLOG UNLOGGED record in WAL if WAL-logging is skipped for only
  the reason that WAL archiving is not enabled and such record has not been
  written yet.

- Cause archive recovery to end if an XLOG UNLOGGED record is found during
  it.


I add this patch to the CommitFest 2010-01.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***
*** 1972,1977  heap_insert(Relation relation, HeapTuple tup, CommandId cid,
--- 1972,1984 
  		PageSetTLI(page, ThisTimeLineID);
  	}
  
+ 	/*
+ 	 * Write an XLOG UNLOGGED record if WAL-logging is skipped for the reason
+ 	 * that WAL archiving is not enabled.
+ 	 */
+ 	if (options & HEAP_INSERT_SKIP_WAL && !relation->rd_istemp)
+ 		XLogSkipLogging();
+ 
  	END_CRIT_SECTION();
  
  	UnlockReleaseBuffer(buffer);
*** a/src/backend/access/nbtree/nbtsort.c
--- b/src/backend/access/nbtree/nbtsort.c
***
*** 215,220  _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
--- 215,227 
  	 */
  	wstate.btws_use_wal = XLogArchivingActive() && !wstate.index->rd_istemp;
  
+ 	/*
+ 	 * Write an XLOG UNLOGGED record if WAL-logging is skipped for the reason
+ 	 * that WAL archiving is not enabled.
+ 	 */
+ 	if (!XLogArchivingActive() && !wstate.index->rd_istemp)
+ 		XLogSkipLogging();
+ 
  	/* reserve the metapage */
  	wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
  	wstate.btws_pages_written = 0;
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 550,555  XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
--- 550,556 
  	bool		updrqst;
  	bool		doPageWrites;
  	bool		isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
+ 	bool		isLogUnlogged = (rmid == RM_XLOG_ID && info == XLOG_UNLOGGED);
  
  	/* cross-check on whether we should be here or not */
  	if (!XLogInsertAllowed())
***
*** 699,707  begin:;
  	 * error checking in ReadRecord.  This means that all callers of
  	 * XLogInsert must supply at least some not-in-a-buffer data.  However, we
  	 * make an exception for XLOG SWITCH records because we don't want them to
! 	 * ever cross a segment boundary.
  	 */
! 	if (len == 0 && !isLogSwitch)
  		elog(PANIC, "invalid xlog record length %u", len);
  
  	START_CRIT_SECTION();
--- 700,708 
  	 * error checking in ReadRecord.  This means that all callers of
  	 * XLogInsert must supply at least some not-in-a-buffer data.  However, we
  	 * make an exception for XLOG SWITCH records because we don't want them to
! 	 * ever cross a segment boundary. Also XLOG UNLOGGED records are exception.
  	 */
! 	if (len == 0 && !isLogSwitch && !isLogUnlogged)
  		elog(PANIC, "invalid xlog record length %u", len);
  
  	START_CRIT_SECTION();
***
*** 3551,3558  ReadRecord(XLogRecPtr *RecPtr, int emode)
  got_record:;
  
  	/*
! 	 * xl_len == 0 is bad data for everything except XLOG SWITCH, where it is
! 	 * required.
  	 */
  	if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH)
  	{
--- 3552,3559 
  got_record:;
  
  	/*
! 	 * xl_len == 0 is bad data for everything except XLOG SWITCH and XLOG UNLOGGED,
! 	 * where it is required.
  	 */
  	if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH)
  	{
***
*** 3564,3569  got_record:;
--- 3565,3580 
  			goto next_record_is_invalid;
  		}
  	}
+ 	else if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_UNLOGGED)
+ 	{
+ 		if (record->xl_len != 0)
+ 		{
+ 			ereport(emode,
+ 	(errmsg("invalid xlog unlogged operation record at %X/%X",
+ 			RecPtr->xlogid, RecPtr->xrecoff)));
+ 			goto next_record_is_invalid;
+ 		}
+ 	}
  	else if (record->xl_len == 0)
  	{
  		ereport(emode,
***
*** 3759,3764  got_record:;
--- 3770,3788 
  		 */
  		readOff = XLogSegSize - XLOG_BLCKSZ;
  	}
+ 
+ 	/*
+ 	 * Special processing if it's an XLOG UNLOGGED record and we are doing
+ 	 * an archive recovery.
+ 	 */
+ 	if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_UNLOGGED &&
+ 		InArchiveRecovery)
+ 	{
+ 		ereport(emode,
+ (errmsg("unlogged operation record is found at %X/%X",
+ 		RecPtr->xlogid, RecPtr->xrecoff)));
+ 		goto next_record_is_invalid;
+ 	}
  	return (XLogRecord *) buffer;
  
  next_record_is_invalid:;
***
*** 6998,7003  RequestXLogSwitch(void)
--- 7022,7057 
  }
  
  /*
+  * Write an XLOG UNLOGGED record.
+  */
+ void
+ XLogSkipLogging(void)
+ {
+ 	XLogRecData rdata;
+ 	static bool skipped = false;
+ 
+ 	/*
+ 	 * If an XLOG UNLO

Re: [HACKERS] Streaming replication, some small issues

2009-12-08 Thread Fujii Masao
On Wed, Dec 9, 2009 at 10:12 AM, Tom Lane  wrote:
> Fujii Masao  writes:
>> Thought? Am I missing something?
>
> This seems terribly overdesigned.  Just emit a warning when you see
> the "unlogged op" record and have done.

Sounds quite simple. OK, I'll do so.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication, some small issues

2009-12-08 Thread Tom Lane
Fujii Masao  writes:
> Thought? Am I missing something?

This seems terribly overdesigned.  Just emit a warning when you see
the "unlogged op" record and have done.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication, some small issues

2009-12-08 Thread Fujii Masao
On Tue, Dec 8, 2009 at 9:05 PM, Heikki Linnakangas
 wrote:
>> I suspect we should have a WAL record to say "unlogged operation
>> performed here" which a standby database would recognize and throw a
>> large warning up.
>
> +1. Seems like a very simple solution.

Sounds good. This is not just a problem of SR, so I'll implement it
as self-contained feature later.

Design:
- If relation is not temp and archiving (and streaming replication) is enabled,
  we log the "unlogged OP" record including relfilenode of the relation.

- If "unlogged OP" record is found during archive recovery, we register its
  relfilenode to the hashtable which tracks maybe corrupted relations.
  If the registered relfilenode is brandnew, we emit warning. Also, the log
  record indicating "DROP TABLE" etc is found, we remove its relfilenode
  from the hashtable.

- When restartpoint occurs, we write all the registered relfilenodes to the
  flat file.

- At the end of archive recovery, if there is relfilenode in the hashtable, we
  emit FATAL error to prevent the server from being brought up.
  XXX: But this might be too conservative. I believe that some people want
  to complete archive recovery even if a relation is corrupted, and drop that
  relation after the server has been activated. So I'm going to provide new
  recovery.conf parameter specifying whether to let archive recovery fail
  when some relations might be corrupted.

Thought? Am I missing something?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication, some small issues

2009-12-08 Thread Heikki Linnakangas
Greg Stark wrote:
> On Tue, Dec 8, 2009 at 8:30 AM, Heikki Linnakangas
>  wrote:
>> - It's possible to shut down master, change max_wal_senders to 0,
>> restart and do an operation like CLUSTER which then skips WAL-logging.
>> Then shutdown, change max_wal_senders back to non-zero. All this while
>> the standby is running. Leads to a corrupt standby.
> 
> The same thing is possible with archived logs as well, no?

Yeah, I think you're right.

> I suspect we should have a WAL record to say "unlogged operation
> performed here" which a standby database would recognize and throw a
> large warning up.

+1. Seems like a very simple solution.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication, some small issues

2009-12-08 Thread Greg Stark
On Tue, Dec 8, 2009 at 8:30 AM, Heikki Linnakangas
 wrote:
> - It's possible to shut down master, change max_wal_senders to 0,
> restart and do an operation like CLUSTER which then skips WAL-logging.
> Then shutdown, change max_wal_senders back to non-zero. All this while
> the standby is running. Leads to a corrupt standby.

The same thing is possible with archived logs as well, no?

I suspect we should have a WAL record to say "unlogged operation
performed here" which a standby database would recognize and throw a
large warning up. The only reason I say warning is because it might be
reasonable if the relation is some temporary ETL table which isn't
needed in the standby. Perhaps if we note the relation affected we
could throw an error iff the standby is activated with the relation
still existing.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication, some small issues

2009-12-08 Thread Fujii Masao
On Tue, Dec 8, 2009 at 5:30 PM, Heikki Linnakangas
 wrote:
> A couple of small issues spotted while reviewing the streaming
> replication patch:

Thanks for the review!

> - Because sentPtr is initialized to zeros, GetOldestWALSendPointer will
> return zero before a just-launched WAL sender has sent its first
> message. That can lead to WAL files that are still needed by another
> standby to be deleted prematurely.

Oops! I fixed that (in my git repository, see the bottom of this mail).

> - If a WAL file is not found in the master for some reason, standby goes
> into an infinite loop retrying it:
>
> ERROR:  could not read xlog records: FATAL:  could not open file
> "pg_xlog/0001" (log file 0, segment 0): No such file
> or directory

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01393.php
>> walreceiver shouldn't die on connection error, just to be restarted by
>> startup process. Can we add error handling a la bgwriter and have a
>> retry loop within walreceiver?

As the result of your current and previous comment, you mean that
walreceiver should always retry connecting to the primary after
a connection error occurs in PQgetXLogData/PQputXLogRecPtr, and
exit after the other errors occur? Though I'm not sure whether
we can determine the error type precisely.

> - It's possible to shut down master, change max_wal_senders to 0,
> restart and do an operation like CLUSTER which then skips WAL-logging.
> Then shutdown, change max_wal_senders back to non-zero. All this while
> the standby is running. Leads to a corrupt standby.

I've regarded this case as a restriction. But, how do you think
we should cope with it?

1. Restriction: only documentation is required?
2. Needs safe guard:
  - forbid the primary to perform such operations while the
standby is running?
  - emit PANIC error on the standby if the primary which lost sync
restarts?
3. Full solution: automatic resync mechanism is required?

> I've also pushed a couple of small cosmetic changes to replication
> branch at git://git.postgresql.org/git/users/heikki/postgres.git

Your changes seem good.

I pulled and merged your changes into my repository:

   git://git.postgresql.org/git/users/fujii/postgres.git
   branch: replication

And, I pushed the capability of replication of a backup history file
into the repository.

> I'll continue reviewing...

Thanks a lot!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Streaming replication, some small issues

2009-12-08 Thread Heikki Linnakangas
A couple of small issues spotted while reviewing the streaming
replication patch:

- Because sentPtr is initialized to zeros, GetOldestWALSendPointer will
return zero before a just-launched WAL sender has sent its first
message. That can lead to WAL files that are still needed by another
standby to be deleted prematurely.

- If a WAL file is not found in the master for some reason, standby goes
into an infinite loop retrying it:

ERROR:  could not read xlog records: FATAL:  could not open file
"pg_xlog/0001" (log file 0, segment 0): No such file
or directory

ERROR:  could not read xlog records: FATAL:  could not open file
"pg_xlog/0001" (log file 0, segment 0): No such file
or directory

ERROR:  could not read xlog records: FATAL:  could not open file
"pg_xlog/0001" (log file 0, segment 0): No such file
or directory

...

- It's possible to shut down master, change max_wal_senders to 0,
restart and do an operation like CLUSTER which then skips WAL-logging.
Then shutdown, change max_wal_senders back to non-zero. All this while
the standby is running. Leads to a corrupt standby.


I've also pushed a couple of small cosmetic changes to replication
branch at git://git.postgresql.org/git/users/heikki/postgres.git

I'll continue reviewing...

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers