On Tue, 2010-07-27 at 15:23 -0700, Jeff Davis wrote:
> On Tue, 2010-07-27 at 17:18 -0400, Robert Haas wrote:
> > > My first concern with that idea was that it may create an inconsistency
> > > between the primary and the standby. The primary could have a bunch of
> > > zero pages that never make it to the standby.
> > 
> > Maybe I'm slow on the uptake here, but don't the pages start out
> > all-zeroes on the standby just as they do on the primary? The only way
> > it seems like this would be a problem is if a page that previously
> > contained data on the primary was subsequently zeroed without writing
> > a WAL record - or am I confused?
> 
> The case I was concerned about is when you have a table on the primary
> with a bunch of zero pages at the end. Then you SET TABLESPACE, and none
> of the copied pages (or even the fact that they exist) would be sent to
> the standby, but they would exist on the primary. And later pages may
> have data, so the standby may see page N but not N-1.
> 
> Generally, most of the code is not expecting to read or write past the
> end of the file, unless it's doing an extension.
> 
> However, I think everything is fine during recovery, because it looks
> like it's designed to create zero pages as needed. So your idea seems
> safe to me, although I do still have some doubts because of my lack of
> knowledge in this area; particularly hot standby conflict
> detection/resolution.
> 
> My idea was different: still log the zero page, just don't set LSN or
> TLI for a zero page in log_newpage() or heap_xlog_newpage(). This isn't
> as clean as your idea, but I'm a little more confident that it is
> correct.
> 

Both potential fixes attached and both appear to work.

fix1 -- Only call PageSetLSN/TLI inside log_newpage() and
heap_xlog_newpage() if the page is not zeroed.

fix2 -- Don't call log_newpage() at all if the page is not zeroed.

Please review. I don't have a strong opinion about which one should be
applied.

Regards,
        Jeff Davis
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***************
*** 4079,4086 **** log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno,
  
  	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
  
! 	PageSetLSN(page, recptr);
! 	PageSetTLI(page, ThisTimeLineID);
  
  	END_CRIT_SECTION();
  
--- 4079,4093 ----
  
  	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
  
! 	/*
! 	 * The new page may be uninitialized. If so, we can't set the LSN
! 	 * and TLI because that would corrupt the page.
! 	 */
! 	if (!PageIsNew(page))
! 	{
! 		PageSetLSN(page, recptr);
! 		PageSetTLI(page, ThisTimeLineID);
! 	}
  
  	END_CRIT_SECTION();
  
***************
*** 4266,4273 **** heap_xlog_newpage(XLogRecPtr lsn, XLogRecord *record)
  	Assert(record->xl_len == SizeOfHeapNewpage + BLCKSZ);
  	memcpy(page, (char *) xlrec + SizeOfHeapNewpage, BLCKSZ);
  
! 	PageSetLSN(page, lsn);
! 	PageSetTLI(page, ThisTimeLineID);
  	MarkBufferDirty(buffer);
  	UnlockReleaseBuffer(buffer);
  }
--- 4273,4288 ----
  	Assert(record->xl_len == SizeOfHeapNewpage + BLCKSZ);
  	memcpy(page, (char *) xlrec + SizeOfHeapNewpage, BLCKSZ);
  
! 	/*
! 	 * The new page may be uninitialized. If so, we can't set the LSN
! 	 * and TLI because that would corrupt the page.
! 	 */
! 	if (!PageIsNew(page))
! 	{
! 		PageSetLSN(page, lsn);
! 		PageSetTLI(page, ThisTimeLineID);
! 	}
! 
  	MarkBufferDirty(buffer);
  	UnlockReleaseBuffer(buffer);
  }
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 7082,7089 **** copy_relation_data(SMgrRelation src, SMgrRelation dst,
  
  		smgrread(src, forkNum, blkno, buf);
  
! 		/* XLOG stuff */
! 		if (use_wal)
  			log_newpage(&dst->smgr_rnode, forkNum, blkno, page);
  
  		/*
--- 7082,7094 ----
  
  		smgrread(src, forkNum, blkno, buf);
  
! 		/*
! 		 * XLOG stuff
! 		 *
! 		 * Don't log page if it's new, because that will set the LSN
! 		 * and TLI, corrupting the page.
! 		 */
! 		if (use_wal && !PageIsNew(page))
  			log_newpage(&dst->smgr_rnode, forkNum, blkno, page);
  
  		/*
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to