On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>>
>>
>> On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier <michael.paqu...@gmail.com>
>> wrote:
>>>
>>> The additional process at promotion sounds like a good idea; I'll try to
>>> get a patch done tomorrow. This would also result in removing the
>>> XLogArchiveForceDone stuff. Either way, now that I have been able to
>>> reproduce the problem manually, things can be clearly solved.
>>
>> Please find attached two patches aimed at fixing this issue and improving the
>> situation:
>> - 0001 prevents the appearance of those phantom WAL segment files by ensuring
>> that, when a node is in recovery, it removes them whatever their status in
>> archive_status. This patch is the real fix, and should be applied down to
>> 9.2.
>> - 0002 is a patch implementing Heikki's idea of forcing the status of all
>> segment files present in pg_xlog to .done, marking them for removal. When
>> looking at the code, I finally concluded that Fujii-san's point, about
>> marking in all cases as .done segment files that have been fully streamed,
>> actually makes more sense to not be backward. This patch is not mandatory
>> for back-patching, but it makes the process more robust IMO.
>
> Thanks for the patches!

While reviewing the patch, I found another bug related to WAL files in recovery
mode. The problem is that exitArchiveRecovery() always creates a .ready file for
the last WAL file of the old timeline, even when that file was restored from the
archive and already has a .done file. As a result, an already-archived WAL file
is archived again. The attached patch fixes this bug.
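To make the status-file rule concrete, here is a minimal standalone C sketch. It models only the archive_status directory, assumes archiving is enabled, and uses hypothetical helper names (status_exists(), touch_status(), notify_ready(), notify_unless_archived()) that are not PostgreSQL functions. notify_unless_archived() follows the rule in the attached patch: create a .ready file only when neither .ready nor .done exists.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>     /* mkdtemp(), handy for a scratch status directory */
#include <sys/stat.h>

/*
 * Hypothetical, simplified model of archive status files; not PostgreSQL
 * code.  For a WAL segment SEG:
 *   <dir>/SEG.ready -> the segment still has to be archived
 *   <dir>/SEG.done  -> the segment is already archived (or was restored
 *                      from the archive, so it must not be archived again)
 */

/* Format "<dir>/<seg><suffix>" into buf. */
static void
status_path(char *buf, size_t len, const char *dir, const char *seg,
            const char *suffix)
{
    snprintf(buf, len, "%s/%s%s", dir, seg, suffix);
}

/* True if the status file <dir>/<seg><suffix> exists. */
static bool
status_exists(const char *dir, const char *seg, const char *suffix)
{
    char        path[1024];
    struct stat st;

    status_path(path, sizeof(path), dir, seg, suffix);
    return stat(path, &st) == 0;
}

/* Create an empty status file; returns false on I/O failure. */
static bool
touch_status(const char *dir, const char *seg, const char *suffix)
{
    char        path[1024];
    FILE       *fd;

    status_path(path, sizeof(path), dir, seg, suffix);
    fd = fopen(path, "w");
    if (fd == NULL)
        return false;
    return fclose(fd) == 0;
}

/*
 * Request archiving unconditionally (cf. XLogArchiveNotify()).  The assert
 * mirrors the sanity check the patch adds there: calling this for a
 * restored segment that has .done, as the unpatched exitArchiveRecovery()
 * effectively did, trips it in a debug build.
 */
static bool
notify_ready(const char *dir, const char *seg)
{
    assert(!status_exists(dir, seg, ".done"));
    return touch_status(dir, seg, ".ready");
}

/*
 * Fixed behaviour, in the spirit of calling XLogArchiveCheckDone():
 * create .ready only when neither .ready nor .done exists.
 * Returns true only if this call created a .ready file.
 */
static bool
notify_unless_archived(const char *dir, const char *seg)
{
    if (status_exists(dir, seg, ".done") || status_exists(dir, seg, ".ready"))
        return false;
    return notify_ready(dir, seg);
}
```

For example, in a scratch directory created with mkdtemp(), touching 000000010000000000000003.done (standing in for a restored segment) and then calling notify_unless_archived() on that segment returns false and leaves no .ready file, whereas a segment with no status file gets exactly one .ready file.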

Regards,

-- 
Fujii Masao
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 5351,5368 **** exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo)
  	 * for the new timeline.
  	 *
  	 * Notify the archiver that the last WAL segment of the old timeline is
! 	 * ready to copy to archival storage. Otherwise, it is not archived for a
! 	 * while.
  	 */
  	if (endTLI != ThisTimeLineID)
  	{
  		XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
  
! 		if (XLogArchivingActive())
! 		{
! 			XLogFileName(xlogpath, endTLI, endLogSegNo);
! 			XLogArchiveNotify(xlogpath);
! 		}
  	}
  
  	/*
--- 5351,5367 ----
  	 * for the new timeline.
  	 *
  	 * Notify the archiver that the last WAL segment of the old timeline is
! 	 * ready to copy to archival storage, unless it already has a .done file
! 	 * (e.g., a WAL file restored from the archive is expected to have one).
! 	 * Without the notification, it would not be archived for a while.
  	 */
  	if (endTLI != ThisTimeLineID)
  	{
  		XLogFileCopy(endLogSegNo, endTLI, endLogSegNo);
  
! 		/* Create .ready file only when neither .ready nor .done files exist */
! 		XLogFileName(xlogpath, endTLI, endLogSegNo);
! 		XLogArchiveCheckDone(xlogpath);
  	}
  
  	/*
*** a/src/backend/access/transam/xlogarchive.c
--- b/src/backend/access/transam/xlogarchive.c
***************
*** 516,521 **** XLogArchiveNotify(const char *xlog)
--- 516,527 ----
  	char		archiveStatusPath[MAXPGPATH];
  	FILE	   *fd;
  
+ 	/*
+ 	 * We must not create a .ready file for an already-archived WAL file;
+ 	 * otherwise it would be archived again.
+ 	 */
+ 	Assert(XLogArchiveIsBusy(xlog));
+ 
  	/* insert an otherwise empty file called <XLOG>.ready */
  	StatusFilePath(archiveStatusPath, xlog, ".ready");
  	fd = AllocateFile(archiveStatusPath, "w");
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)