Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-17 Thread Jeff Janes
On Sun, Dec 16, 2012 at 8:38 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 8.12.2012 03:08, Jeff Janes wrote:

 It seems to me you need considerable expertise to figure out how to do
 optimal recovery (i.e. losing the least transactions) in this
 situation, and that that expertise cannot be automated.  Do you trust
 a partial file from a good hard drive, or a complete file from a
 partially melted pg_xlog?

 It clearly is a rather complex issue, no doubt about that. And yes,
 reliability of the devices with pg_xlog on them is an important detail.
 Alghough if the WAL is not written in a reliable way, you're hosed
 anyway I guess.

 The recommended archive command is based on the assumption that the
 local pg_xlog is intact (e.g. because it's located on a reliable RAID1
 array), which seems to be the assumption of the OP too.

 In my opinion it's more likely to meet an incomplete copy of WAL in the
 archive than a corrupted local WAL. And if it really is corrupted, it
 would be identified during replay.

Wouldn't the way it would be identified be for it to fail a checksum,
assume it was garbage left over from the previous WAL file which was
the process of being overwritten at the time of crash, and so
terminate recovery and open the database?

Assuming your goal is to recover all the transactions you possibly can
(rather than restore to a known point), I think you would want to try
recovery both ways and keep whichever one got the furthest.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-16 Thread Tomas Vondra
On 8.12.2012 03:08, Jeff Janes wrote:
 On Thu, Dec 6, 2012 at 3:52 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Hi,

 On 6.12.2012 23:45, MauMau wrote:
 From: Tom Lane t...@sss.pgh.pa.us
 Well, that's unfortunate, but it's not clear that automatic recovery is
 possible.  The only way out of it would be if an undamaged copy of the
 segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
 not even be trying to fetch from the archive if we had a local copy.

 No, PG will try to fetch the WAL file from pg_xlog when it cannot get it
 from archive.  XLogFileReadAnyTLI() does that.  Also, PG manual contains
 the following description:

 http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL


 WAL segments that cannot be found in the archive will be sought in
 pg_xlog/; this allows use of recent un-archived segments. However,
 segments that are available from the archive will be used in preference
 to files in pg_xlog/.

 So why don't you use an archive command that does not create such
 incomplete files? I mean something like this:

 archive_command = 'cp %p /arch/%f.tmp  mv /arch/%f.tmp /arch/%f'

 Until the file is renamed, it's considered 'incomplete'.
 
 Wouldn't having the incomplete file be preferable over having none of it at 
 all?
 
 It seems to me you need considerable expertise to figure out how to do
 optimal recovery (i.e. losing the least transactions) in this
 situation, and that that expertise cannot be automated.  Do you trust
 a partial file from a good hard drive, or a complete file from a
 partially melted pg_xlog?

It clearly is a rather complex issue, no doubt about that. And yes,
reliability of the devices with pg_xlog on them is an important detail.
Alghough if the WAL is not written in a reliable way, you're hosed
anyway I guess.

The recommended archive command is based on the assumption that the
local pg_xlog is intact (e.g. because it's located on a reliable RAID1
array), which seems to be the assumption of the OP too.

In my opinion it's more likely to meet an incomplete copy of WAL in the
archive than a corrupted local WAL. And if it really is corrupted, it
would be identified during replay.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-07 Thread MauMau

Tomas wrote:

So why don't you use an archive command that does not create such
incomplete files? I mean something like this:

archive_command = 'cp %p /arch/%f.tmp  mv /arch/%f.tmp /arch/%f'

Until the file is renamed, it's considered 'incomplete'.



That is a good idea.  This would surely solve my problem.  Don't you think 
this must be described in PG manual?


Now there are two solutions given.  I think it would be better to implement 
both of these.


1. Tomas's idea
Instruct the method like below in the PG manual.

archive_command = 'cp %p /arch/%f.tmp  mv /arch/%f.tmp /arch/%f'

[Merits]
The incomplete file will not eventually remain.  The successful archival 
erases the incomplete file created during the previous interrupted archival.


[Demerits]
The administrator or developer of existing systems or products using 
PostgreSQL will not be likely to notice this change in the PG manual.  So 
they might face the media recovery failure.



2. My idea
PG continues media recovery when it encounters a partially filled WAL 
segment file.  It ereport(LOG)'s the unexpected size, and try to fetch the 
WAL file from pg_xlog/.  This is logically natural.  If the archived WAL 
file is smaller than the expected size, it (almost always) means that 
archive processing was interrupted and the unarchived WAL file is in 
pg_xlog/.  So PG should instead the same WAL file from pg_xlog/ and continue 
recovery.  PG will then try to fetch the subsequent WAL files from archive 
again, but will hardly find those files there and fetch them from pg_xlog/. 
This behavior complies with the description in the PG manual.


I mean just changing FATAL to LOG in the code in RestoreArchivedFile():

--
   if (StandbyMode  stat_buf.st_size  expectedSize)
elevel = DEBUG1;
   else
elevel = FATAL;
   ereport(elevel,
 (errmsg(archive file \%s\ has wrong size: %lu instead of %lu,
   xlogfname,
   (unsigned long) stat_buf.st_size,
   (unsigned long) expectedSize)));
   return false;
--

[Merits]
The administrator or developer of existing systems or products using 
PostgreSQL will benefit from this fix.


[Demerits]
The incomplete archive file(s) might be left forever unless the DBA delete 
them.  That occurs if pg_xlog/ is lost and the incomplete archive files will 
not be overwritten.



Could you give me opinions what to do?  I'm willing to submit these fixes.


Regards
MauMau





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-07 Thread Jeff Janes
On Thu, Dec 6, 2012 at 3:52 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Hi,

 On 6.12.2012 23:45, MauMau wrote:
 From: Tom Lane t...@sss.pgh.pa.us
 Well, that's unfortunate, but it's not clear that automatic recovery is
 possible.  The only way out of it would be if an undamaged copy of the
 segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
 not even be trying to fetch from the archive if we had a local copy.

 No, PG will try to fetch the WAL file from pg_xlog when it cannot get it
 from archive.  XLogFileReadAnyTLI() does that.  Also, PG manual contains
 the following description:

 http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL


 WAL segments that cannot be found in the archive will be sought in
 pg_xlog/; this allows use of recent un-archived segments. However,
 segments that are available from the archive will be used in preference
 to files in pg_xlog/.

 So why don't you use an archive command that does not create such
 incomplete files? I mean something like this:

 archive_command = 'cp %p /arch/%f.tmp  mv /arch/%f.tmp /arch/%f'

 Until the file is renamed, it's considered 'incomplete'.

Wouldn't having the incomplete file be preferable over having none of it at all?

It seems to me you need considerable expertise to figure out how to do
optimal recovery (i.e. losing the least transactions) in this
situation, and that that expertise cannot be automated.  Do you trust
a partial file from a good hard drive, or a complete file from a
partially melted pg_xlog?

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread MauMau

Hello,

Although this may have to be posted to pgsql-bugs or pgsql-general, let me 
ask you here because the problem probably needs PostgreSQL's code fix.



[Problem]
I'm using PostgreSQL 9.1.6 on Linux.  I encountered a serious problem that 
media recovery failed showing the following message:


FATAL:  archive file 000100800028 has wrong size: 7340032 
instead of 16777216


I'm using normal cp command to archive WAL files.  That is:

   archive_command = '/path/to/my_script.sh %p /backup/archive_log/%f'

my_script.sh
--
#!/bin/sh
some processing...
cp $1 $2
other processing...
--


The media recovery was triggered by power failure.  The disk drive that 
stored $PGDATA failed after a power failure.  So I replaced the failed disk, 
and performed media recovery by creating recovery.conf and running pg_ctl 
start.  However, pg_ctl failed with the above error message.




[Cause]
The cause is clear from the message.  PostgreSQL refuses to continue media 
recovery when it finds an archived WAL file whose size is not 16 MB.  The 
relevant code is in src/backend/access/transam/xlog.c:


--
  if (expectedSize  0  stat_buf.st_size != expectedSize)
  {
   int   elevel;

   /*
* If we find a partial file in standby mode, we assume it's
* because it's just being copied to the archive, and keep
* trying.
*
* Otherwise treat a wrong-sized file as FATAL to ensure the
* DBA would notice it, but is that too strong? We could try
* to plow ahead with a local copy of the file ... but the
* problem is that there probably isn't one, and we'd
* incorrectly conclude we've reached the end of WAL and we're
* done recovering ...
*/
   if (StandbyMode  stat_buf.st_size  expectedSize)
elevel = DEBUG1;
   else
elevel = FATAL;
   ereport(elevel,
 (errmsg(archive file \%s\ has wrong size: %lu instead of %lu,
   xlogfname,
   (unsigned long) stat_buf.st_size,
   (unsigned long) expectedSize)));
   return false;
  }
--


[How to fix]
Archived files can become smaller than their expected sizes for some 
reasons:


1. The power fails while archive_command is copying files (as in my case).
2. Immediate shutdown (pg_ctl stop -mi) is performed while archive_command 
is copying files.  In this case, cp or equivalent copying command is 
cancelled by SIGQUIT sent by postmaster.


Therefore, I think postgres must continue recovery by fetching files from 
pg_xlog/ when it encounters a partially filled archive files.  In addition, 
it may be necessary to remove the partially filled archived files, because 
they might prevent media recovery in the future (is this true?).  I mean we 
need the following fix.  What do you think?


--
  if (expectedSize  0  stat_buf.st_size != expectedSize)
  {
   int   elevel;
...
   if (StandbyMode  stat_buf.st_size  expectedSize)
elevel = DEBUG1;
   else
   {
elevel = LOG;
unlink(xlogpath);
   }
   ereport(elevel,
 (errmsg(archive file \%s\ has wrong size: %lu instead of %lu,
   xlogfname,
   (unsigned long) stat_buf.st_size,
   (unsigned long) expectedSize)));
   return false;
  }
--


I've heard that the next minor release is scheduled during this weekend.  I 
really wish this problem will be fixed in that release.  If you wish, I'll 
post the patch tomorrow or the next day.  Could you include the fix in the 
weekend release?



Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread Kevin Grittner
MauMau wrote:

 [Problem]
 I'm using PostgreSQL 9.1.6 on Linux. I encountered a serious
 problem that media recovery failed showing the following message:
 
 FATAL: archive file 000100800028 has wrong size:
 7340032 instead of 16777216
 
 I'm using normal cp command to archive WAL files. That is:
 
  archive_command = '/path/to/my_script.sh %p
/backup/archive_log/%f'
 
 my_script.sh
 --
 #!/bin/sh
 some processing...
 cp $1 $2
 other processing...
 --
 
 
 The media recovery was triggered by power failure. The disk drive
 that stored $PGDATA failed after a power failure. So I replaced
 the failed disk, and performed media recovery by creating
 recovery.conf and running pg_ctl start. However, pg_ctl failed
 with the above error message.

If you are attempting a PITR-style recovery and you want to include
WAL entries from the partially-copied file, pad a copy of it with
NUL bytes to the expected length.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread Tom Lane
MauMau maumau...@gmail.com writes:
 I'm using PostgreSQL 9.1.6 on Linux.  I encountered a serious problem that 
 media recovery failed showing the following message:
 FATAL:  archive file 000100800028 has wrong size: 7340032 
 instead of 16777216

Well, that's unfortunate, but it's not clear that automatic recovery is
possible.  The only way out of it would be if an undamaged copy of the
segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
not even be trying to fetch from the archive if we had a local copy.

 Therefore, I think postgres must continue recovery by fetching files from 
 pg_xlog/ when it encounters a partially filled archive files.  In addition, 
 it may be necessary to remove the partially filled archived files, because 
 they might prevent media recovery in the future (is this true?).  I mean we 
 need the following fix.  What do you think?

I think having PG automatically destroy archive files is bordering on
insane.  It might be reasonable for the archiving process to do
something like this, if it has a full-size copy of the file available
to replace the damaged copy with.  But otherwise you're just throwing
away what's probably the only copy of useful data.

 I've heard that the next minor release is scheduled during this weekend.  I 
 really wish this problem will be fixed in that release.  If you wish, I'll 
 post the patch tomorrow or the next day.  Could you include the fix in the 
 weekend release?

Even if this were a good and uncontroversial idea, I'm afraid you're
several days too late.  The release is today, not next week.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread MauMau

From: Kevin Grittner kgri...@mail.com

If you are attempting a PITR-style recovery and you want to include
WAL entries from the partially-copied file, pad a copy of it with
NUL bytes to the expected length.


I'm afraid This is unacceptably difficult, or almost impossible, for many PG 
users.  How do you do the following?


1. Identify the file type (WAL segment, backup history file, timeline 
history file) and its expected size in the archive_command script. 
archive_command has to handle these three types of files.  Embedding file 
name logic (e.g. WAL is 000100020003) in archive_command is a 
bad idea, because the file name might change in the future PG release.


2. Append NUL bytes to the file in the archive_command shell script or batch 
file.  Particularly I have no idea about Windows.  I have some PG systems 
running on Windows.  This would compromise the ease of use of PostgreSQL.


So I believe PG should handle the problem, not the archive_command.

Regards
MauMau






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread MauMau

From: Tom Lane t...@sss.pgh.pa.us

Well, that's unfortunate, but it's not clear that automatic recovery is
possible.  The only way out of it would be if an undamaged copy of the
segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
not even be trying to fetch from the archive if we had a local copy.


No, PG will try to fetch the WAL file from pg_xlog when it cannot get it 
from archive.  XLogFileReadAnyTLI() does that.  Also, PG manual contains the 
following description:


http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL

WAL segments that cannot be found in the archive will be sought in pg_xlog/; 
this allows use of recent un-archived segments. However, segments that are 
available from the archive will be used in preference to files in pg_xlog/.


So, continuing recovery by changing the emode to LOG would work.  What do 
you think about this fix?




I think having PG automatically destroy archive files is bordering on
insane.



I agree.  Before that, PG cannot know the archive location, so PG cannot 
delete the partially filled archive files.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread Tomas Vondra
Hi,

On 6.12.2012 23:45, MauMau wrote:
 From: Tom Lane t...@sss.pgh.pa.us
 Well, that's unfortunate, but it's not clear that automatic recovery is
 possible.  The only way out of it would be if an undamaged copy of the
 segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
 not even be trying to fetch from the archive if we had a local copy.
 
 No, PG will try to fetch the WAL file from pg_xlog when it cannot get it
 from archive.  XLogFileReadAnyTLI() does that.  Also, PG manual contains
 the following description:
 
 http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
 
 
 WAL segments that cannot be found in the archive will be sought in
 pg_xlog/; this allows use of recent un-archived segments. However,
 segments that are available from the archive will be used in preference
 to files in pg_xlog/.

So why don't you use an archive command that does not create such
incomplete files? I mean something like this:

archive_command = 'cp %p /arch/%f.tmp  mv /arch/%f.tmp /arch/%f'

Until the file is renamed, it's considered 'incomplete'.


Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread Daniel Farina
On Thu, Dec 6, 2012 at 9:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 MauMau maumau...@gmail.com writes:
 I'm using PostgreSQL 9.1.6 on Linux.  I encountered a serious problem that
 media recovery failed showing the following message:
 FATAL:  archive file 000100800028 has wrong size: 7340032
 instead of 16777216

 Well, that's unfortunate, but it's not clear that automatic recovery is
 possible.  The only way out of it would be if an undamaged copy of the
 segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
 not even be trying to fetch from the archive if we had a local copy.

I'm inclined to agree with this: I've had a case much like the
original poster (too-short WAL segments because of media issues),
except in my case the archiver had archived a bogus copy of the data
also (short length and all), so our attempt to recover from archives
on a brand new system was met with obscure failure[0].  And, rather
interestingly, the WAL disk was able to write bogusly without error
for many minutes, which made for a fairly exotic recovery based on
pg_resetxlog: I grabbed quite a few minutes of WAL of various sub-16MB
sizes to spot check the situation.

It never occurred to me there was a way to really fix this that still
involves the archiver reading from a file system: what can one do when
one no longer trusts read() to get what was write()d?

[0]:  Postgres wasn't very good about reporting the failure: in the
case bogus files have been restored from archives, it seems to just
bounce through timelines it knows about searching for a WAL it likes,
without any real error messaging like got corrupt wal from archive.
That could probably be fixed.

--
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers