Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread KONDO Mitsumasa
(2013/03/06 16:50), Heikki Linnakangas wrote: Hi, Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); Attempt patch records minRecoveryPoint. [crash recovery - record minRecoveryPoint in control file - archive recovery] I think that this is an original intention of

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread Heikki Linnakangas
On 07.03.2013 10:05, KONDO Mitsumasa wrote: (2013/03/06 16:50), Heikki Linnakangas wrote: Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, and false on error, and the patch makes it return 'true' on error, if archive recovery was requested but we're

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread KONDO Mitsumasa
(2013/03/07 19:41), Heikki Linnakangas wrote: On 07.03.2013 10:05, KONDO Mitsumasa wrote: (2013/03/06 16:50), Heikki Linnakangas wrote: Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, and false on error, and the patch makes it return 'true' on error,

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread Kyotaro HORIGUCHI
Everything seems to have been settled over my head while I was sleeping. Sorry for the clumsy test script, and thank you for refining it, Mitsumasa. And thank you for fixing the bug and for the detailed explanation, Heikki. I confirmed that the problem is also fixed for me at origin/REL9_2_STABLE. I understand

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-06 Thread Heikki Linnakangas
On 05.03.2013 14:09, KONDO Mitsumasa wrote: Hi, Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); Attempt patch records minRecoveryPoint. [crash recovery - record minRecoveryPoint in control file - archive recovery] I think that this is an original intention of Heikki's

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Hello, I could reproduce the behavior and may understand the cause. The head of origin/REL9_2_STABLE shows the behavior I mentioned in the last message when using the shell script attached. 9.3dev runs as expected. In XLogPageRead, when RecPtr goes beyond the last page, the current xlog file is

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Sorry, I sent the wrong script. The head of origin/REL9_2_STABLE shows the behavior I mentioned in the last message when using the shell script attached. 9.3dev runs as expected. regards, -- Kyotaro Horiguchi NTT Open Source Software Center #! /bin/sh pgpath=$HOME/bin/pgsql_924b echo $PATH |

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread KONDO Mitsumasa
Hi, Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); Attempt patch records minRecoveryPoint. [crash recovery - record minRecoveryPoint in control file - archive recovery] I think that this is an original intention of Heikki's patch. I also found a bug in latest

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Hmm.. Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); Attempt patch records minRecoveryPoint. [crash recovery - record minRecoveryPoint in control file - archive recovery] I think that this is an original intention of Heikki's patch. It could be. Before that, my

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-05 Thread Kyotaro HORIGUCHI
Hi, I suppose the attached patch is close to the solution. I think that this is an original intention of Heikki's patch. I noticed that archive recovery will be turned on in next_record_is_invalid thanks to your patch. On the other hand, your patch fixes that point but ReadRecord runs on the

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-04 Thread Kyotaro HORIGUCHI
This is an interim report for this patch. We found that PostgreSQL with this patch unexpectedly becomes primary when starting up as standby. We'll investigate the behavior further. Anyway, I've committed this to master and 9.2 now. This seems to fix the issue. We'll examine this

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-26 Thread Josh Berkus
Folks, Is there any way this particular issue could cause data corruption without causing a crash? I don't see a way for it to do so, but I wanted to verify. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-25 Thread Kyotaro HORIGUCHI
Hello, Anyway, I've committed this to master and 9.2 now. This seems to fix the issue. We'll examine this further. Thank you. -- Kyotaro Horiguchi NTT Open Source Software Center

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-25 Thread Kyotaro HORIGUCHI
At Fri, 22 Feb 2013 11:42:39 +0200, Heikki Linnakangas hlinnakan...@vmware.com wrote in 51273d8f.7060...@vmware.com On 15.02.2013 10:33, Kyotaro HORIGUCHI wrote: In an HA DB cluster consisting of Pacemaker and PostgreSQL, PostgreSQL is stopped by 'pg_ctl stop -m i' regardless of situation. That

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-25 Thread Kyotaro HORIGUCHI
Although this has become useless, I want to explain how this works. I tried to postpone smgrtruncate TO the next checkpoint. Umm, why? I don't understand this patch at all. This inhibits truncating files after (quite vague in the patch:-) the previous checkpoint by hindering the deleted

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-22 Thread Heikki Linnakangas
On 22.02.2013 02:13, Michael Paquier wrote: On Thu, Feb 21, 2013 at 11:09 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 15.02.2013 15:49, Heikki Linnakangas wrote: Attached is a patch for git master. The basic idea is to split InArchiveRecovery into two variables,

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-22 Thread Heikki Linnakangas
On 14.02.2013 19:18, Fujii Masao wrote: Yes. And the resource agent for streaming replication in Pacemaker (it's the OSS clusterware) is the user of that archive recovery scenario, too. When it starts up the server, it always creates the recovery.conf and starts the server as the standby. It

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-22 Thread Heikki Linnakangas
On 15.02.2013 10:33, Kyotaro HORIGUCHI wrote: Sorry, I omitted to show how we found this issue. In an HA DB cluster consisting of Pacemaker and PostgreSQL, PostgreSQL is stopped by 'pg_ctl stop -m i' regardless of situation. That seems like a bad idea. If nothing else, crash recovery can take a

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-21 Thread Heikki Linnakangas
On 15.02.2013 15:49, Heikki Linnakangas wrote: Attached is a patch for git master. The basic idea is to split InArchiveRecovery into two variables, InArchiveRecovery and ArchiveRecoveryRequested. ArchiveRecoveryRequested is set when recovery.conf exists. But if we don't know how far we need to
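
The two-variable split described above can be pictured with a small toy model. This is only a sketch, not PostgreSQL's actual code: the struct, both functions, and the `consistency_target_known` input are hypothetical, and because the snippet is cut off, the rule "start in crash recovery when the consistency point is unknown, then switch once local WAL runs out" is an assumption about how the message continues.

```c
#include <stdbool.h>

/* Toy model of the two flags named in the message; not PostgreSQL source. */
typedef struct RecoveryFlags
{
    bool ArchiveRecoveryRequested; /* recovery.conf was found at startup */
    bool InArchiveRecovery;        /* currently performing archive recovery */
} RecoveryFlags;

/*
 * Decide the initial state. consistency_target_known is a hypothetical
 * input meaning "we know how far WAL must be replayed to be consistent".
 * Assumption: without that knowledge we begin in plain crash recovery.
 */
RecoveryFlags
start_recovery(bool recovery_conf_exists, bool consistency_target_known)
{
    RecoveryFlags f;

    f.ArchiveRecoveryRequested = recovery_conf_exists;
    f.InArchiveRecovery = recovery_conf_exists && consistency_target_known;
    return f;
}

/*
 * Hypothetical hook for "all locally available WAL has been replayed".
 * If archive recovery was requested but not yet entered, enter it now.
 */
void
local_wal_exhausted(RecoveryFlags *f)
{
    if (f->ArchiveRecoveryRequested && !f->InArchiveRecovery)
        f->InArchiveRecovery = true;
}
```

The point of keeping two flags in this model is that "the user asked for archive recovery" and "we are doing archive recovery right now" can differ for a while after startup.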

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-21 Thread Michael Paquier
On Thu, Feb 21, 2013 at 11:09 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 15.02.2013 15:49, Heikki Linnakangas wrote: Attached is a patch for git master. The basic idea is to split InArchiveRecovery into two variables, InArchiveRecovery and ArchiveRecoveryRequested.

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-20 Thread Kyotaro HORIGUCHI
Sorry, Let me correct a bit. I tried to postpone smgrtruncate after the next checkpoint. This I tried to postpone smgrtruncate TO the next checkpoint. is similar to what hotstandby feedback does to vacuum. It seems to be working fine but I worry that it might also bloat the table. I

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-20 Thread Heikki Linnakangas
On 20.02.2013 10:01, Kyotaro HORIGUCHI wrote: Sorry, Let me correct a bit. I tried to postpone smgrtruncate after the next checkpoint. This I tried to postpone smgrtruncate TO the next checkpoint. Umm, why? I don't understand this patch at all. - Heikki

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-19 Thread Ants Aasma
On Mon, Feb 18, 2013 at 8:27 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: backupStartPoint is set, which signals recovery to wait for an end-of-backup record, until the system is considered consistent. If the backup is taken from a hot standby, backupEndPoint is set, instead of

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-19 Thread Kyotaro HORIGUCHI
Hello, I looked at this from another point of view. I consider the current discussion to be based on how to predict the last consistency point. But there is another aspect of this issue. I tried to postpone smgrtruncate after the next checkpoint. This is similar to what hotstandby feedback does to

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-18 Thread Heikki Linnakangas
On 16.02.2013 10:40, Ants Aasma wrote: On Fri, Feb 15, 2013 at 3:49 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: While this solution would help solve my issue, it assumes that the correct amount of WAL files are actually there. Currently the docs for setting up a standby refer to

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-16 Thread Ants Aasma
On Fri, Feb 15, 2013 at 3:49 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: While this solution would help solve my issue, it assumes that the correct amount of WAL files are actually there. Currently the docs for setting up a standby refer to 24.3.4. Recovering Using a Continuous

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-15 Thread Kyotaro HORIGUCHI
Sorry, I omitted to show how we found this issue. In an HA DB cluster consisting of Pacemaker and PostgreSQL, PostgreSQL is stopped by 'pg_ctl stop -m i' regardless of situation. On the other hand, PostgreSQL RA (Resource Agent) is obliged to start the master node via hot standby state because of the

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-15 Thread Ants Aasma
On Wed, Feb 13, 2013 at 10:52 PM, Simon Riggs si...@2ndquadrant.com wrote: The problem is that we startup Hot Standby before we hit the min recovery point because that isn't recorded. For me, the thing to do is to make the min recovery point == end of WAL when state is DB_IN_PRODUCTION. That

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-15 Thread Heikki Linnakangas
On 15.02.2013 13:05, Ants Aasma wrote: On Wed, Feb 13, 2013 at 10:52 PM, Simon Riggs si...@2ndquadrant.com wrote: The problem is that we startup Hot Standby before we hit the min recovery point because that isn't recorded. For me, the thing to do is to make the min recovery point == end of WAL

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-14 Thread Ants Aasma
On Feb 13, 2013 10:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm, I just realized a little problem with that approach. If you take a base backup using an atomic filesystem backup from a running server, and start archive recovery from that, that's essentially the same thing as

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-14 Thread Fujii Masao
On Thu, Feb 14, 2013 at 5:15 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 13.02.2013 17:02, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: At least in back-branches, I'd call this a pilot error. You can't turn a master into a standby just by creating a

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-14 Thread Fujii Masao
On Thu, Feb 14, 2013 at 5:52 AM, Simon Riggs si...@2ndquadrant.com wrote: On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote: Without step 3, the server would perform crash recovery, and it would work. But because of the recovery.conf file, the server goes into archive

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 09:46, Kyotaro HORIGUCHI wrote: In this case, the FINAL consistency point is at the XLOG_SMGR_TRUNCATE record, but the current implementation does not record the consistency point (checkpoint, or commit or smgr_truncate) itself, so we cannot predict the final consistency point on starting of

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: At least in back-branches, I'd call this a pilot error. You can't turn a master into a standby just by creating a recovery.conf file. At least not if the master was not shut down cleanly first. ... I'm not sure that's worth the trouble,

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Simon Riggs
On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote: To be precise, we'd need to update the control file on every XLogFlush(), like we do during archive recovery. That would indeed be unacceptable from a performance point of view. Updating the control file that often

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote: To be precise, we'd need to update the control file on every XLogFlush(), like we do during archive recovery. That would indeed be unacceptable from a performance point of

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 20:25, Simon Riggs wrote: On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote: To be precise, we'd need to update the control file on every XLogFlush(), like we do during archive recovery. That would indeed be unacceptable from a performance point of view.

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: Well, no-one's complained about the performance. From a robustness point of view, it might be good to keep the minRecoveryPoint value in a separate file, for example, to avoid rewriting the control file that often. Then again, why fix it

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 21:21, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: Well, no-one's complained about the performance. From a robustness point of view, it might be good to keep the minRecoveryPoint value in a separate file, for example, to avoid rewriting the control file that

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 21:03, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote: To be precise, we'd need to update the control file on every XLogFlush(), like we do during archive recovery. That would indeed be

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: On 13.02.2013 21:21, Tom Lane wrote: It would only be broken if someone interrupted a crash recovery mid-flight and tried to establish a recovery stop point before the end of WAL, no? Why don't we just forbid that case? This would either be

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 21:30, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 13.02.2013 21:21, Tom Lane wrote: It would only be broken if someone interrupted a crash recovery mid-flight and tried to establish a recovery stop point before the end of WAL, no? Why don't we just

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: The problem we're trying to solve is determining how much WAL needs to be replayed until the database is consistent again. In crash recovery, the answer is all of it. That's why the CRC in the WAL is essential; it's required to determine
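
The "replay all of it" rule for crash recovery can be illustrated with a toy loop that keeps applying records until one fails its checksum. This is a sketch under stated assumptions, not PostgreSQL's WAL format: `ToyRecord`, `toy_checksum`, and `replay_until_invalid` are hypothetical names, the checksum is a stand-in rather than the CRC PostgreSQL uses, and reading the CRC as the marker for where valid WAL ends is an interpretation of the truncated sentence.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy WAL record: a payload plus the checksum stored when it was written. */
typedef struct ToyRecord
{
    uint32_t payload;
    uint32_t crc;
} ToyRecord;

/* Stand-in checksum for illustration only; NOT PostgreSQL's CRC. */
uint32_t
toy_checksum(uint32_t payload)
{
    return (payload * 2654435761u) ^ 0xA5A5A5A5u;
}

/*
 * Replay records in order and stop at the first one whose stored checksum
 * does not match. Returns the number of records replayed, which in this
 * model marks the end of valid WAL.
 */
size_t
replay_until_invalid(const ToyRecord *wal, size_t nrecords)
{
    size_t i;

    for (i = 0; i < nrecords; i++)
    {
        if (wal[i].crc != toy_checksum(wal[i].payload))
            break;              /* torn or stale record: valid WAL ends here */
        /* a real system would apply the record's changes at this point */
    }
    return i;
}
```

In this model there is no stored "end of WAL" pointer at all; the end is discovered by validation, which is why a reliable per-record checksum matters.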

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: On 13.02.2013 21:30, Tom Lane wrote: Well, archive recovery is a different scenario --- Simon was questioning whether we need a minRecoveryPoint mechanism in crash recovery, or at least that's what I thought he asked. Ah, ok. The short

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas
On 13.02.2013 17:02, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: At least in back-branches, I'd call this a pilot error. You can't turn a master into a standby just by creating a recovery.conf file. At least not if the master was not shut down cleanly first. ... I'm not

Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Simon Riggs
On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote: Without step 3, the server would perform crash recovery, and it would work. But because of the recovery.conf file, the server goes into archive recovery, and because minRecoveryPoint is not set, it assumes that the
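
A toy consistency check shows why an unset minRecoveryPoint matters in archive recovery. This is a sketch, not PostgreSQL's implementation: `ToyLsn`, `TOY_INVALID_LSN`, and `reached_consistency` are hypothetical, WAL positions are simplified to a single integer, and treating "not set" as the value 0 is an assumption, since the snippet is cut off before it says what the server concludes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified WAL position; assumption: 0 stands for "not set". */
typedef uint64_t ToyLsn;
#define TOY_INVALID_LSN ((ToyLsn) 0)

/*
 * In this model the database counts as consistent once replay has reached
 * the recorded minimum recovery point.
 */
bool
reached_consistency(ToyLsn replayed_upto, ToyLsn min_recovery_point)
{
    return replayed_upto >= min_recovery_point;
}
```

With `min_recovery_point` left at `TOY_INVALID_LSN`, the check passes at any replay position, so the model would declare consistency immediately, long before the WAL written prior to the crash has been replayed.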

[HACKERS] 9.2.3 crashes during archive recovery

2013-02-12 Thread Kyotaro HORIGUCHI
Hello, 9.2.3 crashes during archive recovery. This was also corrected at some point on origin/master with another problem fixed by the commit below if my memory is correct. But current HEAD and 9.2.3 crash during archive recovery (not on standby) by the 'marking deleted page visible' problem.