Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-05-12 Thread Heikki Linnakangas

On 05/09/2014 05:19 PM, Fujii Masao wrote:

On Thu, Mar 20, 2014 at 11:38 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Kyotaro HORIGUCHI escribió:

Hi, I confirmed that 82233ce7ea4 surely did it.

At Wed, 19 Mar 2014 09:35:16 -0300, Alvaro Herrera wrote

Fujii Masao escribió:

On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:



9.4 canceles backup mode even on immediate shutdown so the
operation causes no problem, but 9.3 and before are doesn't.


Hmm, I don't think we've changed that behavior in 9.4.


ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immdiate
shutdown that way.


Uh, interesting.  I didn't see that secondary effect.  I hope it's not
for ill?


The crucial factor for the behavior change is that pmdie has
become not to exit immediately for SIGQUIT. 'case SIGQUIT:' in
pmdie() ended with ExitPostmaster(0) before the patch but now
it ends with 'PostmasterStateMachine(); break;' so continues to
run with pmState = PM_WAIT_BACKENDS, similar to SIGINT (fast
shutdown).

After all, pmState changes to PM_NO_CHILDREN via PM_WAIT_DEAD_END
by SIGCHLDs from non-significant processes, then CancelBackup().


Judging from what was being said on the thread, it seems that running
CancelBackup() after an immediate shutdown is better than not doing it,
correct?


This is listed as a 9.4 Open Item, but no one seems to want to revert
this change.
So I'll drop this from the Open Item list barring objections.


I object. We used to call CancelBackup() on immediate shutdown, which 
was good. That was inadvertently changed by commit 82233ce. That's a 
regression we should fix. I agree with Alvaro upthread that we don't 
want to revert 82233ce, but we should come up with a fix.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-05-12 Thread Fujii Masao
On Mon, May 12, 2014 at 4:52 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 05/09/2014 05:19 PM, Fujii Masao wrote:

 On Thu, Mar 20, 2014 at 11:38 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

 Kyotaro HORIGUCHI escribió:

 Hi, I confirmed that 82233ce7ea4 surely did it.

 At Wed, 19 Mar 2014 09:35:16 -0300, Alvaro Herrera wrote

 Fujii Masao escribió:

 On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:


 9.4 canceles backup mode even on immediate shutdown so the
 operation causes no problem, but 9.3 and before are doesn't.


 Hmm, I don't think we've changed that behavior in 9.4.


 ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immdiate
 shutdown that way.


 Uh, interesting.  I didn't see that secondary effect.  I hope it's not
 for ill?


 The crucial factor for the behavior change is that pmdie has
 become not to exit immediately for SIGQUIT. 'case SIGQUIT:' in
 pmdie() ended with ExitPostmaster(0) before the patch but now
 it ends with 'PostmasterStateMachine(); break;' so continues to
 run with pmState = PM_WAIT_BACKENDS, similar to SIGINT (fast
 shutdown).

 After all, pmState changes to PM_NO_CHILDREN via PM_WAIT_DEAD_END
 by SIGCHLDs from non-significant processes, then CancelBackup().


 Judging from what was being said on the thread, it seems that running
 CancelBackup() after an immediate shutdown is better than not doing it,
 correct?


 This is listed as a 9.4 Open Item, but no one seems to want to revert
 this change.
 So I'll drop this from the Open Item list barring objections.


 I object. We used to call CancelBackup() on immediate shutdown, which was
 good. That was inadvertently changed by commit 82233ce. That's a regression
 we should fix. I agree with Alvaro upthread that we don't want to revert
 82233ce, but we should come up with a fix.

Hmm.. probably I have the same opinion with you. If I understand this correctly,
an immediate shutdown doesn't call CancelBackup() in 9.4 or before. But the
commit 82233ce unintentionally changed an immediate shutdown so that it calls
CancelBackup(). For now, no one wants to revert the current behavior. So I think
there is nothing that we have to do now. No?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-05-12 Thread Heikki Linnakangas

On 05/12/2014 02:29 PM, Fujii Masao wrote:

Hmm.. probably I have the same opinion with you. If I understand this correctly,
an immediate shutdown doesn't call CancelBackup() in 9.4 or before. But the
commit 82233ce unintentionally changed an immediate shutdown so that it calls
CancelBackup().


Oh, sorry. I thought it was the other way 'round: that we used to remove 
backup_label on an immediate shutdown on 9.3 and before, but that 9.4 
doesn't do that anymore. Now that I re-read this thread and tested it 
myself, I see that I got it backwards.


I agree the new behavior is better, and we should just remove the Open 
Items entry.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-05-12 Thread Fujii Masao
On Mon, May 12, 2014 at 8:40 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 05/12/2014 02:29 PM, Fujii Masao wrote:

 Hmm.. probably I have the same opinion with you. If I understand this
 correctly,
 an immediate shutdown doesn't call CancelBackup() in 9.4 or before. But
 the
 commit 82233ce unintentionally changed an immediate shutdown so that it
 calls
 CancelBackup().


 Oh, sorry. I thought it was the other way 'round: that we used to remove
 backup_label on an immediate shutdown on 9.3 and before, but that 9.4
 doesn't do that anymore. Now that I re-read this thread and tested it
 myself, I see that I got it backwards.

 I agree the new behavior is better, and we should just remove the Open Items
 entry.

Yes. I just removed that entry.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-05-12 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 On Mon, May 12, 2014 at 8:40 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 I agree the new behavior is better, and we should just remove the Open Items
 entry.

 Yes. I just removed that entry.

Our practice in past years has been to move items to a separate Resolved
Issues section rather than just delete them.  I fixed the page to look
that way.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-05-12 Thread Fujii Masao
On Tue, May 13, 2014 at 1:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 On Mon, May 12, 2014 at 8:40 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 I agree the new behavior is better, and we should just remove the Open Items
 entry.

 Yes. I just removed that entry.

 Our practice in past years has been to move items to a separate Resolved
 Issues section rather than just delete them.  I fixed the page to look
 that way.

Yes. Thanks!

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-05-09 Thread Fujii Masao
On Thu, Mar 20, 2014 at 11:38 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Kyotaro HORIGUCHI escribió:
 Hi, I confirmed that 82233ce7ea4 surely did it.

 At Wed, 19 Mar 2014 09:35:16 -0300, Alvaro Herrera wrote
  Fujii Masao escribió:
   On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
   hlinnakan...@vmware.com wrote:
 
9.4 canceles backup mode even on immediate shutdown so the
operation causes no problem, but 9.3 and before are doesn't.
   
Hmm, I don't think we've changed that behavior in 9.4.
  
   ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immdiate
   shutdown that way.
 
  Uh, interesting.  I didn't see that secondary effect.  I hope it's not
  for ill?

 The crucial factor for the behavior change is that pmdie has
 become not to exit immediately for SIGQUIT. 'case SIGQUIT:' in
 pmdie() ended with ExitPostmaster(0) before the patch but now
 it ends with 'PostmasterStateMachine(); break;' so continues to
 run with pmState = PM_WAIT_BACKENDS, similar to SIGINT (fast
 shutdown).

 After all, pmState changes to PM_NO_CHILDREN via PM_WAIT_DEAD_END
 by SIGCHLDs from non-significant processes, then CancelBackup().

 Judging from what was being said on the thread, it seems that running
 CancelBackup() after an immediate shutdown is better than not doing it,
 correct?

This is listed as a 9.4 Open Item, but no one seems to want to revert
this change.
So I'll drop this from the Open Item list barring objections.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-04-18 Thread Kyotaro HORIGUCHI
Hello,

 I don't think we should consider changing long-established behavior in
 the back-branches.  The old behavior may not be ideal, but that
 doesn't mean it's a bug.

Ok, I understand that. I give it up.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-04-16 Thread Robert Haas
On Tue, Apr 15, 2014 at 2:52 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello, thank you for the discussion.

 At Tue, 1 Apr 2014 11:41:20 -0400, Robert Haas wrote
 I don't find that very radical at all.  The backup_label file is
 *supposed* to be removed on the master if it crashes during the
 backup; and it should never be removed from the backup itself.  At
 least that's how I understand it.  Unfortunately, people too often

 The code indeed seems to assume that, and I couldn't think of any
 measure to avoid that dead-end once recovery sequence reads
 backup label accidentially left behind. I thought up to remove
 backup label during immediate shutdown on prvious versions, like
 9.4 does.

 CancelBackup does only stat-unlink-rename sequence so I think
 this doesn't obstruct immediate shutdown sequence. And this
 doesn't change any seeming behavior or interfaces just except for
 this case. What do you think about this? Isn't this also
 applicable for older versions?

I don't think we should consider changing long-established behavior in
the back-branches.  The old behavior may not be ideal, but that
doesn't mean it's a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-04-15 Thread Kyotaro HORIGUCHI
Hello, thank you for the discussion.

At Tue, 1 Apr 2014 11:41:20 -0400, Robert Haas wrote
 I don't find that very radical at all.  The backup_label file is
 *supposed* to be removed on the master if it crashes during the
 backup; and it should never be removed from the backup itself.  At
 least that's how I understand it.  Unfortunately, people too often

The code indeed seems to assume that, and I couldn't think of any
measure to avoid that dead-end once recovery sequence reads
backup label accidentially left behind. I thought up to remove
backup label during immediate shutdown on prvious versions, like
9.4 does.

CancelBackup does only stat-unlink-rename sequence so I think
this doesn't obstruct immediate shutdown sequence. And this
doesn't change any seeming behavior or interfaces just except for
this case. What do you think about this? Isn't this also
applicable for older versions?

postmaster.c@9.3.3:2339
pmdie(SIGNAL_ARGS)
{
...
switch (postgres_signal_arg)
{
...
case SIGQUIT:
...
SignalUnconnectedWorkers(SIGQUIT);
+
+/*
+ * Terminate exclusive backup mode. This is done in
+ * PostmasterStateMachine() for other shutdown modes.
+ */
+if (ReachedNormalRunning)
+CancelBackup();
ExitPostmaster(0);
break;


Aside from this, I'll post the new option for pg_resetxlog for
the next CF.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-04-01 Thread Robert Haas
On Tue, Apr 1, 2014 at 12:39 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Monday, March 31, 2014, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Mar 28, 2014 at 1:06 AM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
  Mmm. I don't think it is relevant to this problem. The problem
  specific here is 'The database was running until just now, but
  shutdown the master (by pacemaker), then restart, won't run
  anymore'. Deleting backup_label after immediate shutdown is the
  radical measure but existing system would be saved by the option.

 I don't find that very radical at all.  The backup_label file is
 *supposed* to be removed on the master if it crashes during the
 backup; and it should never be removed from the backup itself.  At
 least that's how I understand it.  Unfortunately, people too often
 remove the file from the backup and, judging by your report, leave it
 there on the master.

 At first blush it seems pretty radical to me.  Just because the server was
 e-stopped doesn't mean the backup rsync/cp -r/scp etc. isn't still running,
 and it is not clear to me that yanking the backup label file out from under
 it wouldn't cause problems.  I mean, you already have problems if you are
 trying to restore from that backup, but the missing file might make those
 problems less obvious.

 Of course first blush is often wrong, but...

Well, I guess I was thinking mostly of the case where the whole
server's been restarted, in which case none of that stuff is still
running any more.  If there is only a database server crash, then I
agree it's murkier.  Still, you probably ought to kill off those
things if the database server crashes, and then restart the whole base
backup.  Otherwise I think you're going to be in trouble whether the
backup label file sticks around or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-31 Thread Robert Haas
On Fri, Mar 28, 2014 at 1:06 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Mmm. I don't think it is relevant to this problem. The problem
 specific here is 'The database was running until just now, but
 shutdown the master (by pacemaker), then restart, won't run
 anymore'. Deleting backup_label after immediate shutdown is the
 radical measure but existing system would be saved by the option.

I don't find that very radical at all.  The backup_label file is
*supposed* to be removed on the master if it crashes during the
backup; and it should never be removed from the backup itself.  At
least that's how I understand it.  Unfortunately, people too often
remove the file from the backup and, judging by your report, leave it
there on the master.

(We could try to fix this confusion - and thereby confuse all the
people who understand it now - by changing things so that you have to
leave the file there on the master, and remove it from the backup.
Bwahaha!)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-31 Thread Jeff Janes
On Monday, March 31, 2014, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Mar 28, 2014 at 1:06 AM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp javascript:; wrote:
  Mmm. I don't think it is relevant to this problem. The problem
  specific here is 'The database was running until just now, but
  shutdown the master (by pacemaker), then restart, won't run
  anymore'. Deleting backup_label after immediate shutdown is the
  radical measure but existing system would be saved by the option.

 I don't find that very radical at all.  The backup_label file is
 *supposed* to be removed on the master if it crashes during the
 backup; and it should never be removed from the backup itself.  At
 least that's how I understand it.  Unfortunately, people too often
 remove the file from the backup and, judging by your report, leave it
 there on the master.


At first blush it seems pretty radical to me.  Just because the server was
e-stopped doesn't mean the backup rsync/cp -r/scp etc. isn't still running,
and it is not clear to me that yanking the backup label file out from under
it wouldn't cause problems.  I mean, you already have problems if you are
trying to restore from that backup, but the missing file might make those
problems less obvious.

Of course first blush is often wrong, but...

Cheers,

Jeff


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-27 Thread Kyotaro HORIGUCHI
Hello,

  After all, pmState changes to PM_NO_CHILDREN via PM_WAIT_DEAD_END
  by SIGCHLDs from non-significant processes, then CancelBackup().
 
 Judging from what was being said on the thread, it seems that running
 CancelBackup() after an immediate shutdown is better than not doing it,
 correct?

Agreed. I like that behavior:) It removes backup_label at
immediate shutdown and (altough I didn't see by myself but as far
as I saw PostmasterStateMachine) it would skip shutdown
checkponit.

  Focusing on the point described above, the small patch below
  rewinds the behavior back to 9.3 and before but I don't know the
  appropriateness in regard to the intention of the patch.
 
 I see.  Obviously your patch would, in effect, revert 82233ce7ea
 completely, which is not something we want.  I think if we want to go
 back to the previous behavior of not stopping the backup, some other
 method should be used.

As I mentioned above, I don't want to rewind 9.4's behavior back
to that of previous ones.

What I hope to be realized for now is '-b'(provisional optname)
of pg_resetxlog for at least versions which would fall into this
problem. What do you think about this maybe 'New Feature' but has
meaning practically only for older versions?

Of course I agree with that 'you should erase the backup_label
just after master has crashed' is the most clean and sane way to
*avoid* the situation but the penalty seems a bit too large for
the mistake.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-27 Thread Kyotaro HORIGUCHI
Hello,

  At Wed, 19 Mar 2014 19:34:10 +0900, Fujii Masao wrote
   Agreed. Attached patches do that and I could recover the
   database state with following steps,
 
  Adding new option looks like new feature rather than bug fix.
  I'm afraid that the backpatch of such a change to 9.3 or before
  is not acceptable.
 
  Me too. But on the other hand it simplly is a relief for the
  consequence of the behavior of server (altough it was ill
  operation:), and especially it is needed for at least 9.1 which
  seems cannot be saved without it. Plus it has utterly no impact
  on servers' behavior of any corresponding versions. So I hope it
  is accepted.
 
 Even in 9.1, we can think that problematic situation as database corruption
 and restart the server from the backup which was successfully taken before.
 No?

Mmm. I don't think it is relevant to this problem. The problem
specific here is 'The database was running until just now, but
shutdown the master (by pacemaker), then restart, won't run
anymore'. Deleting backup_label after immediate shutdown is the
radical measure but existing system would be saved by the option.

But, honestly saying, I (also?) don't have sympathy for the
situation so much and if all or most of you think the option can
cause another problem, I won't insist about that any more.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-24 Thread Fujii Masao
On Thu, Mar 20, 2014 at 3:43 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello,

 At Wed, 19 Mar 2014 19:34:10 +0900, Fujii Masao wrote
  Agreed. Attached patches do that and I could recover the
  database state with following steps,

 Adding new option looks like new feature rather than bug fix.
 I'm afraid that the backpatch of such a change to 9.3 or before
 is not acceptable.

 Me too. But on the other hand it simplly is a relief for the
 consequence of the behavior of server (altough it was ill
 operation:), and especially it is needed for at least 9.1 which
 seems cannot be saved without it. Plus it has utterly no impact
 on servers' behavior of any corresponding versions. So I hope it
 is accepted.

Even in 9.1, we can think that problematic situation as database corruption
and restart the server from the backup which was successfully taken before.
No?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-20 Thread Kyotaro HORIGUCHI
Hi, I confirmed that 82233ce7ea4 surely did it.

At Wed, 19 Mar 2014 09:35:16 -0300, Alvaro Herrera wrote
 Fujii Masao escribió:
  On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
  hlinnakan...@vmware.com wrote:
 
   9.4 canceles backup mode even on immediate shutdown so the
   operation causes no problem, but 9.3 and before are doesn't.
  
   Hmm, I don't think we've changed that behavior in 9.4.
  
  ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immdiate
  shutdown that way.
 
 Uh, interesting.  I didn't see that secondary effect.  I hope it's not
 for ill?

The crucial factor for the behavior change is that pmdie has
become not to exit immediately for SIGQUIT. 'case SIGQUIT:' in
pmdie() ended with ExitPostmaster(0) before the patch but now
it ends with 'PostmasterStateMachine(); break;' so continues to
run with pmState = PM_WAIT_BACKENDS, similar to SIGINT (fast
shutdown).

After all, pmState changes to PM_NO_CHILDREN via PM_WAIT_DEAD_END
by SIGCHLDs from non-significant processes, then CancelBackup().

Focusing on the point described above, the small patch below
rewinds the behavior back to 9.3 and before but I don't know the
appropriateness in regard to the intention of the patch.

diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index e9072b7..f87c25c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2498,16 +2498,7 @@ pmdie(SIGNAL_ARGS)
(errmsg(received immediate shutdown 
request)));
 
TerminateChildren(SIGQUIT);
-   pmState = PM_WAIT_BACKENDS;
-
-   /* set stopwatch for them to die */
-   AbortStartTime = time(NULL);
-
-   /*
-* Now wait for backends to exit.  If there are none,
-* PostmasterStateMachine will take the next step.
-*/
-   PostmasterStateMachine();
+   ExitPostmaster(0);
break;
}

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-20 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 19 Mar 2014 19:34:10 +0900, Fujii Masao wrote
  Agreed. Attached patches do that and I could recover the
  database state with following steps,
 
 Adding new option looks like new feature rather than bug fix.
 I'm afraid that the backpatch of such a change to 9.3 or before
 is not acceptable.

Me too. But on the other hand it simplly is a relief for the
consequence of the behavior of server (altough it was ill
operation:), and especially it is needed for at least 9.1 which
seems cannot be saved without it. Plus it has utterly no impact
on servers' behavior of any corresponding versions. So I hope it
is accepted.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-20 Thread Kyotaro HORIGUCHI
Hello,

 On 03/19/2014 10:28 AM, Kyotaro HORIGUCHI wrote:
  The*problematic*  operation sequence I saw was performed by
  pgsql-RA/Pacemaker. It stops a server already with immediate mode
  and starts the Master as a Standby at first, then
  promote. Focusing on this situation, there would be reasonable to
  reset backup positions.
 
 Well, that's scary. I would suggest doing a fast shutdown instead. But
 if you really want to do an immediate shutdown, you should delete the
 backup_label file after the shutdown

We'd also said them the former thing on several occations. They
answered that they hate shutdown checkpoint to take long time
before shutdown is completed. The latter one has not come on my
mind and seems promising. Thank you for the suggestion.

 When restarting after immediate shutdown and a backup_label file is
 present, the system doesn't know if the system crashed during a
 backup, and it needs to perform crash recovery, or if you're trying
 restore from a backup. It makes a compromise, and starts recovery from
 the checkpoint given in the backup_label, as if it was restoring from
 a backup, but if it doesn't see a backup-end WAL record, it just
 starts up anyway (which would be wrong if you are indeed restoring
 from a backup). But if you create a recovery.conf file, that indicates
 that you are definitely restoring from a backup, so it's more strict
 and insists that the backup-end record must be replayed.
 
  9.4 canceles backup mode even on
  immediate shutdown so the operation causes no problem, but 9.3
  and before are doesn't.
 
 Hmm, I don't think we've changed that behavior in 9.4.

Now pmdie behaves in the similar manner between fast and
immediate shutdown after 82233ce7ea42d6ba519. It is an side
effect of a change on immediate shutdown which make it to wait
the children to die by SIGQUIT.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-20 Thread Alvaro Herrera
Kyotaro HORIGUCHI escribió:
 Hi, I confirmed that 82233ce7ea4 surely did it.
 
 At Wed, 19 Mar 2014 09:35:16 -0300, Alvaro Herrera wrote
  Fujii Masao escribió:
   On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
   hlinnakan...@vmware.com wrote:
  
9.4 canceles backup mode even on immediate shutdown so the
operation causes no problem, but 9.3 and before are doesn't.
   
Hmm, I don't think we've changed that behavior in 9.4.
   
   ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immdiate
   shutdown that way.
  
  Uh, interesting.  I didn't see that secondary effect.  I hope it's not
  for ill?
 
 The crucial factor for the behavior change is that pmdie has
 become not to exit immediately for SIGQUIT. 'case SIGQUIT:' in
 pmdie() ended with ExitPostmaster(0) before the patch but now
 it ends with 'PostmasterStateMachine(); break;' so continues to
 run with pmState = PM_WAIT_BACKENDS, similar to SIGINT (fast
 shutdown).
 
 After all, pmState changes to PM_NO_CHILDREN via PM_WAIT_DEAD_END
 by SIGCHLDs from non-significant processes, then CancelBackup().

Judging from what was being said on the thread, it seems that running
CancelBackup() after an immediate shutdown is better than not doing it,
correct?

 Focusing on the point described above, the small patch below
 rewinds the behavior back to 9.3 and before but I don't know the
 appropriateness in regard to the intention of the patch.

I see.  Obviously your patch would, in effect, revert 82233ce7ea
completely, which is not something we want.  I think if we want to go
back to the previous behavior of not stopping the backup, some other
method should be used.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-19 Thread Kyotaro HORIGUCHI
Hello, thank you for suggestions.

The *problematic* operation sequence I saw was performed by
pgsql-RA/Pacemaker. It stops a server already with immediate mode
and starts the Master as a Standby at first, then
promote. Focusing on this situation, there would be reasonable to
reset backup positions. 9.4 canceles backup mode even on
immediate shutdown so the operation causes no problem, but 9.3
and before are doesn't. Finally, needed amendments per versions
are

9.4: Nothing more is needed (but resetting backup mode by
 resetxlog is acceptable)

9.3: Can be recovered without resetting backup positions in
 controlfile.  (but smarter with it)

9.2: Same to 9.3

9.1: Cannot be recoverd without directly resetting backup
 position in controlfile.  Resetting feature is needed.


At Mon, 17 Mar 2014 15:59:09 +0200, Heikki Linnakangas wrote
 On 03/15/2014 05:59 PM, Fujii Masao wrote:
  What about adding new option into pg_resetxlog so that we can
  reset the pg_control's backup start location? Even after we've
  accidentally entered into the situation that you described, we can
  exit from that by resetting the backup start location in pg_control.
  Also this option seems helpful to salvage the data as a last resort
  from the corrupted backup.
 
 Yeah, seems reasonable. After you run pg_resetxlog, there's no hope
 that the backup end record would arrive any time later. And if it
 does, it won't really do much good after you've reset the WAL.
 
 We probably should just clear out the backup start/stop location
 always when you run pg_resetxlog. Your database is potentially broken
 if you reset the WAL before reaching consistency, but if forcibly do
 that with pg_resetxlog -f, you've been warned.

Agreed. Attached patches do that and I could recover the
database state with following steps,

(1) Remove recovery.conf and do pg_resetxlog -bf
(the option name 'b' would be arguable)
(2) Start the server (with crash recovery)
(3) Stop the server (in any mode)
(4) Create recovery.conf and start the server with archive recovery.

Some annoyance in step 2 and 3 but I don't want to support the
pacemaker's in-a-sense broken sequence no further:(

This is alterable by the following steps suggested in Masao's
previous mail for 9.2 and alter, but 9.1 needs forcibly resetting
startBackupPoint.

At Sun, 16 Mar 2014 00:59:01 +0900, Fujii Masao wrote
 Though this is formal way, you can exit from that situation by
 
 (1) Remove recovery.conf and start the server with crash recovery
 (2) Execute pg_start_backup() after crash recovery ends
 (3) Copy backup_label to somewhere
 (4) Execute pg_stop_backup() and shutdown the server
 (5) Copy backup_label back to $PGDATA
 (6) Create recovery.conf and start the server with archive recovery

This worked for 9.2, 9.3 and HEAD but failed for 9.1 at step 1.

| 2014-03-19 15:53:02.512 JST FATAL:  WAL ends before end of online backup
| 2014-03-19 15:53:02.512 JST HINT:  Online backup started with 
pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that 
point must be available at recovery.

This seems inevitable.

| if (InRecovery 
|   (XLByteLT(EndOfLog, minRecoveryPoint) ||
|!XLogRecPtrIsInvalid(ControlFile-backupStartPoint)))
| {
...
|   /*
|* Ran off end of WAL before reaching end-of-backup WAL record, or
|* minRecoveryPoint.
|*/
|   if (!XLogRecPtrIsInvalid(ControlFile-backupStartPoint))
|   ereport(FATAL,
|   (errmsg(WAL ends before end of online backup),

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 28a4f19..7d9cf6d 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -85,6 +85,7 @@ main(int argc, char *argv[])
 	int			c;
 	bool		force = false;
 	bool		noupdate = false;
+	bool		resetbackuppos = false;
 	MultiXactId set_oldestmxid = 0;
 	char	   *endptr;
 	char	   *endptr2;
@@ -110,7 +111,7 @@ main(int argc, char *argv[])
 	}
 
 
-	while ((c = getopt(argc, argv, fl:m:no:O:x:e:)) != -1)
+	while ((c = getopt(argc, argv, fl:m:no:O:x:e:b)) != -1)
 	{
 		switch (c)
 		{
@@ -122,6 +123,10 @@ main(int argc, char *argv[])
 noupdate = true;
 break;
 
+			case 'b':
+resetbackuppos = true;
+break;
+
 			case 'e':
 set_xid_epoch = strtoul(optarg, endptr, 0);
 if (endptr == optarg || *endptr != '\0')
@@ -350,6 +355,13 @@ main(int argc, char *argv[])
 		ControlFile.checkPointCopy.PrevTimeLineID = minXlogTli;
 	}
 
+	if (resetbackuppos)
+	{
+		ControlFile.backupStartPoint = InvalidXLogRecPtr;
+		ControlFile.backupEndPoint = InvalidXLogRecPtr;
+		ControlFile.backupEndRequired = false;
+	}
+
 	if (minXlogSegNo  newXlogSegNo)
 		newXlogSegNo = minXlogSegNo;
 
@@ -1098,6 +1110,7 @@ usage(void)
 	printf(_(  -O OFFSETset next multitransaction offset\n));
 	printf(_(  -V, --versionoutput version information, 

Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-19 Thread Fujii Masao
On Wed, Mar 19, 2014 at 5:28 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello, thank you for suggestions.

 The *problematic* operation sequence I saw was performed by
 pgsql-RA/Pacemaker. It stops a server already with immediate mode
 and starts the Master as a Standby at first, then
 promote. Focusing on this situation, there would be reasonable to
 reset backup positions. 9.4 canceles backup mode even on
 immediate shutdown so the operation causes no problem, but 9.3
 and before are doesn't. Finally, needed amendments per versions
 are

 9.4: Nothing more is needed (but resetting backup mode by
  resetxlog is acceptable)

 9.3: Can be recovered without resetting backup positions in
  controlfile.  (but smarter with it)

 9.2: Same to 9.3

 9.1: Cannot be recoverd without directly resetting backup
  position in controlfile.  Resetting feature is needed.


 At Mon, 17 Mar 2014 15:59:09 +0200, Heikki Linnakangas wrote
 On 03/15/2014 05:59 PM, Fujii Masao wrote:
  What about adding new option into pg_resetxlog so that we can
  reset the pg_control's backup start location? Even after we've
  accidentally entered into the situation that you described, we can
  exit from that by resetting the backup start location in pg_control.
  Also this option seems helpful to salvage the data as a last resort
  from the corrupted backup.

 Yeah, seems reasonable. After you run pg_resetxlog, there's no hope
 that the backup end record would arrive any time later. And if it
 does, it won't really do much good after you've reset the WAL.

 We probably should just clear out the backup start/stop location
 always when you run pg_resetxlog. Your database is potentially broken
 if you reset the WAL before reaching consistency, but if forcibly do
 that with pg_resetxlog -f, you've been warned.

 Agreed. Attached patches do that and I could recover the
 database state with following steps,

Adding new option looks like new feature rather than bug fix.
I'm afraid that the backpatch of such a change to 9.3 or before
is not acceptable.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-19 Thread Heikki Linnakangas

On 03/19/2014 10:28 AM, Kyotaro HORIGUCHI wrote:

The*problematic*  operation sequence I saw was performed by
pgsql-RA/Pacemaker. It stops a server already with immediate mode
and starts the Master as a Standby at first, then
promote. Focusing on this situation, there would be reasonable to
reset backup positions.


Well, that's scary. I would suggest doing a fast shutdown instead. But 
if you really want to do an immediate shutdown, you should delete the 
backup_label file after the shutdown


When restarting after immediate shutdown and a backup_label file is 
present, the system doesn't know if the system crashed during a backup, 
and it needs to perform crash recovery, or if you're trying restore from 
a backup. It makes a compromise, and starts recovery from the checkpoint 
given in the backup_label, as if it was restoring from a backup, but if 
it doesn't see a backup-end WAL record, it just starts up anyway (which 
would be wrong if you are indeed restoring from a backup). But if you 
create a recovery.conf file, that indicates that you are definitely 
restoring from a backup, so it's more strict and insists that the 
backup-end record must be replayed.



9.4 canceles backup mode even on
immediate shutdown so the operation causes no problem, but 9.3
and before are doesn't.


Hmm, I don't think we've changed that behavior in 9.4.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-19 Thread Fujii Masao
On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 03/19/2014 10:28 AM, Kyotaro HORIGUCHI wrote:

 The*problematic*  operation sequence I saw was performed by

 pgsql-RA/Pacemaker. It stops a server already with immediate mode
 and starts the Master as a Standby at first, then
 promote. Focusing on this situation, there would be reasonable to
 reset backup positions.


 Well, that's scary. I would suggest doing a fast shutdown instead. But if
 you really want to do an immediate shutdown, you should delete the
 backup_label file after the shutdown

 When restarting after immediate shutdown and a backup_label file is present,
 the system doesn't know if the system crashed during a backup, and it needs
 to perform crash recovery, or if you're trying restore from a backup. It
 makes a compromise, and starts recovery from the checkpoint given in the
 backup_label, as if it was restoring from a backup, but if it doesn't see a
 backup-end WAL record, it just starts up anyway (which would be wrong if you
 are indeed restoring from a backup). But if you create a recovery.conf file,
 that indicates that you are definitely restoring from a backup, so it's more
 strict and insists that the backup-end record must be replayed.


 9.4 canceles backup mode even on
 immediate shutdown so the operation causes no problem, but 9.3
 and before are doesn't.


 Hmm, I don't think we've changed that behavior in 9.4.

ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immdiate
shutdown that way.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-19 Thread Alvaro Herrera
Fujii Masao escribió:
 On Wed, Mar 19, 2014 at 7:57 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

  9.4 canceles backup mode even on immediate shutdown so the
  operation causes no problem, but 9.3 and before are doesn't.
 
  Hmm, I don't think we've changed that behavior in 9.4.
 
 ISTM 82233ce7ea42d6ba519aaec63008aff49da6c7af changed immdiate
 shutdown that way.

Uh, interesting.  I didn't see that secondary effect.  I hope it's not
for ill?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-17 Thread Heikki Linnakangas

On 03/15/2014 05:59 PM, Fujii Masao wrote:

What about adding new option into pg_resetxlog so that we can
reset the pg_control's backup start location? Even after we've
accidentally entered into the situation that you described, we can
exit from that by resetting the backup start location in pg_control.
Also this option seems helpful to salvage the data as a last resort
from the corrupted backup.


Yeah, seems reasonable. After you run pg_resetxlog, there's no hope that 
the backup end record would arrive any time later. And if it does, it 
won't really do much good after you've reset the WAL.


We probably should just clear out the backup start/stop location always 
when you run pg_resetxlog. Your database is potentially broken if you 
reset the WAL before reaching consistency, but if forcibly do that with 
pg_resetxlog -f, you've been warned.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-16 Thread Kyotaro HORIGUCHI
Thank you for good suggestion.

  What the mess is once entering this situation, I could find no
  formal operation to exit from it.
 
 Though this is formal way, you can exit from that situation by
 
 (1) Remove recovery.conf and start the server with crash recovery
 (2) Execute pg_start_backup() after crash recovery ends
 (3) Copy backup_label to somewhere
 (4) Execute pg_stop_backup() and shutdown the server
 (5) Copy backup_label back to $PGDATA
 (6) Create recovery.conf and start the server with archive recovery

It will do. And pg_resetxlog was the first thing I checked out
for reseting backupStartPoint.

 What about adding new option into pg_resetxlog so that we can
 reset the pg_control's backup start location? Even after we've
 accidentally entered into the situation that you described, we can
 exit from that by resetting the backup start location in pg_control.
 Also this option seems helpful to salvage the data as a last resort
 from the corrupted backup.

It is in far better proportion than recovery.conf option:), since
it is already warned to be dangerous as its nature. Anyway I'll
make sure the situation under the trouble fist.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-16 Thread Kyotaro HORIGUCHI
Hello, very sorry to have bothered you by silly question.

me It is in far better proportion than recovery.conf option:), since
me it is already warned to be dangerous as its nature. Anyway I'll
me make sure the situation under the trouble fist.

It looks exactly the 'starting up as standby of ex-master which
crashed during backup mode' case as I checked out the original
issue. I agree that no save is needed for the case since it is
simply a db corruption. Usefulness of pg_resetxlog's
resetting-backup_label-related-items feature is not clear so far,
so I don't wish it realised for this time.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-15 Thread Fujii Masao
On Fri, Mar 14, 2014 at 7:32 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello, we found that postgreql won't complete archive recovery
 foever on some situation. This occurs HEAD, 9.3.3, 9.2.7, 9.1.12.

 Restarting server with archive recovery fails as following just
 after it was killed with SIGKILL after pg_start_backup and some
 wal writes but before pg_stop_backup.

 | FATAL:  WAL ends before end of online backup
 | HINT: Online backup started with pg_start_backup() must be
 |  ended with pg_stop_backup(), and all WAL up to that point must
 |  be available at recovery.

 What the mess is once entering this situation, I could find no
 formal operation to exit from it.

Though this is formal way, you can exit from that situation by

(1) Remove recovery.conf and start the server with crash recovery
(2) Execute pg_start_backup() after crash recovery ends
(3) Copy backup_label to somewhere
(4) Execute pg_stop_backup() and shutdown the server
(5) Copy backup_label back to $PGDATA
(6) Create recovery.conf and start the server with archive recovery

 On this situation, 'Backup start location' in controldata has
 some valid location but corresponding 'end of backup' WAL record
 won't come forever.

 But I think PG cannot tell the situation dintinctly whether the
 'end of backup' reocred is not exists at all or it will come
 later especially when the server starts as a streaming
 replication hot-standby.

 One solution for it would be a new parameter in recovery.conf
 which tells that the operator wants the server to start as if
 there were no backup label ever before when the situation
 comes. It looks ugly and somewhat danger but seems necessary.

 The first attached file is the script to replay the problem, and
 the second is the patch trying to do what is described above.

 After applying this patch on HEAD and uncommneting the
 'cancel_backup_label_on_failure = true' in test.sh, the test
 script runs as following,

 | LOG:  record with zero length at 0/2010F40
 | WARNING:  backup_label was canceled.
 | HINT:  server might have crashed during backup mode.
 | LOG:  consistent recovery state reached at 0/2010F40
 | LOG:  redo done at 0/2010DA0

 What do you thing about this?

What about adding new option into pg_resetxlog so that we can
reset the pg_control's backup start location? Even after we've
accidentally entered into the situation that you described, we can
exit from that by resetting the backup start location in pg_control.
Also this option seems helpful to salvage the data as a last resort
from the corrupted backup.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-14 Thread Heikki Linnakangas

On 03/14/2014 12:32 PM, Kyotaro HORIGUCHI wrote:

Restarting server with archive recovery fails as following just
after it was killed with SIGKILL after pg_start_backup and some
wal writes but before pg_stop_backup.

| FATAL:  WAL ends before end of online backup
| HINT: Online backup started with pg_start_backup() must be
|  ended with pg_stop_backup(), and all WAL up to that point must
|  be available at recovery.

What the mess is once entering this situation, I could find no
formal operation to exit from it.

On this situation, 'Backup start location' in controldata has
some valid location but corresponding 'end of backup' WAL record
won't come forever.

But I think PG cannot tell the situation dintinctly whether the
'end of backup' reocred is not exists at all or it will come
later especially when the server starts as a streaming
replication hot-standby.


If you kill the server while a backup is in progress, the backup is 
broken. It's correct that the server refuses to start up from the broken 
backup. So basically, don't do that.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-14 Thread Kyotaro HORIGUCHI
Thank you.

2014/03/14 19:42 Heikki Linnakangas hlinnakan...@vmware.com:

 On 03/14/2014 12:32 PM, Kyotaro HORIGUCHI wrote:

 Restarting server with archive recovery fails as following just
 after it was killed with SIGKILL after pg_start_backup and some
 wal writes but before pg_stop_backup.

 | FATAL:  WAL ends before end of online backup
 | HINT: Online backup started with pg_start_backup() must be
 |  ended with pg_stop_backup(), and all WAL up to that point must
 |  be available at recovery.

 What the mess is once entering this situation, I could find no
 formal operation to exit from it.


 If you kill the server while a backup is in progress, the backup is
broken. It's correct that the server refuses to start up from the broken
backup. So basically, don't do that.

Hmm.. What I did is simplly restarting server  just after a crash but the
server was accidentially in backup mode. No backup copy is used. Basically,
the server is in the same situation with the simple restart after crash.
The difference here is the restarting made the database completly useless
while it had been not. I wish to save the database for the case and I
suppose it so acceptable.

regards,
-- 
Kyotaro Horiguchi
NTT Opensource Software Center


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-14 Thread Kyotaro HORIGUCHI
Sorry, I  wrote a little wrong.

2014/03/14 20:24 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp:
 I wish to save the database for the case and I suppose it so acceptable.

and I don't suppose it so unacceptable.

regards,
-- 
Kyotaro Horiguchi
NTT Opensource Software Center


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-14 Thread Heikki Linnakangas

On 03/14/2014 01:24 PM, Kyotaro HORIGUCHI wrote:

Hmm.. What I did is simplly restarting server  just after a crash but the
server was accidentially in backup mode. No backup copy is used. Basically,
the server is in the same situation with the simple restart after crash.


You created recovery.conf in the master server after crash. Just don't 
do that.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-14 Thread Kyotaro HORIGUCHI
Hello,

2014/03/14 20:51 Heikki Linnakangas hlinnakan...@vmware.com:
 You created recovery.conf in the master server after crash. Just don't do
that.

Ah, ok. I understood what you meant.
Sorry that I can't confirm rihgt now, the original issue should occur on
the standby. I might've oversimplicated.

regards,
-- 
Kyotaro Horiguchi
NTT Opensource Software Center


Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-03-14 Thread Kyotaro HORIGUCHI
Umm.. Sorry for repeated correction.

2014/03/14 21:12 Kyotaro HORIGUCHI kyota.horigu...@gmail.com:

 Ah, ok. I understood what you meant.
 Sorry that I can't confirm rihgt now, the original issue should occur on
the standby.

The original issue should have occurred on standby

 I might've oversimplicated.

 regards,
 --
 Kyotaro Horiguchi
 NTT Opensource Software Center