Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-28 Thread Alvaro Herrera
MauMau wrote: Hi, I did this. Please find attached the revised patch. I modified HandleChildCrash(). I tested the immediate shutdown, and the child cleanup succeeded. Thanks, committed. There are two matters pending here: 1. do we want postmaster to exit immediately after sending the

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-28 Thread Robert Haas
On Fri, Jun 28, 2013 at 6:00 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: MauMau wrote: Hi, I did this. Please find attached the revised patch. I modified HandleChildCrash(). I tested the immediate shutdown, and the child cleanup succeeded. Thanks, committed. There are two

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-27 Thread MauMau
Hi, Alvaro san, From: Alvaro Herrera alvhe...@2ndquadrant.com MauMau wrote: Yeah, I see that --- after removing that early exit, there are unwanted messages. And in fact there are some signals sent that weren't previously sent. Clearly we need something here: if we're in immediate shutdown

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-25 Thread MauMau
From: Alvaro Herrera alvhe...@2ndquadrant.com Yeah, I see that --- after removing that early exit, there are unwanted messages. And in fact there are some signals sent that weren't previously sent. Clearly we need something here: if we're in immediate shutdown handler, don't signal anyone

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-24 Thread Alvaro Herrera
MauMau wrote: From: Alvaro Herrera alvhe...@2ndquadrant.com Actually, in further testing I noticed that the fast-path you introduced in BackendCleanup (or was it HandleChildCrash?) in the immediate shutdown case caused postmaster to fail to clean up properly after sending the SIGKILL

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-22 Thread Alvaro Herrera
MauMau wrote: Are you suggesting simplifying the following part in ServerLoop()? I welcome the idea if this condition becomes simpler. However, I cannot imagine how. if (AbortStartTime > 0 && /* SIGKILL only once */ (Shutdown == ImmediateShutdown || (FatalError && !SendStop)) && now -

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-22 Thread Robert Haas
On Fri, Jun 21, 2013 at 10:02 PM, MauMau maumau...@gmail.com wrote: I'm comfortable with 5 seconds. We are talking about the interval between sending SIGQUIT to the children and then sending SIGKILL to them. In most situations, the backends should terminate immediately. However, as I said a

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-22 Thread MauMau
From: Alvaro Herrera alvhe...@2ndquadrant.com MauMau wrote: I thought of adding some new state of pmState for some reason (that might be the same as your idea). But I refrained from doing that, because pmState already has many states. I was afraid adding a new pmState value for this bug fix

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-22 Thread MauMau
From: Robert Haas robertmh...@gmail.com On Fri, Jun 21, 2013 at 10:02 PM, MauMau maumau...@gmail.com wrote: I'm comfortable with 5 seconds. We are talking about the interval between sending SIGQUIT to the children and then sending SIGKILL to them. In most situations, the backends should

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Hitoshi Harada
On Thu, Jun 20, 2013 at 3:40 PM, MauMau maumau...@gmail.com wrote: Here, reliable means that the database server is certainly shut down when pg_ctl returns, not telling a lie that I shut down the server processes for you, so you do not have to be worried that some postgres process might

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread MauMau
From: Alvaro Herrera alvhe...@2ndquadrant.com MauMau wrote: One concern is that umount would fail in such a situation because postgres has some open files on the filesystem, which is on the shared disk in the case of a traditional HA cluster. See my reply to Noah. If postmaster stays around,

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread MauMau
From: Alvaro Herrera alvhe...@2ndquadrant.com Actually, I think it would be cleaner to have a new state in pmState, namely PM_IMMED_SHUTDOWN which is entered when we send SIGQUIT. When we're in this state, postmaster is only waiting for the timeout to expire; and when it does, it sends SIGKILL

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Robert Haas
On Thu, Jun 20, 2013 at 12:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I will go with 5 seconds, then. I'm uncomfortable with this whole concept, and particularly with such a short timeout. On a very busy system, things can take a LOT longer than we think they should; it can take 30

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: More generally, what do we think the point is of sending SIGQUIT rather than SIGKILL in the first place, and why does that point cease to be valid after 5 seconds? Well, mostly it's about telling the client we're committing hara-kiri. Without that,

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Robert Haas
On Fri, Jun 21, 2013 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: More generally, what do we think the point is of sending SIGQUIT rather than SIGKILL in the first place, and why does that point cease to be valid after 5 seconds? Well, mostly it's

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread Christopher Browne
The case where I wanted routine shutdown immediate (and I'm not sure I ever actually got it) was when we were using IBM HA/CMP, where I wanted a terminate with a fair bit of prejudice. If we know we want to switch right away now, immediate seemed pretty much right. I was fine with interrupting

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread MauMau
From: Robert Haas robertmh...@gmail.com On Thu, Jun 20, 2013 at 12:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I will go with 5 seconds, then. I'm uncomfortable with this whole concept, and particularly with such a short timeout. On a very busy system, things can take a LOT longer

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-21 Thread MauMau
From: Robert Haas robertmh...@gmail.com On Fri, Jun 21, 2013 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: More generally, what do we think the point is of sending SIGQUIT rather than SIGKILL in the first place, and why does that point cease to be

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-20 Thread MauMau
First, thank you for the review. From: Alvaro Herrera alvhe...@2ndquadrant.com This seems reasonable. Why 10 seconds? We could wait 5 seconds, or 15. Is there a rationale behind the 10? If we said 60, that would fit perfectly well within the already existing 60-second loop in postmaster, but

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-20 Thread Alvaro Herrera
MauMau wrote: First, thank you for the review. From: Alvaro Herrera alvhe...@2ndquadrant.com This seems reasonable. Why 10 seconds? We could wait 5 seconds, or 15. Is there a rationale behind the 10? If we said 60, that would fit perfectly well within the already existing 60-second

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-20 Thread MauMau
From: Alvaro Herrera alvhe...@2ndquadrant.com I will go with 5 seconds, then. OK, I agree. My point is that there is no difference. For one thing, once we enter immediate shutdown state, and sigkill has been sent, no further action is taken. Postmaster will just sit there indefinitely

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-20 Thread Alvaro Herrera
MauMau wrote: From: Alvaro Herrera alvhe...@2ndquadrant.com One concern is that umount would fail in such a situation because postgres has some open files on the filesystem, which is on the shared disk in the case of a traditional HA cluster. See my reply to Noah. If postmaster stays around,

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-20 Thread Alvaro Herrera
Actually, I think it would be cleaner to have a new state in pmState, namely PM_IMMED_SHUTDOWN which is entered when we send SIGQUIT. When we're in this state, postmaster is only waiting for the timeout to expire; and when it does, it sends SIGKILL and exits. Pretty much the same you have,

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-06-19 Thread Alvaro Herrera
MauMau wrote: Could you review the patch? The summary of the change is: 1. postmaster waits for children to terminate when it gets an immediate shutdown request, instead of exiting. 2. postmaster sends SIGKILL to remaining children if all of the child processes do not terminate within

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-02-07 Thread MauMau
Hello, Tom-san, folks, From: Tom Lane t...@sss.pgh.pa.us I think if we want to make it bulletproof we'd have to do what the OP suggested and switch to SIGKILL. I'm not enamored of that for the reasons I mentioned --- but one idea that might dodge the disadvantages is to have the postmaster

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-02-01 Thread Andres Freund
On 2013-01-22 22:19:25 -0500, Tom Lane wrote: Since we've fixed a couple of relatively nasty bugs recently, the core committee has determined that it'd be a good idea to push out PG update releases soon. The current plan is to wrap on Monday Feb 4 for public announcement Thursday Feb 7. If

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-02-01 Thread Peter Eisentraut
On 1/31/13 5:42 PM, MauMau wrote: Thank you for sharing your experience. So you also considered making postmaster SIGKILL children like me, didn't you? I bet most people who encounter this problem would feel like that. It is definitely pg_ctl who needs to be prepared, not the users. It

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-02-01 Thread Andres Freund
On 2013-02-01 08:55:24 -0500, Peter Eisentraut wrote: On 1/31/13 5:42 PM, MauMau wrote: Thank you for sharing your experience. So you also considered making postmaster SIGKILL children like me, didn't you? I bet most people who encounter this problem would feel like that. It is

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-02-01 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2013-02-01 08:55:24 -0500, Peter Eisentraut wrote: I found an old patch that I had prepared for this, which I have attached. YMMV. +static void +quickdie_alarm_handler(SIGNAL_ARGS) +{ +/* + * We got here if ereport() was blocking,

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-01-31 Thread Peter Eisentraut
On 1/30/13 9:11 AM, MauMau wrote: When I ran pg_ctl stop -mi against the primary, some applications connected to the primary did not stop. The cause was that the backends were deadlocked in quickdie() with some call stack like the following. I'm sorry to have left the stack trace file on the

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-01-31 Thread MauMau
From: Peter Eisentraut pete...@gmx.net On 1/30/13 9:11 AM, MauMau wrote: When I ran pg_ctl stop -mi against the primary, some applications connected to the primary did not stop. The cause was that the backends were deadlocked in quickdie() with some call stack like the following. I'm sorry to

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-01-31 Thread Kevin Grittner
MauMau maumau...@gmail.com wrote: Just doing pkill postgres will unexpectedly terminate postgres of other instances. Not if you run each instance under a different OS user, and execute pkill with the right user. (Never use root for that!) This is just one of the reasons that you should not

backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-01-30 Thread MauMau
From: Tom Lane t...@sss.pgh.pa.us Since we've fixed a couple of relatively nasty bugs recently, the core committee has determined that it'd be a good idea to push out PG update releases soon. The current plan is to wrap on Monday Feb 4 for public announcement Thursday Feb 7. If you're aware of

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-01-30 Thread Tom Lane
MauMau maumau...@gmail.com writes: When I ran pg_ctl stop -mi against the primary, some applications connected to the primary did not stop. ... The root cause is that gettext() is called in the signal handler quickdie() via errhint(). Yeah, it's a known hazard that quickdie() operates like

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-01-30 Thread Andres Freund
On 2013-01-30 10:23:09 -0500, Tom Lane wrote: MauMau maumau...@gmail.com writes: When I ran pg_ctl stop -mi against the primary, some applications connected to the primary did not stop. ... The root cause is that gettext() is called in the signal handler quickdie() via errhint().

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-01-30 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2013-01-30 10:23:09 -0500, Tom Lane wrote: Yeah, it's a known hazard that quickdie() operates like that. What about not translating those? The messages are static and all memory needed by postgres should be pre-allocated. That would reduce our

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-01-30 Thread MauMau
From: Tom Lane t...@sss.pgh.pa.us MauMau maumau...@gmail.com writes: I think the solution is the typical one. That is, to just remember the receipt of SIGQUIT by setting a global variable and call siglongjmp() in quickdie(), and perform tasks currently done in quickdie() when sigsetjmp()

Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)

2013-01-30 Thread Tom Lane
MauMau maumau...@gmail.com writes: From: Tom Lane t...@sss.pgh.pa.us The long and the short of it is that SIGQUIT is the emergency-stop panic button. You don't use it for routine shutdowns --- you use it when there is a damn good reason to and you're prepared to do some manual cleanup if

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-29 Thread Fujii Masao
On Sun, Jan 27, 2013 at 11:38 PM, MauMau maumau...@gmail.com wrote: From: Fujii Masao masao.fu...@gmail.com On Sun, Jan 27, 2013 at 12:17 AM, MauMau maumau...@gmail.com wrote: Although you said the fix will solve my problem, I don't feel it will. The discussion is about the crash when the

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-27 Thread Fujii Masao
On Sun, Jan 27, 2013 at 12:17 AM, MauMau maumau...@gmail.com wrote: From: Fujii Masao masao.fu...@gmail.com On Thu, Jan 24, 2013 at 11:53 PM, MauMau maumau...@gmail.com wrote: I'm wondering if the fix discussed in the above thread solves my problem. I found the following differences between

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-27 Thread MauMau
From: Fujii Masao masao.fu...@gmail.com On Sun, Jan 27, 2013 at 12:17 AM, MauMau maumau...@gmail.com wrote: Although you said the fix will solve my problem, I don't feel it will. The discussion is about the crash when the standby restarts after the primary vacuums and truncates a table. On

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-26 Thread MauMau
From: Fujii Masao masao.fu...@gmail.com On Thu, Jan 24, 2013 at 11:53 PM, MauMau maumau...@gmail.com wrote: I'm wondering if the fix discussed in the above thread solves my problem. I found the following differences between Horiguchi-san's case and my case: (1) Horiguchi-san says the bug

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-24 Thread MauMau
From: Fujii Masao masao.fu...@gmail.com On Thu, Jan 24, 2013 at 7:42 AM, MauMau maumau...@gmail.com wrote: I searched through PostgreSQL mailing lists with WAL contains references to invalid pages, and I found 19 messages. Some people encountered a similar problem. There were some discussions

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-24 Thread Fujii Masao
On Thu, Jan 24, 2013 at 11:53 PM, MauMau maumau...@gmail.com wrote: From: Fujii Masao masao.fu...@gmail.com On Thu, Jan 24, 2013 at 7:42 AM, MauMau maumau...@gmail.com wrote: I searched through PostgreSQL mailing lists with WAL contains references to invalid pages, and I found 19 messages.

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-23 Thread MauMau
From: Tom Lane t...@sss.pgh.pa.us Since we've fixed a couple of relatively nasty bugs recently, the core committee has determined that it'd be a good idea to push out PG update releases soon. The current plan is to wrap on Monday Feb 4 for public announcement Thursday Feb 7. If you're aware of

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-23 Thread Fujii Masao
On Thu, Jan 24, 2013 at 7:42 AM, MauMau maumau...@gmail.com wrote: From: Tom Lane t...@sss.pgh.pa.us Since we've fixed a couple of relatively nasty bugs recently, the core committee has determined that it'd be a good idea to push out PG update releases soon. The current plan is to wrap on

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-22 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: Since we've fixed a couple of relatively nasty bugs recently, the core committee has determined that it'd be a good idea to push out PG update releases soon. The current plan is to wrap on Monday Feb 4 for public announcement Thursday Feb 7. If you're

Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-22 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: Since we've fixed a couple of relatively nasty bugs recently, the core committee has determined that it'd be a good idea to push out PG update releases soon. The current plan is to wrap on Monday Feb 4 for public