MauMau wrote:
Hi,
I did this. Please find attached the revised patch. I modified
HandleChildCrash(). I tested the immediate shutdown, and the child
cleanup succeeded.
Thanks, committed.
There are two matters pending here:
1. do we want postmaster to exit immediately after sending the
On Fri, Jun 28, 2013 at 6:00 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
MauMau wrote:
Hi,
I did this. Please find attached the revised patch. I modified
HandleChildCrash(). I tested the immediate shutdown, and the child
cleanup succeeded.
Thanks, committed.
There are two
Hi, Alvaro san,
From: Alvaro Herrera alvhe...@2ndquadrant.com
MauMau wrote:
Yeah, I see that --- after removing that early exit, there are unwanted
messages. And in fact there are some signals sent that weren't
previously sent. Clearly we need something here: if we're in immediate
shutdown
From: Alvaro Herrera alvhe...@2ndquadrant.com
Yeah, I see that --- after removing that early exit, there are unwanted
messages. And in fact there are some signals sent that weren't
previously sent. Clearly we need something here: if we're in immediate
shutdown handler, don't signal anyone
MauMau wrote:
From: Alvaro Herrera alvhe...@2ndquadrant.com
Actually, in further testing I noticed that the fast-path you introduced
in BackendCleanup (or was it HandleChildCrash?) in the immediate
shutdown case caused postmaster to fail to clean up properly after
sending the SIGKILL
MauMau wrote:
Are you suggesting simplifying the following part in ServerLoop()?
I welcome the idea if this condition becomes simpler. However, I
cannot imagine how.
if (AbortStartTime > 0 && /* SIGKILL only once */
(Shutdown == ImmediateShutdown || (FatalError && !SendStop)) &&
now -
On Fri, Jun 21, 2013 at 10:02 PM, MauMau maumau...@gmail.com wrote:
I'm comfortable with 5 seconds. We are talking about the interval between
sending SIGQUIT to the children and then sending SIGKILL to them. In most
situations, the backends should terminate immediately. However, as I said a
From: Alvaro Herrera alvhe...@2ndquadrant.com
MauMau wrote:
I thought of adding some new state of pmState for some reason (that
might be the same as your idea).
But I refrained from doing that, because pmState has already many
states. I was afraid adding a new pmState value for this bug fix
From: Robert Haas robertmh...@gmail.com
On Fri, Jun 21, 2013 at 10:02 PM, MauMau maumau...@gmail.com wrote:
I'm comfortable with 5 seconds. We are talking about the interval between
sending SIGQUIT to the children and then sending SIGKILL to them. In most
situations, the backends should
On Thu, Jun 20, 2013 at 3:40 PM, MauMau maumau...@gmail.com wrote:
Here, reliable means that the database server is certainly shut
down when pg_ctl returns, not telling a lie that I shut down the
server processes for you, so you do not have to be worried that some
postgres process might
From: Alvaro Herrera alvhe...@2ndquadrant.com
MauMau wrote:
One concern is that umount would fail in such a situation because
postgres has some open files on the filesystem, which is on the
shared disk in case of traditional HA cluster.
See my reply to Noah. If postmaster stays around,
From: Alvaro Herrera alvhe...@2ndquadrant.com
Actually, I think it would be cleaner to have a new state in pmState,
namely PM_IMMED_SHUTDOWN which is entered when we send SIGQUIT. When
we're in this state, postmaster is only waiting for the timeout to
expire; and when it does, it sends SIGKILL
On Thu, Jun 20, 2013 at 12:33 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
I will go with 5 seconds, then.
I'm uncomfortable with this whole concept, and particularly with such
a short timeout. On a very busy system, things can take a LOT longer
than they think we should; it can take 30
Robert Haas robertmh...@gmail.com writes:
More generally, what do we think the point is of sending SIGQUIT
rather than SIGKILL in the first place, and why does that point cease
to be valid after 5 seconds?
Well, mostly it's about telling the client we're committing hara-kiri.
Without that,
On Fri, Jun 21, 2013 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
More generally, what do we think the point is of sending SIGQUIT
rather than SIGKILL in the first place, and why does that point cease
to be valid after 5 seconds?
Well, mostly it's
The case where I wanted routine shutdown immediate (and I'm not sure I
ever actually got it) was when we were using IBM HA/CMP, where I wanted a
terminate with a fair bit of prejudice.
If we know we want to switch right away now, immediate seemed pretty much
right. I was fine with interrupting
From: Robert Haas robertmh...@gmail.com
On Thu, Jun 20, 2013 at 12:33 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
I will go with 5 seconds, then.
I'm uncomfortable with this whole concept, and particularly with such
a short timeout. On a very busy system, things can take a LOT longer
From: Robert Haas robertmh...@gmail.com
On Fri, Jun 21, 2013 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
More generally, what do we think the point is of sending SIGQUIT
rather than SIGKILL in the first place, and why does that point cease
to be
First, thank you for the review.
From: Alvaro Herrera alvhe...@2ndquadrant.com
This seems reasonable. Why 10 seconds? We could wait 5 seconds, or 15.
Is there a rationale behind the 10? If we said 60, that would fit
perfectly well within the already existing 60-second loop in postmaster,
but
MauMau wrote:
First, thank you for the review.
From: Alvaro Herrera alvhe...@2ndquadrant.com
This seems reasonable. Why 10 seconds? We could wait 5 seconds, or 15.
Is there a rationale behind the 10? If we said 60, that would fit
perfectly well within the already existing 60-second
From: Alvaro Herrera alvhe...@2ndquadrant.com
I will go with 5 seconds, then.
OK, I agree.
My point is that there is no difference. For one thing, once we enter
immediate shutdown state, and sigkill has been sent, no further action
is taken. Postmaster will just sit there indefinitely
MauMau wrote:
From: Alvaro Herrera alvhe...@2ndquadrant.com
One concern is that umount would fail in such a situation because
postgres has some open files on the filesystem, which is on the
shared disk in case of traditional HA cluster.
See my reply to Noah. If postmaster stays around,
Actually, I think it would be cleaner to have a new state in pmState,
namely PM_IMMED_SHUTDOWN which is entered when we send SIGQUIT. When
we're in this state, postmaster is only waiting for the timeout to
expire; and when it does, it sends SIGKILL and exits. Pretty much the
same you have,
MauMau wrote:
Could you review the patch? The summary of the change is:
1. postmaster waits for children to terminate when it gets an
immediate shutdown request, instead of exiting.
2. postmaster sends SIGKILL to remaining children if all of the
child processes do not terminate within
Hello, Tom-san, folks,
From: Tom Lane t...@sss.pgh.pa.us
I think if we want to make it bulletproof we'd have to do what the
OP suggested and switch to SIGKILL. I'm not enamored of that for the
reasons I mentioned --- but one idea that might dodge the disadvantages
is to have the postmaster
On 2013-01-22 22:19:25 -0500, Tom Lane wrote:
Since we've fixed a couple of relatively nasty bugs recently, the core
committee has determined that it'd be a good idea to push out PG update
releases soon. The current plan is to wrap on Monday Feb 4 for public
announcement Thursday Feb 7. If
On 1/31/13 5:42 PM, MauMau wrote:
Thank you for sharing your experience. So you also considered making
postmaster SIGKILL children like me, didn't you? I bet most people
who encounter this problem would feel like that.
It is definitely pg_ctl who needs to be prepared, not the users. It
On 2013-02-01 08:55:24 -0500, Peter Eisentraut wrote:
On 1/31/13 5:42 PM, MauMau wrote:
Thank you for sharing your experience. So you also considered making
postmaster SIGKILL children like me, didn't you? I bet most people
who encounter this problem would feel like that.
It is
Andres Freund and...@2ndquadrant.com writes:
On 2013-02-01 08:55:24 -0500, Peter Eisentraut wrote:
I found an old patch that I had prepared for this, which I have
attached. YMMV.
+static void
+quickdie_alarm_handler(SIGNAL_ARGS)
+{
+/*
+ * We got here if ereport() was blocking,
On 1/30/13 9:11 AM, MauMau wrote:
When I ran pg_ctl stop -mi against the primary, some applications
connected to the primary did not stop. The cause was that the backends
were deadlocked in quickdie() with some call stack like the following.
I'm sorry to have left the stack trace file on the
From: Peter Eisentraut pete...@gmx.net
On 1/30/13 9:11 AM, MauMau wrote:
When I ran pg_ctl stop -mi against the primary, some applications
connected to the primary did not stop. The cause was that the backends
were deadlocked in quickdie() with some call stack like the following.
I'm sorry to
MauMau maumau...@gmail.com wrote:
Just doing pkill postgres will unexpectedly terminate postgres
of other instances.
Not if you run each instance under a different OS user, and execute
pkill with the right user. (Never use root for that!) This is
just one of the reasons that you should not
From: Tom Lane t...@sss.pgh.pa.us
Since we've fixed a couple of relatively nasty bugs recently, the core
committee has determined that it'd be a good idea to push out PG update
releases soon. The current plan is to wrap on Monday Feb 4 for public
announcement Thursday Feb 7. If you're aware of
MauMau maumau...@gmail.com writes:
When I ran pg_ctl stop -mi against the primary, some applications
connected to the primary did not stop. ...
The root cause is that gettext() is called in the signal handler quickdie()
via errhint().
Yeah, it's a known hazard that quickdie() operates like
On 2013-01-30 10:23:09 -0500, Tom Lane wrote:
MauMau maumau...@gmail.com writes:
When I ran pg_ctl stop -mi against the primary, some applications
connected to the primary did not stop. ...
The root cause is that gettext() is called in the signal handler quickdie()
via errhint().
Andres Freund and...@2ndquadrant.com writes:
On 2013-01-30 10:23:09 -0500, Tom Lane wrote:
Yeah, it's a known hazard that quickdie() operates like that.
What about not translating those? The messages are static and all memory
needed by postgres should be pre-allocated.
That would reduce our
From: Tom Lane t...@sss.pgh.pa.us
MauMau maumau...@gmail.com writes:
I think the solution is the typical one. That is, to just remember the
receipt of SIGQUIT by setting a global variable and call siglongjmp() in
quickdie(), and perform tasks currently done in quickdie() when
sigsetjmp()
MauMau maumau...@gmail.com writes:
From: Tom Lane t...@sss.pgh.pa.us
The long and the short of it is that SIGQUIT is the emergency-stop panic
button. You don't use it for routine shutdowns --- you use it when
there is a damn good reason to and you're prepared to do some manual
cleanup if
On Sun, Jan 27, 2013 at 11:38 PM, MauMau maumau...@gmail.com wrote:
From: Fujii Masao masao.fu...@gmail.com
On Sun, Jan 27, 2013 at 12:17 AM, MauMau maumau...@gmail.com wrote:
Although you said the fix will solve my problem, I don't feel it will. The
discussion is about the crash when the
On Sun, Jan 27, 2013 at 12:17 AM, MauMau maumau...@gmail.com wrote:
From: Fujii Masao masao.fu...@gmail.com
On Thu, Jan 24, 2013 at 11:53 PM, MauMau maumau...@gmail.com wrote:
I'm wondering if the fix discussed in the above thread solves my problem. I
found the following differences between
From: Fujii Masao masao.fu...@gmail.com
On Sun, Jan 27, 2013 at 12:17 AM, MauMau maumau...@gmail.com wrote:
Although you said the fix will solve my problem, I don't feel it will. The
discussion is about the crash when the standby restarts after the primary
vacuums and truncates a table. On
From: Fujii Masao masao.fu...@gmail.com
On Thu, Jan 24, 2013 at 11:53 PM, MauMau maumau...@gmail.com wrote:
I'm wondering if the fix discussed in the above thread solves my problem. I
found the following differences between Horiguchi-san's case and my case:
(1) Horiguchi-san says the bug
From: Fujii Masao masao.fu...@gmail.com
On Thu, Jan 24, 2013 at 7:42 AM, MauMau maumau...@gmail.com wrote:
I searched through PostgreSQL mailing lists with "WAL contains references to
invalid pages", and I found 19 messages. Some people encountered similar
problems. There were some discussions
On Thu, Jan 24, 2013 at 11:53 PM, MauMau maumau...@gmail.com wrote:
From: Fujii Masao masao.fu...@gmail.com
On Thu, Jan 24, 2013 at 7:42 AM, MauMau maumau...@gmail.com wrote:
I searched through PostgreSQL mailing lists with "WAL contains references to
invalid pages", and I found 19 messages.
From: Tom Lane t...@sss.pgh.pa.us
Since we've fixed a couple of relatively nasty bugs recently, the core
committee has determined that it'd be a good idea to push out PG update
releases soon. The current plan is to wrap on Monday Feb 4 for public
announcement Thursday Feb 7. If you're aware of
On Thu, Jan 24, 2013 at 7:42 AM, MauMau maumau...@gmail.com wrote:
From: Tom Lane t...@sss.pgh.pa.us
Since we've fixed a couple of relatively nasty bugs recently, the core
committee has determined that it'd be a good idea to push out PG update
releases soon. The current plan is to wrap on
* Tom Lane (t...@sss.pgh.pa.us) wrote:
Since we've fixed a couple of relatively nasty bugs recently, the core
committee has determined that it'd be a good idea to push out PG update
releases soon. The current plan is to wrap on Monday Feb 4 for public
announcement Thursday Feb 7. If you're
Stephen Frost sfr...@snowman.net writes:
* Tom Lane (t...@sss.pgh.pa.us) wrote:
Since we've fixed a couple of relatively nasty bugs recently, the core
committee has determined that it'd be a good idea to push out PG update
releases soon. The current plan is to wrap on Monday Feb 4 for public
48 matches