Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2014-02-19 Thread Robert Haas
On Mon, Feb 17, 2014 at 11:29 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 The pg_regress part is ugly.  However, pg_regress is doing something
 unusual when starting postmaster itself, so the ugly coding to stop it
 seems to match.  If we wanted to avoid the ugliness here, the right fix
 would be to use pg_ctl to start postmaster as well as to stop it.

I wonder if this would change the behavior in cases where we hit ^C
during the regression tests.  Right now I think that kills the
postmaster as well as pg_regress, but if we used pg_ctl, it might not,
because pg_regress uses fork()+exec(), but pg_ctl uses system() to
launch a shell which is in turn instructed to background the
postmaster.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2014-02-17 Thread Alvaro Herrera
MauMau escribió:

 pg_ctl timed out waiting for the zombie postgres.
 
 maumau 19621 18849  0 15:21 pts/900:00:00 [postgres] defunct
 maumau 20253 18849  0 15:22 pts/900:00:00 
 /maumau/postgresql-9.4/src/test/regress/./tmp_check/install//maumau/pgsql/bin/pg_ctl
 stop -D /maumau/postgresql-9.4/src/test/regress/./tmp_check/data -s
 -m fast
 
 pg_regress must wait for postgres to terminate by calling waitpid(),
 because it invoked postgres directly.  The attached
 pg_regress_pg_stop.patch does this.  If you like the combination of
 this and the original fix for pg_ctl in one patch, please use
 pg_stop_fail_v3.patch.

The pg_regress part is ugly.  However, pg_regress is doing something
unusual when starting postmaster itself, so the ugly coding to stop it
seems to match.  If we wanted to avoid the ugliness here, the right fix
would be to use pg_ctl to start postmaster as well as to stop it.  But
that'd come at a price, because we would need more ugly code to figure
out postmaster's PID.  All in all, the compromise proposed by this patch
seems acceptable.  If we really wanted to make all this real pretty, we
could provide a libpg_ctl library to start and stop postmaster, as
well as query the PID.  Probably not worth the trouble.

I would apply this patch to all supported branches after this week's
release.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2014-02-17 Thread Michael Paquier
On Tue, Feb 18, 2014 at 1:29 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 MauMau escribió:
 The pg_regress part is ugly.  However, pg_regress is doing something
 unusual when starting postmaster itself, so the ugly coding to stop it
 seems to match.  If we wanted to avoid the ugliness here, the right fix
 would be to use pg_ctl to start postmaster as well as to stop it.  But
 that'd come at a price, because we would need more ugly code to figure
 out postmaster's PID.  All in all, the compromise proposed by this patch
 seems acceptable.  If we really wanted to make all this real pretty, we
 could provide a libpg_ctl library to start and stop postmaster, as
 well as query the PID.  Probably not worth the trouble.
This might not be worth the trouble for this bug, but actually it
could be useful to many third-part tools and extensions to have a
common and generic way to do things. I have seen many utilities using
a copy/paste of pg_ctl functions and still maintain some of them...
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2014-01-30 Thread MauMau

From: Ronan Dunklau ronan.dunk...@dalibo.com
There is no regression tests covering this bugfix, althought I don't know 
if

it would be practical to implement them.


Thanks for reviewing the patch.  I'm glad to know that it seems OK. 
Unfortunately, the current regression test system cannot handle the testing 
of pg_ctl.


Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2014-01-27 Thread Ronan Dunklau
Le mardi 7 janvier 2014 17:05:03 Michael Paquier a écrit :
 On Sun, Jan 5, 2014 at 3:49 PM, MauMau maumau...@gmail.com wrote:
  Could you confirm again and tell me what problem is happening?
 
 FWIW, I just quickly tested those two patches independently and got
 them correctly applied with patch -p1  $PATCH on master at edc4345.
 They compiled and passed as well make check.
 Regards,

Both patches apply cleanly, I'll focus on the pg_stop_fail_v3 patch.

Tests are running correctly.

The bug this patch is supposed to fix has been reproduced on current HEAD, and 
cannot be reproduced using the patch.

Previous concerns about using both get_pgpid and postmaster_is_alive are 
adressed.

There is no regression tests covering this bugfix, althought I don't know if 
it would be practical to implement them.


-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org

signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2014-01-07 Thread Michael Paquier
On Sun, Jan 5, 2014 at 3:49 PM, MauMau maumau...@gmail.com wrote:
 Could you confirm again and tell me what problem is happening?
FWIW, I just quickly tested those two patches independently and got
them correctly applied with patch -p1  $PATCH on master at edc4345.
They compiled and passed as well make check.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2014-01-04 Thread MauMau

From: Peter Eisentraut pete...@gmx.net

On 12/25/13, 6:40 AM, MauMau wrote:

pg_regress must wait for postgres to terminate by calling waitpid(),
because it invoked postgres directly.  The attached
pg_regress_pg_stop.patch does this.  If you like the combination of this
and the original fix for pg_ctl in one patch, please use
pg_stop_fail_v3.patch.


This patch doesn't apply.


Is that true?  Today, I downloaded the following file (whose timestamp was 
2014-Jan-04), and could apply pg_stop_fail_v3.patch cleanly.  The make 
check succeeded.


http://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.gz

Could you confirm again and tell me what problem is happening?

Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2014-01-03 Thread Peter Eisentraut
On 12/25/13, 6:40 AM, MauMau wrote:
 pg_regress must wait for postgres to terminate by calling waitpid(),
 because it invoked postgres directly.  The attached
 pg_regress_pg_stop.patch does this.  If you like the combination of this
 and the original fix for pg_ctl in one patch, please use
 pg_stop_fail_v3.patch.

This patch doesn't apply.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2013-12-25 Thread MauMau

From: Peter Eisentraut pete...@gmx.net

This patch breaks the regression tests:

xml  ... ok
test stats... ok
== shutting down postmaster   ==
waits a long time
pg_ctl: server does not shut down

pg_regress: could not stop postmaster: exit code was 256


pg_ctl timed out waiting for the zombie postgres.

maumau 19621 18849  0 15:21 pts/900:00:00 [postgres] defunct
maumau 20253 18849  0 15:22 pts/900:00:00 
/maumau/postgresql-9.4/src/test/regress/./tmp_check/install//maumau/pgsql/bin/pg_ctl 
stop -D /maumau/postgresql-9.4/src/test/regress/./tmp_check/data -s -m fast


pg_regress must wait for postgres to terminate by calling waitpid(), because 
it invoked postgres directly.  The attached pg_regress_pg_stop.patch does 
this.  If you like the combination of this and the original fix for pg_ctl 
in one patch, please use pg_stop_fail_v3.patch.


Sorry for causing trouble.


Regards
MauMau


pg_stop_fail_v3.patch
Description: Binary data


pg_regress_pg_stop.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2013-12-24 Thread Peter Eisentraut
On 12/5/13, 7:07 AM, MauMau wrote:
 From: Tom Lane t...@sss.pgh.pa.us
 If you're going to do a postmaster_is_alive check, why bother with
 repeated get_pgpid()?
 
 As I said yesterday, I removed get_pgpid() calls.  I'll add this patch
 to 2014-1 commitfest this weekend if it is not committed until then.

This patch breaks the regression tests:

 xml  ... ok
test stats... ok
== shutting down postmaster   ==
waits a long time
pg_ctl: server does not shut down

pg_regress: could not stop postmaster: exit code was 256



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2013-12-05 Thread MauMau

From: Tom Lane t...@sss.pgh.pa.us

If you're going to do a postmaster_is_alive check, why bother with
repeated get_pgpid()?


As I said yesterday, I removed get_pgpid() calls.  I'll add this patch to 
2014-1 commitfest this weekend if it is not committed until then.


Regards
MauMau


pg_stop_fail_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2013-12-04 Thread MauMau

From: Tom Lane t...@sss.pgh.pa.us

I think the reason why it was coded like that was that we hadn't written
postmaster_is_alive() yet, or maybe we had but didn't want to trust it.
However, with the coding you have here, we're fully exposed to any failure
modes postmaster_is_alive() may have; so there's not a lot of value in
accepting those and get_pgpid's failure modes too.


Thank you for reviewing the patch so quickly.  You are right, I don't think 
get_pgpid() here is no longer necessary.  If the pid changes in one second, 
i.e. the original postgres terminates and pg_ctl start starts another one, 
pg_ctl stop can terminate successfully because the original postgres it 
was waiting for actually terminated.


I'll submit the revised patch tomorrow.

Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2013-12-03 Thread MauMau

Hello,

I've encountered a small bug and fixed it.  I guess this occurs on all major 
releases.  I saw this happen on 9.2 and 9.4devel.  Please find attached the 
patch and commit this.



[Problem]
If I mistakenly set an invalid value to listen_addresses, say '-1', and 
start the database server, it fails to start as follows.  In my environment 
(RHEL6 for Intel64), it takes about 15 seconds before postgres prints the 
messages.  This is OK.


[maumau@myhost pgdata]$ pg_ctl -w start
waiting for server to startLOG:  could not translate 
host name -1, service 5450 to address: Temporary failure in name 
resolution

WARNING:  could not create listen socket for -1
FATAL:  could not create any TCP/IP sockets
stopped waiting
pg_ctl: could not start server
Examine the log output.
[maumau@myhost pgdata]$

When I start the server without -w and try to stop it, pg_ctl stop waits 
for 60 seconds and timed out before it fails.  This is what I'm seeing as a 
problem.  I expected pg_ctl stop to respond quickly with success or 
failure depending on the timing.


[maumau@myhost pgdata]$ pg_ctl start
server starting
...(a few seconds later)
[maumau@myhost ~]$ pg_ctl stop
waiting for server to shut 
down.

.. failed
pg_ctl: server does not shut down
HINT: The -m fast option immediately disconnects sessions rather than
waiting for session-initiated disconnection.
[maumau@myhost ~]$


[Cause]
The problem occurs in the sequence below:

1. postmaster creates $PGDATA/postmaster.pid.
2. postmaster tries to resolve the value of listen_addresses to IP 
addresses.  This took about 15 seconds in my failure scenario.

3. During 2, pg_ctl sends SIGTERM to postmaster.
4. postmaster terminates immediately without deleting 
$PGDATA/postmaster.pid.  This is because it hasn't set signal handlers yet.
5. pg_ctl stop waits in a loop until $PGDATA/postmaster.pid disappears. 
But the file does not disappear and it times out.



[Fix]
Make pg_ctl check if postmaster is still alive, because postmaster might 
have crashed unexpectedly.



Regards
MauMau


pg_stop_fail.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly

2013-12-03 Thread Tom Lane
MauMau maumau...@gmail.com writes:
 The problem occurs in the sequence below:

 1. postmaster creates $PGDATA/postmaster.pid.
 2. postmaster tries to resolve the value of listen_addresses to IP 
 addresses.  This took about 15 seconds in my failure scenario.
 3. During 2, pg_ctl sends SIGTERM to postmaster.
 4. postmaster terminates immediately without deleting 
 $PGDATA/postmaster.pid.  This is because it hasn't set signal handlers yet.
 5. pg_ctl stop waits in a loop until $PGDATA/postmaster.pid disappears. 
 But the file does not disappear and it times out.

Hm.  I wonder if we shouldn't block SIGTERM etc. earlier.  It hardly seems
improbable that such signals would arrive during a slow startup.

 *** 907,913 
   
   for (cnt = 0; cnt  wait_seconds; cnt++)
   {
 ! if ((pid = get_pgpid()) != 0)
   {
   print_msg(.);
   pg_usleep(100); /* 1 sec */
 --- 907,914 
   
   for (cnt = 0; cnt  wait_seconds; cnt++)
   {
 ! if ((pid = get_pgpid()) != 0 
 ! postmaster_is_alive((pid_t) pid))
   {
   print_msg(.);
   pg_usleep(100); /* 1 sec */

If you're going to do a postmaster_is_alive check, why bother with
repeated get_pgpid()?

I think the reason why it was coded like that was that we hadn't written
postmaster_is_alive() yet, or maybe we had but didn't want to trust it.
However, with the coding you have here, we're fully exposed to any failure
modes postmaster_is_alive() may have; so there's not a lot of value in
accepting those and get_pgpid's failure modes too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers