Re: [HACKERS] Dangling Client Backend Process

2015-11-12 Thread Robert Haas
On Wed, Nov 11, 2015 at 1:55 AM, Michael Paquier
 wrote:
> On Wed, Nov 4, 2015 at 2:18 AM, Robert Haas wrote:
>> The second conclusion does not appear to be correct.  parseInput()
>> will call pqParseInput3() or pqParseInput2(), either of which will
>> handle an error as if it were a notice - i.e. by printing it out.
>
> Right per pqGetErrorNotice3 when the connection is in PGASYNC_IDLE state.
>
>> Here's a patch based on that analysis, addressing just that one
>> function, not any of the other changes talked about on this thread.
>> Does this make sense?  Would we want to back-patch it, and if so how
>> far, or just adjust master?  My gut is just master, but I don't know
>> why this issue wouldn't also affect Hot Standby kills and maybe other
>> kinds of connection termination situations, so maybe there's an
>> argument for back-patching.  On the third hand, failing to read the
>> error message off of a just-terminated connection isn't exactly a
>> crisis of the first order either.
>
> Looks sane to me. As the connection is in PGASYNC idle state when
> crossing the path of pqHandleSendFailure() we would finish eating up
> all the error messages received from server and print an internal
> notice for the rest with "message type blah received from server while
> idle. Based on the lack of complaints regarding libpq on this side, I
> would just go for master, as for 9.5 is pretty late in this game to
> put some dust on it before a potential backpatch.

OK, committed just to master then, plus the change which is the
purpose of this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-11-10 Thread Michael Paquier
On Wed, Nov 4, 2015 at 2:18 AM, Robert Haas wrote:
>
> The second conclusion does not appear to be correct.  parseInput()
> will call pqParseInput3() or pqParseInput2(), either of which will
> handle an error as if it were a notice - i.e. by printing it out.

Right per pqGetErrorNotice3 when the connection is in PGASYNC_IDLE state.

> Here's a patch based on that analysis, addressing just that one
> function, not any of the other changes talked about on this thread.
> Does this make sense?  Would we want to back-patch it, and if so how
> far, or just adjust master?  My gut is just master, but I don't know
> why this issue wouldn't also affect Hot Standby kills and maybe other
> kinds of connection termination situations, so maybe there's an
> argument for back-patching.  On the third hand, failing to read the
> error message off of a just-terminated connection isn't exactly a
> crisis of the first order either.

Looks sane to me. As the connection is in PGASYNC idle state when
crossing the path of pqHandleSendFailure() we would finish eating up
all the error messages received from server and print an internal
notice for the rest with "message type blah received from server while
idle. Based on the lack of complaints regarding libpq on this side, I
would just go for master, as for 9.5 is pretty late in this game to
put some dust on it before a potential backpatch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-11-03 Thread Robert Haas
On Fri, Oct 30, 2015 at 11:03 AM, Andres Freund  wrote:
> On 2015-10-30 10:57:45 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>> > adding a parseInput(conn) into the loop yields the expected
>> > FATAL:  57P01: terminating connection due to unexpected postmaster exit
>> > Is there really any reason not to do that?
>>
>> Might work, but it probably needs some study:
>
> Yea, definitely. I was just at pgconf.eu's keynote catching up on a
> talk. No fully thought through proposal's to be expected ;)
>
>> (a) is it safe
>
> I don't immediately see why not.
>
>> (b) is this the right place / are there other places
>
> I think it's ok for the send failure case, in a quick lookthrough I
> didn't find anything else for writes - I'm not entirely sure all read
> cases are handled tho, but it seems less likely to be mishandles.

pqHandleSendFailure() has this comment:

 * Primarily, what we want to accomplish here is to process an async
 * NOTICE message that the backend might have sent just before it died.

And also this comment:

 * Accept any available input data, ignoring errors.  Note that if
 * pqReadData decides the backend has closed the channel, it will close
 * our side of the socket --- that's just what we want here.

And finally this comment:

 * Parse any available input messages.  Since we are in PGASYNC_IDLE
 * state, only NOTICE and NOTIFY messages will be eaten.

Now, taken together, these messages suggest two conclusions:

1. The decision to ignore errors here was deliberate.
2. Calling pqParseInput() wouldn't notice errors anyway because the
connection is PGASYNC_IDLE.

With respect to the first conclusion, I think it's fair to argue that,
while this may have seemed like a reasonable idea at the time, it's
probably not what we want any more.  Quite apart from the patch
proposed here, ProcessInterrupts() has assume for years (certainly
since 9.0, I think) that it's legitimate to signal a FATAL error to an
idle client and assume that the client will read that error as a
response to its next protocol message.  So it seems to me that this
function should be adjusted to notice errors, and not just notice and
notify messages.

The second conclusion does not appear to be correct.  parseInput()
will call pqParseInput3() or pqParseInput2(), either of which will
handle an error as if it were a notice - i.e. by printing it out.

Here's a patch based on that analysis, addressing just that one
function, not any of the other changes talked about on this thread.
Does this make sense?  Would we want to back-patch it, and if so how
far, or just adjust master?  My gut is just master, but I don't know
why this issue wouldn't also affect Hot Standby kills and maybe other
kinds of connection termination situations, so maybe there's an
argument for back-patching.  On the third hand, failing to read the
error message off of a just-terminated connection isn't exactly a
crisis of the first order either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 828f18e..07c4335 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1553,8 +1553,8 @@ sendFailed:
 /*
  * pqHandleSendFailure: try to clean up after failure to send command.
  *
- * Primarily, what we want to accomplish here is to process an async
- * NOTICE message that the backend might have sent just before it died.
+ * Primarily, what we want to accomplish here is to process any messages that
+ * the backend might have sent just before it died.
  *
  * NOTE: this routine should only be called in PGASYNC_IDLE state.
  */
@@ -1562,16 +1562,16 @@ void
 pqHandleSendFailure(PGconn *conn)
 {
 	/*
-	 * Accept any available input data, ignoring errors.  Note that if
-	 * pqReadData decides the backend has closed the channel, it will close
-	 * our side of the socket --- that's just what we want here.
+	 * Accept and parse any available input data.  Note that if pqReadData
+	 * decides the backend has closed the channel, it will close our side of
+	 * the socket --- that's just what we want here.
 	 */
 	while (pqReadData(conn) > 0)
-		 /* loop until no more data readable */ ;
+		parseInput(conn);
 
 	/*
-	 * Parse any available input messages.  Since we are in PGASYNC_IDLE
-	 * state, only NOTICE and NOTIFY messages will be eaten.
+	 * Make one attempt to parse available input messages even if we read no
+	 * data.
 	 */
 	parseInput(conn);
 }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-11-02 Thread Rajeev rastogi
On 30 October 2015 20:33, Andres Freund Wrote:
>On 2015-10-30 10:57:45 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>> > adding a parseInput(conn) into the loop yields the expected
>> > FATAL:  57P01: terminating connection due to unexpected postmaster
>> > exit Is there really any reason not to do that?
>>
>> Might work, but it probably needs some study:
>
>Yea, definitely. I was just at pgconf.eu's keynote catching up on a talk.
>No fully thought through proposal's to be expected ;)
>
>> (a) is it safe
>
>I don't immediately see why not.
>
>> (b) is this the right place / are there other places
>
>I think it's ok for the send failure case, in a quick lookthrough I
>didn't find anything else for writes - I'm not entirely sure all read
>cases are handled tho, but it seems less likely to be mishandles.


I also did some analysis as Andres suggested and observed that since the error 
message is already sent by backend to frontend, 
so error message is available on connection channel just that is was not 
received/processed by frontend.
Also pqHandleSendFailure was checking the available data if any but it was 
ignored to process.

So as Andres suggested above, if we add parseInput to receiver and parse the 
available message on connection channel, we could display appropriate error. 

IMHO above proposed place to add parseInput seems to be right, as already this 
function takes of handling " NOTICE message that the backend might have sent 
just before it died "

Attached is the patch with this change.

Comments?

Thanks and Regards,
Kumar Rajeev Rastogi


dangle-v4.patch
Description: dangle-v4.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-30 Thread Robert Haas
On Tue, Oct 27, 2015 at 6:29 AM, Rajeev rastogi
 wrote:
> On 23 October 2015 01:58, Robert Haas [mailto:robertmh...@gmail.com] Wrote:
>>Well, I'm not buying this extra PostmasterIsAlive() call on every pass
>>through the main loop.  That seems more expensive than we can really
>>justify. Checking this when we're already calling WaitLatchOrSocket is
>>basically free, but the other part is not.
>
> Agree.
>
>>Here's a version with that removed and some changes to the comments.
>
> Thanks for changing.
>
>>I still don't think this is quite working right, though, because here's
>>what happened when I killed the postmaster:
>>
>>rhaas=# select 1;
>> ?column?
>>--
>>1
>>(1 row)
>>
>>rhaas=# \watch
>>Watch every 2sThu Oct 22 16:24:10 2015
>>
>> ?column?
>>--
>>1
>>(1 row)
>>
>>Watch every 2sThu Oct 22 16:24:12 2015
>>
>> ?column?
>>--
>>1
>>(1 row)
>>
>>Watch every 2sThu Oct 22 16:24:14 2015
>>
>> ?column?
>>--
>>1
>>(1 row)
>>
>>Watch every 2sThu Oct 22 16:24:16 2015
>>
>> ?column?
>>--
>>1
>>(1 row)
>>
>>server closed the connection unexpectedly
>>This probably means the server terminated abnormally
>>before or while processing the request.
>>The connection to the server was lost. Attempting reset: Failed.
>>
>>Note that the error message doesn't actually show up on the client (it
>>did show up in the log).  I guess that may be inevitable if we're
>>blocked in secure_write(), but if we're in secure_read() maybe it should
>>work?  I haven't investigated why it doesn't.
>
> Actually in this case client is not waiting for the response from the server 
> rather it is waiting for new command from user.
> So even though server has sent the response to client, client is not able to 
> receive.
> Once client receives the next command to execute, by the time connection has 
> terminated from server side and hence  it comes out with the above message 
> (i.e. server closed the connection...)
>
> Though I am also in favor of  providing some error message to client. But 
> with the current infrastructure, I don’t think there is any way to pass this 
> error message to client [This error has happened without involvement of the 
> client side].
>
> Comments?

Hmm.  ProcessInterrupts() signals some FATAL errors while the
connection is idle, and rumor has it that that works: the client
doesn't immediately read the FATAL error, but the next time it sends a
query, it tries to read from the connection and sees the FATAL error
at that time.  I wonder why that's not working here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-30 Thread Tom Lane
Robert Haas  writes:
> Hmm.  ProcessInterrupts() signals some FATAL errors while the
> connection is idle, and rumor has it that that works: the client
> doesn't immediately read the FATAL error, but the next time it sends a
> query, it tries to read from the connection and sees the FATAL error
> at that time.  I wonder why that's not working here.

A likely theory is that the kernel is reporting failure to libpq's
send() because the other side of the connection is already gone.
This would be timing-dependent of course.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-30 Thread Tom Lane
Andres Freund  writes:
> adding a parseInput(conn) into the loop yields the expected
> FATAL:  57P01: terminating connection due to unexpected postmaster exit
> Is there really any reason not to do that?

Might work, but it probably needs some study:
(a) is it safe
(b) is this the right place / are there other places

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-30 Thread Andres Freund
On 2015-10-30 09:48:33 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > Hmm.  ProcessInterrupts() signals some FATAL errors while the
> > connection is idle, and rumor has it that that works: the client
> > doesn't immediately read the FATAL error, but the next time it sends a
> > query, it tries to read from the connection and sees the FATAL error
> > at that time.  I wonder why that's not working here.
>
> A likely theory is that the kernel is reporting failure to libpq's
> send() because the other side of the connection is already gone.
> This would be timing-dependent of course.

Looking at a strace psql over unix socket is actually receiving the
error message:
recvfrom(3, "E\0\0\0lSFATAL\0C57P01\0Mterminating "..., 16384, 0, NULL, NULL) = 
109
but psql does print:
server closed the connection unexpectedly

it happens to work over localhost:
FATAL:  57P01: terminating connection due to unexpected postmaster exit
LOCATION:  secure_read, be-secure.c:170
server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.

the problem seems to be the loop eating all the remaining input:
void
pqHandleSendFailure(PGconn *conn)
{
/*
 * Accept any available input data, ignoring errors.  Note that if
 * pqReadData decides the backend has closed the channel, it will close
 * our side of the socket --- that's just what we want here.
 */
while (pqReadData(conn) > 0)
 /* loop until no more data readable */ ;

after the first pqReadData() there's no remaining input and thus the
second call to pqReadData()'s pqsecure_read reads 0 and this is hit:
/*
 * OK, we are getting a zero read even though select() says ready. This
 * means the connection has been closed.  Cope.
 */
definitelyEOF:
printfPQExpBuffer(>errorMessage,
  libpq_gettext(
"server closed 
the connection unexpectedly\n"
   "\tThis probably means the server terminated 
abnormally\n"
 "\tbefore or while 
processing the request.\n"));

adding a parseInput(conn) into the loop yields the expected
FATAL:  57P01: terminating connection due to unexpected postmaster exit

Is there really any reason not to do that?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-30 Thread Andres Freund
On 2015-10-30 10:57:45 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > adding a parseInput(conn) into the loop yields the expected
> > FATAL:  57P01: terminating connection due to unexpected postmaster exit
> > Is there really any reason not to do that?
> 
> Might work, but it probably needs some study:

Yea, definitely. I was just at pgconf.eu's keynote catching up on a
talk. No fully thought through proposal's to be expected ;)

> (a) is it safe

I don't immediately see why not.

> (b) is this the right place / are there other places

I think it's ok for the send failure case, in a quick lookthrough I
didn't find anything else for writes - I'm not entirely sure all read
cases are handled tho, but it seems less likely to be mishandles.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-26 Thread Rajeev rastogi
On 23 October 2015 01:58, Robert Haas [mailto:robertmh...@gmail.com] Wrote:

>Well, I'm not buying this extra PostmasterIsAlive() call on every pass
>through the main loop.  That seems more expensive than we can really
>justify. Checking this when we're already calling WaitLatchOrSocket is
>basically free, but the other part is not.

Agree. 

>Here's a version with that removed and some changes to the comments.

Thanks for changing.

>I still don't think this is quite working right, though, because here's
>what happened when I killed the postmaster:
>
>rhaas=# select 1;
> ?column?
>--
>1
>(1 row)
>
>rhaas=# \watch
>Watch every 2sThu Oct 22 16:24:10 2015
>
> ?column?
>--
>1
>(1 row)
>
>Watch every 2sThu Oct 22 16:24:12 2015
>
> ?column?
>--
>1
>(1 row)
>
>Watch every 2sThu Oct 22 16:24:14 2015
>
> ?column?
>--
>1
>(1 row)
>
>Watch every 2sThu Oct 22 16:24:16 2015
>
> ?column?
>--
>1
>(1 row)
>
>server closed the connection unexpectedly
>This probably means the server terminated abnormally
>before or while processing the request.
>The connection to the server was lost. Attempting reset: Failed.
>
>Note that the error message doesn't actually show up on the client (it
>did show up in the log).  I guess that may be inevitable if we're
>blocked in secure_write(), but if we're in secure_read() maybe it should
>work?  I haven't investigated why it doesn't.

Actually in this case client is not waiting for the response from the server 
rather it is waiting for new command from user. 
So even though server has sent the response to client, client is not able to 
receive.
Once client receives the next command to execute, by the time connection has 
terminated from server side and hence  it comes out with the above message 
(i.e. server closed the connection...)

Though I am also in favor of  providing some error message to client. But with 
the current infrastructure, I don’t think there is any way to pass this error 
message to client [This error has happened without involvement of the client 
side].

Comments?

Thanks and Regards,
Kumar Rajeev Rastogi

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-22 Thread Robert Haas
On Tue, Oct 20, 2015 at 11:42 PM, Rajeev rastogi
 wrote:
> Agreed. Attached is the patch with changes.

Well, I'm not buying this extra PostmasterIsAlive() call on every pass
through the main loop.  That seems more expensive than we can really
justify. Checking this when we're already calling WaitLatchOrSocket is
basically free, but the other part is not.

Here's a version with that removed and some changes to the comments.
I still don't think this is quite working right, though, because
here's what happened when I killed the postmaster:

rhaas=# select 1;
 ?column?
--
1
(1 row)

rhaas=# \watch
Watch every 2sThu Oct 22 16:24:10 2015

 ?column?
--
1
(1 row)

Watch every 2sThu Oct 22 16:24:12 2015

 ?column?
--
1
(1 row)

Watch every 2sThu Oct 22 16:24:14 2015

 ?column?
--
1
(1 row)

Watch every 2sThu Oct 22 16:24:16 2015

 ?column?
--
1
(1 row)

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Note that the error message doesn't actually show up on the client (it
did show up in the log).  I guess that may be inevitable if we're
blocked in secure_write(), but if we're in secure_read() maybe it
should work?  I haven't investigated why it doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 26d8faa..089435d 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -36,7 +36,7 @@
 #include "tcop/tcopprot.h"
 #include "utils/memutils.h"
 #include "storage/proc.h"
-
+#include "storage/ipc.h"
 
 char	   *ssl_cert_file;
 char	   *ssl_key_file;
@@ -144,9 +144,31 @@ retry:
 		Assert(waitfor);
 
 		w = WaitLatchOrSocket(MyLatch,
-			  WL_LATCH_SET | waitfor,
+			  WL_LATCH_SET | WL_POSTMASTER_DEATH | waitfor,
 			  port->sock, 0);
 
+		/*
+		 * If the postmaster has died, it's not safe to continue running,
+		 * because it is the postmaster's job to kill us if some other backend
+		 * exists uncleanly.  Moreover, we won't run very well in this state;
+		 * helper processes like walwriter and the bgwriter will exit, so
+		 * performance may be poor.  Finally, if we don't exit, pg_ctl will
+		 * be unable to restart the postmaster without manual intervention,
+		 * so no new connections can be accepted.  Exiting clears the deck
+		 * for a postmaster restart.
+		 *
+		 * (Note that we only make this check when we would otherwise sleep
+		 * on our latch.  We might still continue running for a while if the
+		 * postmaster is killed in mid-query, or even through multiple queries
+		 * if we never have to wait for read.  We don't want to burn too many
+		 * cycles checking for this very rare condition, and this shouuld cause
+		 * us to exit quickly in most cases.)
+		 */
+		if (w & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+	(errcode(ERRCODE_ADMIN_SHUTDOWN),
+	errmsg("terminating connection due to unexpected postmaster exit")));
+
 		/* Handle interrupt. */
 		if (w & WL_LATCH_SET)
 		{
@@ -223,9 +245,15 @@ retry:
 		Assert(waitfor);
 
 		w = WaitLatchOrSocket(MyLatch,
-			  WL_LATCH_SET | waitfor,
+			  WL_LATCH_SET | WL_POSTMASTER_DEATH | waitfor,
 			  port->sock, 0);
 
+		/* See comments in secure_read. */
+		if (w & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+	(errcode(ERRCODE_ADMIN_SHUTDOWN),
+	errmsg("terminating connection due to unexpected postmaster exit")));
+
 		/* Handle interrupt. */
 		if (w & WL_LATCH_SET)
 		{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-20 Thread Robert Haas
On Tue, Oct 20, 2015 at 12:11 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> I don't think that proc_exit(1) is the right way to exit here.  It's
>> not very friendly to exit without at least attempting to give the
>> client a clue about what has gone wrong.  I suggest something like
>> this:
>>
>> ereport(FATAL,
>> (errcode(ERRCODE_ADMIN_SHUTDOWN),
>>  errmsg("terminating connection due to postmaster shutdown")));
>
> Agreed, but I don't think "shutdown" is the right word to use here --
> that makes it sound like it was orderly.  Perhaps "crash"?

Well, that's a little speculative.  "due to unexpected postmaster exit"?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-20 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, Oct 20, 2015 at 12:11 PM, Alvaro Herrera
>  wrote:

> > Agreed, but I don't think "shutdown" is the right word to use here --
> > that makes it sound like it was orderly.  Perhaps "crash"?
> 
> Well, that's a little speculative.  "due to unexpected postmaster exit"?

WFM.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-20 Thread Alvaro Herrera
Robert Haas wrote:

> I don't think that proc_exit(1) is the right way to exit here.  It's
> not very friendly to exit without at least attempting to give the
> client a clue about what has gone wrong.  I suggest something like
> this:
> 
> ereport(FATAL,
> (errcode(ERRCODE_ADMIN_SHUTDOWN),
>  errmsg("terminating connection due to postmaster shutdown")));

Agreed, but I don't think "shutdown" is the right word to use here --
that makes it sound like it was orderly.  Perhaps "crash"?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-20 Thread Robert Haas
On Tue, Oct 20, 2015 at 12:48 AM, Rajeev rastogi
 wrote:
> On  19 October 2015 21:37, Robert Haas [mailto:robertmh...@gmail.com] Wrote:
>
>>On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera
>> wrote:
>>> Andres Freund wrote:
 On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
 > If I recall correctly, he concerned about killing the backends
 > running transactions which could be saved. I have a sympathy with
 > the opinion.

 I still don't. Leaving backends alive after postmaster has died
 prevents the auto-restart mechanism to from working from there on.
 Which means that we'll potentially continue happily after another
 backend has PANICed and potentially corrupted shared memory. Which
 isn't all that unlikely if postmaster isn't around anymore.
>>>
>>> I agree.  When postmaster terminates without waiting for all backends
>>> to go away, things are going horribly wrong -- either a DBA has done
>>> something stupid, or the system is misbehaving.  As Andres says, if
>>> another backend dies at that point, things are even worse -- the dying
>>> backend could have been holding a critical lwlock, for instance, or it
>>> could have corrupted shared buffers on its way out.  It is not
>>> sensible to leave the rest of the backends in the system still trying
>>> to run just because there is no one there to kill them.
>>
>>Yep.  +1 for changing this.
>
> Seems many people are in favor of this change.
> I have made changes to handle backend exit on postmaster death (after they 
> finished their work and waiting for new command).
> Changes are as per approach explained in my earlier mail i.e.
> 1. WaitLatchOrSocket called from secure_read and secure_write function will 
> wait on an additional event as WL_POSTMASTER_DEATH.
> 2. There is a possibility that the command is read without waiting on latch. 
> This case is handled by checking postmaster status after command read (i.e. 
> after ReadCommand).
>
> Attached is the patch.

I don't think that proc_exit(1) is the right way to exit here.  It's
not very friendly to exit without at least attempting to give the
client a clue about what has gone wrong.  I suggest something like
this:

ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
 errmsg("terminating connection due to postmaster shutdown")));

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-20 Thread Rajeev rastogi
On 20 October 2015 23:34, Robert Haas [mailto:robertmh...@gmail.com] Wrote:

>On Tue, Oct 20, 2015 at 12:11 PM, Alvaro Herrera
> wrote:
>> Robert Haas wrote:
>>> I don't think that proc_exit(1) is the right way to exit here.  It's
>>> not very friendly to exit without at least attempting to give the
>>> client a clue about what has gone wrong.  I suggest something like
>>> this:
>>>
>>> ereport(FATAL,
>>> (errcode(ERRCODE_ADMIN_SHUTDOWN),
>>>  errmsg("terminating connection due to postmaster
>>> shutdown")));
>>
>> Agreed, but I don't think "shutdown" is the right word to use here --
>> that makes it sound like it was orderly.  Perhaps "crash"?
>
>Well, that's a little speculative.  "due to unexpected postmaster exit"?

Agreed. Attached is the patch with changes.

Thanks and Regards,
Kumar Rajeev Rastogi




dangling_backend_process_v2.patch
Description: dangling_backend_process_v2.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-19 Thread Robert Haas
On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera
 wrote:
> Andres Freund wrote:
>> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
>> > If I recall correctly, he concerned about killing the backends
>> > running transactions which could be saved. I have a sympathy with
>> > the opinion.
>>
>> I still don't. Leaving backends alive after postmaster has died prevents
>> the auto-restart mechanism to from working from there on.  Which means
>> that we'll potentially continue happily after another backend has
>> PANICed and potentially corrupted shared memory. Which isn't all that
>> unlikely if postmaster isn't around anymore.
>
> I agree.  When postmaster terminates without waiting for all backends to
> go away, things are going horribly wrong -- either a DBA has done
> something stupid, or the system is misbehaving.  As Andres says, if
> another backend dies at that point, things are even worse -- the dying
> backend could have been holding a critical lwlock, for instance, or it
> could have corrupted shared buffers on its way out.  It is not sensible
> to leave the rest of the backends in the system still trying to run just
> because there is no one there to kill them.

Yep.  +1 for changing this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-19 Thread Rajeev rastogi
On  19 October 2015 21:37, Robert Haas [mailto:robertmh...@gmail.com] Wrote:

>On Sat, Oct 17, 2015 at 4:52 PM, Alvaro Herrera
> wrote:
>> Andres Freund wrote:
>>> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
>>> > If I recall correctly, he concerned about killing the backends
>>> > running transactions which could be saved. I have a sympathy with
>>> > the opinion.
>>>
>>> I still don't. Leaving backends alive after postmaster has died
>>> prevents the auto-restart mechanism to from working from there on.
>>> Which means that we'll potentially continue happily after another
>>> backend has PANICed and potentially corrupted shared memory. Which
>>> isn't all that unlikely if postmaster isn't around anymore.
>>
>> I agree.  When postmaster terminates without waiting for all backends
>> to go away, things are going horribly wrong -- either a DBA has done
>> something stupid, or the system is misbehaving.  As Andres says, if
>> another backend dies at that point, things are even worse -- the dying
>> backend could have been holding a critical lwlock, for instance, or it
>> could have corrupted shared buffers on its way out.  It is not
>> sensible to leave the rest of the backends in the system still trying
>> to run just because there is no one there to kill them.
>
>Yep.  +1 for changing this.

Seems many people are in favor of this change.
I have made changes to handle backend exit on postmaster death (after they 
finished their work and waiting for new command). 
Changes are as per approach explained in my earlier mail i.e.
1. WaitLatchOrSocket called from secure_read and secure_write function will 
wait on an additional event as WL_POSTMASTER_DEATH.
2. There is a possibility that the command is read without waiting on latch. 
This case is handled by checking postmaster status after command read (i.e. 
after ReadCommand).

Attached is the patch.

Thanks and Regards,
Kumar Rajeev Rastogi





dangling_backend_process.patch
Description: dangling_backend_process.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-17 Thread Alvaro Herrera
Andres Freund wrote:
> On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
> > If I recall correctly, he concerned about killing the backends
> > running transactions which could be saved. I have a sympathy with
> > the opinion.
> 
> I still don't. Leaving backends alive after postmaster has died prevents
> the auto-restart mechanism to from working from there on.  Which means
> that we'll potentially continue happily after another backend has
> PANICed and potentially corrupted shared memory. Which isn't all that
> unlikely if postmaster isn't around anymore.

I agree.  When postmaster terminates without waiting for all backends to
go away, things are going horribly wrong -- either a DBA has done
something stupid, or the system is misbehaving.  As Andres says, if
another backend dies at that point, things are even worse -- the dying
backend could have been holding a critical lwlock, for instance, or it
could have corrupted shared buffers on its way out.  It is not sensible
to leave the rest of the backends in the system still trying to run just
because there is no one there to kill them.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-16 Thread Rajeev rastogi
On 14 October 2015 14:03, Kyotaro HORIGUCHI:

>Subject: Re: [HACKERS] Dangling Client Backend Process
>
>At Wed, 14 Oct 2015 11:08:37 +0530, Amit Kapila <amit.kapil...@gmail.com>
>wrote in
><CAA4eK1L8SGWymhXF+yDpxiyA2ARCiEgQ88XsLhEvJba3Fh_F=q...@mail.gmail.com>
>> On Tue, Oct 13, 2015 at 3:54 PM, Rajeev rastogi
>> <rajeev.rast...@huawei.com>
>> wrote:
>> > If we add the event WL_POSTMASTER_DEATH also, client backend process
>> > handling will become same as other backend process. So postmaster
>> > death can be detected in the same way.
>> >
>> > But I am not sure if WL_POSTMASTER_DEATH event was not added
>> > intentionally for some reason. Please confirm.
>> >
>> > Also is it OK to add this even handling in generic path of Libpq?
>> >
>> > Please let me know if I am missing something?
>> >
>> >
>> I feel this is worth investigation, example for what kind of cases
>> libpq is used for non-blocking sockets, because for such cases above
>> idea will not work.
>
>Blocking mode of a port is changed using socket_set_nonblocking(). I
>found two points that the function is called with true.
>pq_getbyte_if_available() and socket_flush_if_writable(). They seems to
>be used only in walsender *so far*.

Yes and in that case I assume the proposed solution will work in all cases.

>> Here, I think the bigger point is that, Tom was not in favour of this
>> proposal (making backends exit on postmaster death ) at that time, not
>> sure whether he has changed his mind.

>If I recall correctly, he concerned about killing the backends running
>transactions which could be saved. I have a sympathy with the opinion.
>But also think it reasonable to kill all backends immediately so that
>new postmaster can run...

Yes it will be really helpful to know the earlier reason for "not making 
backend exit on postmaster death". 
Please let me know if there is any thread, which I can refer to find the same.

IMHO there could be below major issues, if we don't kill client backend process 
on postmaster death:
1. Postmaster cannot be re-started again as pointed by Kyotaro and Andres Also.
2. If from existing client session, we try to do some operation which has 
dependency with backend process, then that operation will either fail or may 
lead to undefined behavior sometime.
3. Also unintentionally all operation done by application will not be 
checkpointed and also will not be flushed by bgworker. 
4. Replicating of WAL to standby will be stopped and if synchronous standby is 
configured then command from master will be hanged.

Thanks and Regards,
Kumar Rajeev Rastogi





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-16 Thread Amit Kapila
On Fri, Oct 16, 2015 at 3:34 PM, Rajeev rastogi 
wrote:
>
>
> Yes it will be really helpful to know the earlier reason for "not making
> backend exit on postmaster death".
> Please let me know if there is any thread, which I can refer to find the
> same.
>
> IMHO there could be below major issues, if we don't kill client backend
> process on postmaster death:
> 1. Postmaster cannot be re-started again as pointed by Kyotaro and Andres
> Also.
> 2. If from existing client session, we try to do some operation which has
> dependency with backend process, then that operation will either fail or
> may lead to undefined behavior sometime.
> 3. Also unintentionally all operation done by application will not be
> checkpointed and also will not be flushed by bgworker.
> 4. Replicating of WAL to standby will be stopped and if synchronous
> standby is configured then command from master will be hanged.
>
>
What exactly we want to do as part of this proposal?  As far as I
can see, we have below two options on postmaster death:

a. Exit all the active backends in whichever state they are.
b. Exit backend/s after they finish there current work and are
waiting for new command.

I think what we are discussing here is option-b.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Dangling Client Backend Process

2015-10-14 Thread Kyotaro HORIGUCHI
At Wed, 14 Oct 2015 11:08:37 +0530, Amit Kapila  wrote 
in 
> On Tue, Oct 13, 2015 at 3:54 PM, Rajeev rastogi 
> wrote:
> > If we add the event WL_POSTMASTER_DEATH also, client backend process
> > handling will become same as other backend process. So postmaster death can
> > be detected in the same way.
> >
> > But I am not sure if WL_POSTMASTER_DEATH event was not added intentionally
> > for some reason. Please confirm.
> > 
> > Also is it OK to add this even handling in generic path of Libpq?
> >
> > Please let me know if I am missing something?
> >
> >
> I feel this is worth investigation, example for what kind of cases libpq is
> used for non-blocking sockets, because for such cases above idea
> will not work.

Blocking mode of a port is changed using
socket_set_nonblocking(). I found two points that the function is
called with true.  pq_getbyte_if_available() and
socket_flush_if_writable(). They seems to be used only in
walsender *so far*.

> Here, I think the bigger point is that, Tom was not in favour of
> this proposal (making backends exit on postmaster death ) at that time,
> not sure whether he has changed his mind.

If I recall correctly, he concerned about killing the backends
running transactions which could be saved. I have a sympathy with
the opinion. But also think it reasonable to kill all backends
immediately so that new postmaster can run...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-14 Thread Andres Freund
On 2015-10-14 17:33:01 +0900, Kyotaro HORIGUCHI wrote:
> If I recall correctly, he concerned about killing the backends
> running transactions which could be saved. I have a sympathy with
> the opinion.

I still don't. Leaving backends alive after postmaster has died prevents
the auto-restart mechanism to from working from there on.  Which means
that we'll potentially continue happily after another backend has
PANICed and potentially corrupted shared memory. Which isn't all that
unlikely if postmaster isn't around anymore.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dangling Client Backend Process

2015-10-13 Thread Amit Kapila
On Tue, Oct 13, 2015 at 3:54 PM, Rajeev rastogi 
wrote:

> On 12th October 2015 20:45, Rajeev Rastogi Wrote:
>
>
>
> >>> I observed one strange behavior today that if postmaster process gets
> crashed/killed, then it kill all background processes but not the client
> backend process.
>
>
>
> >> This is a known behaviour and there was some discussion on this
>
> >> topic [1] previously as well.
>
>
>
> > Now as it is confirmed to be valid issue, I will spend some time on this
> to find if there is something more appropriate solution.
>
>
>
> I checked the latest code and found Heikki has already added code for
> secure_read using the latch mechanism (using WaitLatchOrSocket). It
> currently waits for two events i.e. WL_LATCH_SET and WL_SOCKET_READABLE.
>
> Commit id: 80788a431e9bff06314a054109fdea66ac538199
>
>
>
> If we add the event WL_POSTMASTER_DEATH also, client backend process
> handling will become same as other backend process. So postmaster death can
> be detected in the same way.
>
>
>
> But I am not sure if WL_POSTMASTER_DEATH event was not added intentionally
> for some reason. Please confirm.
>
> Also is it OK to add this even handling in generic path of Libpq?
>
>
>
> Please let me know if I am missing something?
>
>
I feel this is worth investigation, example for what kind of cases libpq is
used for non-blocking sockets, because for such cases above idea
will not work.

Here, I think the bigger point is that, Tom was not in favour of
this proposal (making backends exit on postmaster death ) at that time,
not sure whether he has changed his mind.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Dangling Client Backend Process

2015-10-13 Thread Rajeev rastogi
On 12th October 2015 20:45, Rajeev Rastogi Wrote:

>>> I observed one strange behavior today that if postmaster process gets 
>>> crashed/killed, then it kill all background processes but not the client 
>>> backend process.

>> This is a known behaviour and there was some discussion on this
>> topic [1] previously as well.

> Now as it is confirmed to be valid issue, I will spend some time on this to 
> find if there is something more appropriate solution.

I checked the latest code and found Heikki has already added code for 
secure_read using the latch mechanism (using WaitLatchOrSocket). It currently 
waits for two events i.e. WL_LATCH_SET and WL_SOCKET_READABLE.
Commit id: 80788a431e9bff06314a054109fdea66ac538199

If we add the event WL_POSTMASTER_DEATH also, client backend process handling 
will become same as other backend process. So postmaster death can be detected 
in the same way.

But I am not sure if WL_POSTMASTER_DEATH event was not added intentionally for 
some reason. Please confirm.
Also is it OK to add this even handling in generic path of Libpq?

Please let me know if I am missing something?

Thanks and Regards,
Kumar Rajeev Rastogi


Re: [HACKERS] Dangling Client Backend Process

2015-10-12 Thread Rajeev rastogi
On 10 October 2015 20:45, Amit Kapila Wrote:

>> I observed one strange behavior today that if postmaster process gets 
>> crashed/killed, then it kill all background processes but not the client 
>> backend process.

> This is a known behaviour and there was some discussion on this
> topic [1] previously as well.  I think that thread didn't reach to conclusion,
> but there were couple of other reasons discussed in that thread as well to
> have the behaviour as you are proposing here.

Oops..I did not know about this. I shall check the older thread to get other 
opinions.

>> One way to handle this issue will be to check whether postmaster is alive 
>> after every command read but it will add extra cost for each query execution.

> I don't think that is a good idea as if there is no command execution
> it will still stay as it is and doing such operations on each command
> doesn't sound to be good idea even though overhead might not be
> big.  There are some other ideas discussed in that thread [2] to achieve
> this behaviour, but I think we need to find a portable way to achieve it.

Yes, you are right that process will not be closed till a new command comes but 
I think it does not harm functionality in anyway except that the process and 
its acquired resources
does not get freed. Also use-case of application will be very less where their 
client process stays idle for very long time.
But at the same time I agree this is not the best solution, we should look for 
more appropriate/better one.
Now as it is confirmed to be valid issue, I will spend some time on this to 
find if there is something more appropriate solution.

Thanks and Regards,
Kumar Rajeev Rastogi


[HACKERS] Dangling Client Backend Process

2015-10-10 Thread Rajeev rastogi
I observed one strange behavior today that if postmaster process gets 
crashed/killed, then it kill all background processes but not the client 
backend process.
Moreover it is also allowed to execute query on the connected client session 
without any other background process.
But If I try to execute some command (like checkpoint) from the client session 
which requires any background task to perform, it fails because it cannot find 
the corresponding background process (like checkpoint process).

I am not sure if this is already known behavior but I found it to be little 
awkward. This may lead to some unknown behavior in user application.

Currently All background process keeps checking if Postmaster is Alive while 
they wait for any event but for client backend process there is no such 
mechanism.

One way to handle this issue will be to check whether postmaster is alive after 
every command read but it will add extra cost for each query execution.

Any comments?

Thanks and Regards,
Kumar Rajeev Rastogi


Re: [HACKERS] Dangling Client Backend Process

2015-10-10 Thread Amit Kapila
On Sat, Oct 10, 2015 at 3:42 PM, Rajeev rastogi 
wrote:

> I observed one strange behavior today that if postmaster process gets
> crashed/killed, then it kill all background processes but not the client
> backend process.
>
> Moreover it is also allowed to execute query on the connected client
> session without any other background process.
>
> But If I try to execute some command (like checkpoint) from the client
> session which requires any background task to perform, it fails because it
> cannot find the corresponding background process (like checkpoint process).
>
>
>
> I am not sure if this is already known behavior but I found it to be
> little awkward. This may lead to some unknown behavior in user application.
>
>
>

This is a known behaviour and there was some discussion on this
topic [1] previously as well.  I think that thread didn't reach to
conclusion,
but there were couple of other reasons discussed in that thread as well to
have the behaviour as you are proposing here.



> Currently All background process keeps checking if Postmaster is Alive
> while they wait for any event but for client backend process there is no
> such mechanism.
>
>
>
> One way to handle this issue will be to check whether postmaster is alive
> after every command read but it will add extra cost for each query
> execution.
>
>
I don't think that is a good idea as if there is no command execution
it will still stay as it is and doing such operations on each command
doesn't sound to be good idea even though overhead might not be
big.  There are some other ideas discussed in that thread [2] to achieve
this behaviour, but I think we need to find a portable way to achieve it.


[1] - http://www.postgresql.org/message-id/26217.1371851...@sss.pgh.pa.us
[2] -
http://www.postgresql.org/message-id/20130622174922.gd1...@alap2.anarazel.de

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com