Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-02 Thread Robert Haas
On Wed, Aug 2, 2017 at 9:33 PM, Peter Eisentraut
 wrote:
> On 8/2/17 16:52, Robert Haas wrote:
>> I actually don't think it's that unreasonable to get notified when
>> system-wide processes like the autovacuum launcher or the logical
>> replication launcher start or stop.
>
> But we got rid of those start messages recently due to complaints.

Yeah, true.  I'm just talking about what *I* think.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-02 Thread Peter Eisentraut
On 8/2/17 16:52, Robert Haas wrote:
> I actually don't think it's that unreasonable to get notified when
> system-wide processes like the autovacuum launcher or the logical
> replication launcher start or stop.

But we got rid of those start messages recently due to complaints.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-02 Thread Andres Freund
On 2017-08-02 17:09:10 -0400, Tom Lane wrote:
> Even if we fix that, though, I think it is reasonable to downgrade it to
> DEBUG1.  We did that already for other standard background processes such
> as the autovac launcher, and it's not apparent to me why the logrep
> launcher should be chattier.  Now, *unexpected* starts or stops should
> certainly deserve a higher log rating ... but the general rule ought to be
> that totally-expected behavior does not deserve a log entry by default.

Well, every exit *other* than during a shutdown is unexpected.  That's
why I suggested changing the log level for exit code 1 depending on the
cluster being shut down or not.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-02 Thread Tom Lane
Andres Freund  writes:
> On 2017-08-02 16:52:01 -0400, Robert Haas wrote:
>> I actually don't think it's that unreasonable to get notified when
>> system-wide processes like the autovacuum launcher or the logical
>> replication launcher start or stop.  That's stuff somebody might want
>> to know.  It's not going to generate a lot of log volume, and it might
>> be useful, so why suppress it?

> I generally agree.  But in the shutdown case it's just useless and
> confusing - the launcher is stopping because the entire server is being
> stopped, and that's very much not clear from the message.

Yes.  The main complaint here is not about the existence of the message
but about the fact that it looks like it's reporting a failure.  I would
vote for removing it if it's not simple to fix that problem.

Even if we fix that, though, I think it is reasonable to downgrade it to
DEBUG1.  We did that already for other standard background processes such
as the autovac launcher, and it's not apparent to me why the logrep
launcher should be chattier.  Now, *unexpected* starts or stops should
certainly deserve a higher log rating ... but the general rule ought to be
that totally-expected behavior does not deserve a log entry by default.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-02 Thread Andres Freund
On 2017-08-02 16:52:01 -0400, Robert Haas wrote:
> On Wed, Aug 2, 2017 at 4:38 PM, Peter Eisentraut
>  wrote:
> > Maybe it doesn't need to be logged at all (other than perhaps as DEBUG)?
> >  A few months ago, people were complaining about too many messages about
> > background workers starting.  Now we are having complaints about
> > messages about background workers stopping.
> 
> I actually don't think it's that unreasonable to get notified when
> system-wide processes like the autovacuum launcher or the logical
> replication launcher start or stop.  That's stuff somebody might want
> to know.  It's not going to generate a lot of log volume, and it might
> be useful, so why suppress it?
> 
> Where things get ugly is if you start to get a high rate of messages -
> e.g. from starting and stopping parallel query workers or other kinds
> of things where you might have workers starting and stopping very
> frequently.  But surely this isn't an example of that.

I generally agree.  But in the shutdown case it's just useless and
confusing - the launcher is stopping because the entire server is being
stopped, and that's very much not clear from the message.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-02 Thread Robert Haas
On Wed, Aug 2, 2017 at 4:38 PM, Peter Eisentraut
 wrote:
> Maybe it doesn't need to be logged at all (other than perhaps as DEBUG)?
>  A few months ago, people were complaining about too many messages about
> background workers starting.  Now we are having complaints about
> messages about background workers stopping.

I actually don't think it's that unreasonable to get notified when
system-wide processes like the autovacuum launcher or the logical
replication launcher start or stop.  That's stuff somebody might want
to know.  It's not going to generate a lot of log volume, and it might
be useful, so why suppress it?

Where things get ugly is if you start to get a high rate of messages -
e.g. from starting and stopping parallel query workers or other kinds
of things where you might have workers starting and stopping very
frequently.  But surely this isn't an example of that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-02 Thread Peter Eisentraut
On 8/1/17 20:20, Andres Freund wrote:
> Well, that's how it is for all bgworkers - maybe a better solution is to
> adjust that message in the postmaster rather than fiddle with the worker
> exist code?  Seems like we could easily take pmStatus into account
> inside LogChildExit() and set the log level to DEBUG1 even for
> EXIT_STATUS_1 in that case?  Additionally we probably should always log
> a better message for bgworkers exiting with exit 1, something about
> unregistering the worker or such.

Maybe it doesn't need to be logged at all (other than perhaps as DEBUG)?
 A few months ago, people were complaining about too many messages about
background workers starting.  Now we are having complaints about
messages about background workers stopping.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-01 Thread Andres Freund
On 2017-08-01 20:37:07 -0400, Robert Haas wrote:
> On Tue, Aug 1, 2017 at 7:03 PM, Andres Freund  wrote:
> > On 2017-08-02 10:58:32 +1200, Thomas Munro wrote:
> >> When I shut down a cluster that isn't using logical replication, it
> >> always logs a line like the following.  So do the build farm members I
> >> looked at.  I didn't see anything about this in the open items list --
> >> isn't it a bug?
> >>
> >> 2017-08-02 10:39:25.007 NZST [34781] LOG:  worker process: logical
> >> replication launcher (PID 34788) exited with exit code 1
> >
> > Exit code 0 signals that a worker should be restarted. Therefore
> > graceful exit can't really use that.  I think a) we really need to
> > improve bgworker infrastructure around that b) shows the limit of using
> > bgworkers for this kinda thing - we should probably have a more bgworker
> > like infrastructure for internal workers.
> 
> You might've missed commit be7558162acc5578d0b2cf0c8d4c76b6076ce352.

Not really, just thinko-ing it. We don't want to unregister, so we can't
return 0, IOW, I just * -1'd my comment ;)

We intentionally return 1, so we *do* get restarted:
else if (IsLogicalLauncher())
{
ereport(DEBUG1,
(errmsg("logical replication launcher 
shutting down")));

/*
 * The logical replication launcher can be stopped at 
any time.
 * Use exit status 1 so the background worker is 
restarted.
 */
proc_exit(1);
}

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-01 Thread Robert Haas
On Tue, Aug 1, 2017 at 7:03 PM, Andres Freund  wrote:
> On 2017-08-02 10:58:32 +1200, Thomas Munro wrote:
>> When I shut down a cluster that isn't using logical replication, it
>> always logs a line like the following.  So do the build farm members I
>> looked at.  I didn't see anything about this in the open items list --
>> isn't it a bug?
>>
>> 2017-08-02 10:39:25.007 NZST [34781] LOG:  worker process: logical
>> replication launcher (PID 34788) exited with exit code 1
>
> Exit code 0 signals that a worker should be restarted. Therefore
> graceful exit can't really use that.  I think a) we really need to
> improve bgworker infrastructure around that b) shows the limit of using
> bgworkers for this kinda thing - we should probably have a more bgworker
> like infrastructure for internal workers.

You might've missed commit be7558162acc5578d0b2cf0c8d4c76b6076ce352.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-01 Thread Andres Freund
On 2017-08-02 12:14:18 +1200, Thomas Munro wrote:
> On Wed, Aug 2, 2017 at 11:03 AM, Andres Freund  wrote:
> > On 2017-08-02 10:58:32 +1200, Thomas Munro wrote:
> >> When I shut down a cluster that isn't using logical replication, it
> >> always logs a line like the following.  So do the build farm members I
> >> looked at.  I didn't see anything about this in the open items list --
> >> isn't it a bug?
> >>
> >> 2017-08-02 10:39:25.007 NZST [34781] LOG:  worker process: logical
> >> replication launcher (PID 34788) exited with exit code 1
> >
> > Exit code 0 signals that a worker should be restarted. Therefore
> > graceful exit can't really use that.  I think a) we really need to
> > improve bgworker infrastructure around that b) shows the limit of using
> > bgworkers for this kinda thing - we should probably have a more bgworker
> > like infrastructure for internal workers.
> 
> I see.  In the meantime IMHO I think we should try to find a way to
> avoid printing out this message -- it looks like something is wrong to
> the uninitiated.

Well, that's how it is for all bgworkers - maybe a better solution is to
adjust that message in the postmaster rather than fiddle with the worker
exist code?  Seems like we could easily take pmStatus into account
inside LogChildExit() and set the log level to DEBUG1 even for
EXIT_STATUS_1 in that case?  Additionally we probably should always log
a better message for bgworkers exiting with exit 1, something about
unregistering the worker or such.


> Possibly stupid question: why do we restart workers when we know we're
> shutting down anyway?  Hmm, I suppose there might conceivably be
> workers that need to do something during shutdown and they might not
> have done it yet.

The launcher doesn't really know the reason for the shutdown.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-01 Thread Thomas Munro
On Wed, Aug 2, 2017 at 11:03 AM, Andres Freund  wrote:
> On 2017-08-02 10:58:32 +1200, Thomas Munro wrote:
>> When I shut down a cluster that isn't using logical replication, it
>> always logs a line like the following.  So do the build farm members I
>> looked at.  I didn't see anything about this in the open items list --
>> isn't it a bug?
>>
>> 2017-08-02 10:39:25.007 NZST [34781] LOG:  worker process: logical
>> replication launcher (PID 34788) exited with exit code 1
>
> Exit code 0 signals that a worker should be restarted. Therefore
> graceful exit can't really use that.  I think a) we really need to
> improve bgworker infrastructure around that b) shows the limit of using
> bgworkers for this kinda thing - we should probably have a more bgworker
> like infrastructure for internal workers.

I see.  In the meantime IMHO I think we should try to find a way to
avoid printing out this message -- it looks like something is wrong to
the uninitiated.

Possibly stupid question: why do we restart workers when we know we're
shutting down anyway?  Hmm, I suppose there might conceivably be
workers that need to do something during shutdown and they might not
have done it yet.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-01 Thread Andres Freund
On 2017-08-02 11:19:39 +1200, Gavin Flower wrote:
> Returning zero to indicate success is a holdover to the time computers could
> only run one program at a time.  At the end of the code there was a jump
> table of 4 byte entries.  The first entry with a displacement of zero was
> the location to jump to for a normal exit, subsequent entries where for
> various error conditions.  This why often return codes where in positive
> multiples of 4, since we don't use jump tables now - more & more people are
> using any integers they want.
> 
> So apart from convention, returning zero is no longer held to be a sacred to
> indicate something exited okay.  In fact since, zero could simply mean a
> value was not explicitly assigned, hence it is actually a very dangerous
> value  to be used to indicate success!

This has nothing to do with this thread.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-01 Thread Gavin Flower

On 02/08/17 11:03, Andres Freund wrote:

Hi,

On 2017-08-02 10:58:32 +1200, Thomas Munro wrote:

When I shut down a cluster that isn't using logical replication, it
always logs a line like the following.  So do the build farm members I
looked at.  I didn't see anything about this in the open items list --
isn't it a bug?

2017-08-02 10:39:25.007 NZST [34781] LOG:  worker process: logical
replication launcher (PID 34788) exited with exit code 1

Exit code 0 signals that a worker should be restarted. Therefore
graceful exit can't really use that.  I think a) we really need to
improve bgworker infrastructure around that b) shows the limit of using
bgworkers for this kinda thing - we should probably have a more bgworker
like infrastructure for internal workers.

- Andres


Returning zero to indicate success is a holdover to the time computers 
could only run one program at a time.  At the end of the code there was 
a jump table of 4 byte entries.  The first entry with a displacement of 
zero was the location to jump to for a normal exit, subsequent entries 
where for various error conditions.  This why often return codes where 
in positive multiples of 4, since we don't use jump tables now - more & 
more people are using any integers they want.


So apart from convention, returning zero is no longer held to be a 
sacred to indicate something exited okay.  In fact since, zero could 
simply mean a value was not explicitly assigned, hence it is actually a 
very dangerous value  to be used to indicate success!



Cheers,
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-01 Thread Andres Freund
Hi,

On 2017-08-02 10:58:32 +1200, Thomas Munro wrote:
> When I shut down a cluster that isn't using logical replication, it
> always logs a line like the following.  So do the build farm members I
> looked at.  I didn't see anything about this in the open items list --
> isn't it a bug?
> 
> 2017-08-02 10:39:25.007 NZST [34781] LOG:  worker process: logical
> replication launcher (PID 34788) exited with exit code 1

Exit code 0 signals that a worker should be restarted. Therefore
graceful exit can't really use that.  I think a) we really need to
improve bgworker infrastructure around that b) shows the limit of using
bgworkers for this kinda thing - we should probably have a more bgworker
like infrastructure for internal workers.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers