Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-29 Thread Jeff Janes
On Thu, Jun 29, 2017 at 11:39 AM, Tom Lane wrote: > Jeff Janes writes: > > In the now-committed version of this, the 'pg_ctl start' returns > > successfully as soon as the server reaches a consistent state. Which is > OK, > > except that it does the same thing when hot_standby=off. When > > hot

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-29 Thread Tom Lane
Jeff Janes writes: > In the now-committed version of this, the 'pg_ctl start' returns > successfully as soon as the server reaches a consistent state. Which is OK, > except that it does the same thing when hot_standby=off. When > hot_standby=off, I would expect it to wait for the end of recovery

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-29 Thread Andres Freund
On June 29, 2017 10:19:46 AM PDT, Jeff Janes wrote: >On Tue, Jun 27, 2017 at 11:59 AM, Tom Lane wrote: > >> I wrote: >> > Andres Freund writes: >> >> On 2017-06-26 17:38:03 -0400, Tom Lane wrote: >> >>> Hm. Take that a bit further, and we could drop the connection >probes >> >>> altogether --

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-29 Thread Jeff Janes
On Tue, Jun 27, 2017 at 11:59 AM, Tom Lane wrote: > I wrote: > > Andres Freund writes: > >> On 2017-06-26 17:38:03 -0400, Tom Lane wrote: > >>> Hm. Take that a bit further, and we could drop the connection probes > >>> altogether --- just put the whole responsibility on the postmaster to > >>>

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-28 Thread Ants Aasma
On Wed, Jun 28, 2017 at 8:31 PM, Tom Lane wrote: > Andres Freund writes: >> On 2017-06-27 14:59:18 -0400, Tom Lane wrote: >>> However, it's certainly arguable that this is too much change for an >>> optional post-beta patch. > >> Yea, I think there's a valid case to be made for that. I'm still >>

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-28 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> So when I removed the miscadmin.h include, I found out that pg_ctl is >> also relying on PG_BACKEND_VERSIONSTR from that file. >> >> There are at least three things we could do here: >> >> 1. Give this up as not worth this much trouble. >> >> 2. Move

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-28 Thread Alvaro Herrera
Tom Lane wrote: > Andres Freund writes: > So when I removed the miscadmin.h include, I found out that pg_ctl is > also relying on PG_BACKEND_VERSIONSTR from that file. > > There are at least three things we could do here: > > 1. Give this up as not worth this much trouble. > > 2. Move PG_BACKE

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-28 Thread Tom Lane
Andres Freund writes: > On 2017-06-28 13:31:27 -0400, Tom Lane wrote: >> While looking this over again, I got worried about the fact that pg_ctl >> is #including "miscadmin.h". That's a pretty low-level backend header >> and it wouldn't be surprising at all if somebody tried to put stuff in >> it

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-28 Thread Andres Freund
On 2017-06-28 13:31:27 -0400, Tom Lane wrote: > I'm not hearing anyone speaking against doing this now, so I'm going > to go ahead with it. Cool. > While looking this over again, I got worried about the fact that pg_ctl > is #including "miscadmin.h". That's a pretty low-level backend header > a

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-28 Thread Tom Lane
Andres Freund writes: > On 2017-06-27 14:59:18 -0400, Tom Lane wrote: >> However, it's certainly arguable that this is too much change for an >> optional post-beta patch. > Yea, I think there's a valid case to be made for that. I'm still > inclined to go along with this, it seems we're otherwise

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-27 Thread Tom Lane
Andres Freund writes: > On 2017-06-27 14:59:18 -0400, Tom Lane wrote: >> If we decide that it has to wait for v11, >> I'd address Jeff's complaint by hacking the loop behavior in >> test_postmaster_connection, which'd be ugly but not many lines of code. > Basically increasing the wait time over t

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-27 Thread Andres Freund
Hi, On 2017-06-27 14:59:18 -0400, Tom Lane wrote: > Here's a draft patch for that. I quite like the results --- this seems > way simpler and more reliable than what pg_ctl has done up to now. Yea, I like that too. > However, it's certainly arguable that this is too much change for an > optiona

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-27 Thread Tom Lane
I wrote: > Andres Freund writes: >> On 2017-06-26 17:38:03 -0400, Tom Lane wrote: >>> Hm. Take that a bit further, and we could drop the connection probes >>> altogether --- just put the whole responsibility on the postmaster to >>> show in the pidfile whether it's ready for connections or not.

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Tom Lane
Andres Freund writes: > On 2017-06-26 17:38:03 -0400, Tom Lane wrote: >> Hm. Take that a bit further, and we could drop the connection probes >> altogether --- just put the whole responsibility on the postmaster to >> show in the pidfile whether it's ready for connections or not. > Yea, that see

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Andres Freund
On 2017-06-26 17:38:03 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-06-26 17:30:30 -0400, Tom Lane wrote: > >> No, I don't like that at all. Has race conditions against updates > >> coming from the startup process. > > > You'd obviously have to take the appropriate locks. I think

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Tom Lane
Andres Freund writes: > On 2017-06-26 17:30:30 -0400, Tom Lane wrote: >> No, I don't like that at all. Has race conditions against updates >> coming from the startup process. > You'd obviously have to take the appropriate locks. I think the issue > here is less race conditions, and more that ar

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Andres Freund
On 2017-06-26 17:30:30 -0400, Tom Lane wrote: > Andres Freund writes: > > It'd be quite possible to address the race-condition by moving the > > updating of the control file to postmaster, to the > > CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) block. That'd require > > updating the control f

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Tom Lane
Andres Freund writes: > It'd be quite possible to address the race-condition by moving the > updating of the control file to postmaster, to the > CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) block. That'd require > updating the control file from postmaster, which'd be somewhat ugly. No, I do

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Andres Freund
Hi, On 2017-06-26 16:49:07 -0400, Tom Lane wrote: > Andres Freund writes: > > Arguably we could and should improve the logic when the server has > > started, right now it's pretty messy because we never treat a standby as > > up if hot_standby is disabled... > > True. If you could tell the diff

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Tom Lane
Andres Freund writes: > Arguably we could and should improve the logic when the server has > started, right now it's pretty messy because we never treat a standby as > up if hot_standby is disabled... True. If you could tell the difference between "HS disabled" and "HS not enabled yet" from pg_c

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Tom Lane
I wrote: > Andres Freund writes: >> It'd not be unreasonable to check pg_control first, and only after that >> indicates readiness check via the protocol. > Hm, that's a thought. The problem here isn't the frequency of checks, > but the log spam. Actually, that wouldn't help much as things stand

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Andres Freund
On 2017-06-26 16:26:00 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-06-26 16:19:16 -0400, Tom Lane wrote: > >> Sure, what do you think an appropriate behavior would be? > > > It'd not be unreasonable to check pg_control first, and only after that > > indicates readiness check via the

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Tom Lane
Andres Freund writes: > On 2017-06-26 16:19:16 -0400, Tom Lane wrote: >> Sure, what do you think an appropriate behavior would be? > It'd not be unreasonable to check pg_control first, and only after that > indicates readiness check via the protocol. Hm, that's a thought. The problem here isn't

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Andres Freund
On 2017-06-26 16:19:16 -0400, Tom Lane wrote: > Jeff Janes writes: > > The 10-fold increase in log spam during long PITR recoveries is a bit > > unfortunate. > > > 9153 2017-06-26 12:55:40.243 PDT FATAL: the database system is starting up > > 9154 2017-06-26 12:55:40.345 PDT FATAL: the databa

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Tom Lane
Jeff Janes writes: > The 10-fold increase in log spam during long PITR recoveries is a bit > unfortunate. > 9153 2017-06-26 12:55:40.243 PDT FATAL: the database system is starting up > 9154 2017-06-26 12:55:40.345 PDT FATAL: the database system is starting up > 9156 2017-06-26 12:55:40.447 P

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Jeff Janes
On Mon, Jun 26, 2017 at 12:15 PM, Tom Lane wrote: > Michael Paquier writes: > > On Mon, Jun 26, 2017 at 7:13 AM, Tom Lane wrote: > >> The attached proposed patch adjusts pg_ctl to check every 100msec, > >> instead of every second, for the postmaster to be done starting or > >> stopping. > > >>

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-26 Thread Tom Lane
Michael Paquier writes: > On Mon, Jun 26, 2017 at 7:13 AM, Tom Lane wrote: >> The attached proposed patch adjusts pg_ctl to check every 100msec, >> instead of every second, for the postmaster to be done starting or >> stopping. >> +#define WAITS_PER_SEC 10 /* should divide 100 evenly *

Re: [HACKERS] Reducing pg_ctl's reaction time

2017-06-25 Thread Michael Paquier
On Mon, Jun 26, 2017 at 7:13 AM, Tom Lane wrote: > The attached proposed patch adjusts pg_ctl to check every 100msec, > instead of every second, for the postmaster to be done starting or > stopping. This cuts the runtime of the recovery TAP tests from around > 4m30s to around 3m10s on my machine,

[HACKERS] Reducing pg_ctl's reaction time

2017-06-25 Thread Tom Lane
I still have a bee in my bonnet about how slow the recovery TAP tests are, and especially about how low the CPU usage is while they run, suggesting that a lot of the wall clock time is being expended on useless sleeps. Some analysis I did today found some low-hanging fruit there: a significant par