On Thu, Apr 21, 2016 at 11:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
>> Michael Paquier <michael.paqu...@gmail.com> writes:
>>> And this gives the patch attached, just took the time to hack it.
>> I think this is a good idea, but (1) I'm inclined not to restrict it to
>> Windows, and (2) I think we should hold off applying it until we've seen
>> a failure or two more, and can confirm whether d1b7d4877 does anything
>> useful for the error messages.
> OK, we now have failures from both bowerbird and jacana with the error
> reporting patch applied:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-21%2012%3A03%3A02
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-04-19%2021%3A00%3A39
> and they both boil down to this:
> pg_ctl: could not start server
> Examine the log output.
> # pg_ctl failed; logfile:
> LOG:  could not bind IPv4 socket: Permission denied
> HINT:  Is another postmaster already running on port 60200? If not, wait a few seconds and retry.
> WARNING:  could not create listen socket for ""
> FATAL:  could not create any TCP/IP sockets
> LOG:  database system is shut down
> So "permission denied" is certainly more useful than "no error", which
> makes me feel that d1b7d4877+22989a8e3 are doing what they intended to
> and should get back-patched --- any objections?

+1. That's useful in itself.

> However,
> [...]
> So theory A is that some other program is binding random high port numbers
> with SO_EXCLUSIVEADDRUSE.  Theory B is that this is the handiwork of
> Windows antivirus software doing what Windows antivirus software typically
> does, ie inject random permissions failures depending on the phase of the
> moon.  It's not very clear that a test along the lines described (that is,
> attempt to connect to, not bind to, the target port) would pre-detect
> either type of error.  Under theory A, a connect() test would recognize
> the problem only if the other program were using the port to listen rather
> than make an outbound connection; and the latter seems much more likely.
> (Possibly we could detect the latter case by checking the error code
> returned by connect(), but Michael's proposed patch does no such thing.)

Perl's connect() can be made more chatty. $! returns the error string,
$!+0 the errno. With the patch I sent previously, we'd need to change
this portion:
+           socket(SOCK, PF_INET, SOCK_STREAM, $proto) or die;
+           $found = 0 if connect(SOCK, $paddr);
+           close(SOCK);
Basically, it would look something like this, which I think would still
be better than nothing:
if (!connect(SOCK, $paddr)) {
    print 'connect error = ', $!, ' (errno ', $!+0, ")\n";
}
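For illustration, a self-contained version of such a check could look like the sketch below. It is only a sketch: port_looks_free is a hypothetical name that is not in the posted patch, and it assumes the server would listen on 127.0.0.1. It treats a successful connect() as "port in use" and prints both the error string and the errno when connect() fails, so that an unexpected errno is at least visible in the test logs.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM inet_aton sockaddr_in);

# Hypothetical helper (not part of the posted patch): returns 1 when
# nothing accepts a TCP connection on 127.0.0.1:$port, 0 otherwise.
sub port_looks_free
{
    my ($port) = @_;
    my $proto = getprotobyname('tcp') // 0;
    my $paddr = sockaddr_in($port, inet_aton('127.0.0.1'));

    socket(my $sock, PF_INET, SOCK_STREAM, $proto)
      or die "socket() failed: $!";

    my $free = 1;
    if (connect($sock, $paddr))
    {
        # Something is listening on the port.
        $free = 0;
    }
    else
    {
        # $! in string context is the message, $! + 0 the numeric errno.
        # ECONNREFUSED is the expected case when nobody listens; any
        # other errno hints at something unusual going on with the port.
        my ($msg, $errno) = ("$!", $! + 0);
        printf "connect error = %s (errno %d)\n", $msg, $errno;
    }
    close($sock);
    return $free;
}
```

As noted above, this only detects a program listening on the port, not one using it for an outbound connection.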
Honestly, even if we never reach perfection here, I think something
like my previous patch would still make the tests more reliable on
platforms where services listen on localhost.

> Under theory B, we're pretty much screwed, we don't know what will happen.

Indeed. If things are completely random, nothing guarantees that a port
for which connect() fails at instant T (meaning that the port is free at
that moment) will not be taken at instant T+1, because there is a window
between the moment the free port is checked and the moment postgres
binds it. If we free up the port just before starting Postgres, the
failure window would be reduced, but it still cannot be reduced to 0.
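One way to free up the port only just before starting Postgres would be for the test harness to keep the port bound itself until the last moment. The sketch below assumes that approach; reserve_port is a hypothetical helper, not existing TAP test code, and it binds 127.0.0.1 without SO_REUSEADDR. It narrows the window but, as said above, does not close it.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM INADDR_LOOPBACK sockaddr_in);

# Hypothetical helper: bind 127.0.0.1:$port and keep the socket open so
# the port stays reserved while the node is being prepared.  Returns the
# socket on success, or nothing (after printing the reason) on failure.
sub reserve_port
{
    my ($port) = @_;
    socket(my $sock, PF_INET, SOCK_STREAM, 0)
      or die "socket() failed: $!";

    if (!bind($sock, sockaddr_in($port, INADDR_LOOPBACK)))
    {
        my ($msg, $errno) = ("$!", $! + 0);
        print "bind error = $msg (errno $errno)\n";
        close($sock);
        return;
    }
    # The socket is bound but never listens, so closing it releases the
    # port right away.
    return $sock;
}

# Intended use (hypothetical):
#   my $reservation = reserve_port($port) or die "port $port is taken";
#   ... initialize the data directory, write postgresql.conf ...
#   close($reservation);   # release as late as possible
#   ... then run pg_ctl start ...
```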

> BTW, if Windows *had* returned WSAEADDRINUSE, TranslateSocketError would
> have failed to translate it --- surely that's an oversight?

Yes, and I can see you fixed that with 125ad53 already.

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)