I wrote:
> However, it's still not entirely clear what the root cause of the
> failure is, and whether a patch along the discussed lines would prevent its
> recurrence.  Looking at TranslateSocketError, it seems we must be seeing
> an underlying error code of WSAEACCES.  A little googling says that
> Windows might indeed return that, rather than the more expected
> WSAEADDRINUSE, if someone else has the port open with SO_EXCLUSIVEADDRUSE:

>       Another possible reason for the WSAEACCES error is that when the
>       bind function is called (on Windows NT 4.0 with SP4 and later),
>       another application, service, or kernel mode driver is bound to
>       the same address with exclusive access. Such exclusive access is a
>       new feature of Windows NT 4.0 with SP4 and later, and is
>       implemented by using the SO_EXCLUSIVEADDRUSE option.

> So theory A is that some other program is binding random high port numbers
> with SO_EXCLUSIVEADDRUSE.  Theory B is that this is the handiwork of
> Windows antivirus software doing what Windows antivirus software typically
> does, ie inject random permissions failures depending on the phase of the
> moon.  It's not very clear that a test along the lines described (that is,
> attempt to connect to, not bind to, the target port) would pre-detect
> either type of error.  Under theory A, a connect() test would recognize
> the problem only if the other program were using the port to listen rather
> than make an outbound connection; and the latter seems much more likely.

I took a second look at the above-quoted Microsoft documentation, and
noticed that it says this error occurs when another application is
*bound* to the target address.  If by that they mean that the other
application did a bind(), then what we're seeing here is indeed a conflict
with a listening application, which the proposed patch would detect.  So I
went ahead and pushed the patch; in any case, it shouldn't make things
any worse.
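For illustration, the idea behind the pre-check (attempt to connect to
the candidate port rather than bind to it) can be sketched in Python as
below.  This is a hypothetical sketch, not the code of the pushed patch;
the function name, the loopback host and the timeout are all assumptions.
It shows why such a probe sees a program that is listening on the port,
but not one that merely uses the port as the local end of an outbound
connection.

```python
import socket


def port_has_listener(port: int, host: str = "127.0.0.1",
                      timeout: float = 0.5) -> bool:
    """Hypothetical helper: True if a TCP connect() to host:port succeeds.

    A successful connect means some process is listening on the port.
    A port in use only as the local end of an outbound connection still
    looks free to this probe.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect((host, port))
        except OSError:
            # Connection refused or timed out: nobody is listening here.
            return False
        return True
```

A caller would skip any candidate port for which port_has_listener()
returns True, and still be prepared for the later bind() to fail.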

Also, I did a bit of digging in the buildfarm logs, and noticed that
bowerbird and jacana together have reported 34 "could not bind socket"
failures in BinInstallCheck since 2015-12-07 (when the current logic for
selecting a random port went in).  Between 2015-01-01 and 2015-12-07,
they reported only *one* such failure.  So whatever the exact explanation
is, we've greatly increased the probability of such failures by using a
random port rather than the fixed port 65432 that was used before.
I'm not entirely sure what to make of this observation, but the statistics
seem pretty clear.
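To make the two policies being compared concrete, here is a hypothetical
Python sketch: start either at the old fixed port 65432 or at a random
port, and advance (wrapping around) until a connect() probe finds nothing
listening.  The range 49152 to 65535 (the IANA dynamic port range), the
names, and the wrap-around probing are assumptions for illustration, not
the actual test-harness logic.

```python
import random
import socket

# Assumed range: the IANA dynamic/private ports (hypothetical choice).
LOW_PORT, HIGH_PORT = 49152, 65535
FIXED_PORT = 65432  # the fixed port used before 2015-12-07


def port_has_listener(port, host="127.0.0.1", timeout=0.5):
    """True if a TCP connect() to host:port succeeds (someone listens)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect((host, port))
        except OSError:
            return False
        return True


def pick_port(rng, use_random=True):
    """Hypothetical picker: start at FIXED_PORT or at a random port in
    the range, then advance (wrapping) until no listener is found."""
    span = HIGH_PORT - LOW_PORT + 1
    if use_random:
        offset = rng.randrange(span)
    else:
        offset = FIXED_PORT - LOW_PORT
    for i in range(span):
        port = LOW_PORT + (offset + i) % span
        if not port_has_listener(port):
            return port
    raise RuntimeError("every port in the range has a listener")
```

For example, pick_port(random.Random()) returns a randomly placed
candidate, while pick_port(random.Random(), use_random=False) starts at
65432.  Either way the probe only rules out listeners, so a later bind()
can still fail if another program holds the port for other purposes.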

                        regards, tom lane

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)