I wrote:
> Michael Paquier <michael.paqu...@gmail.com> writes:
>> And this gives the patch attached, just took the time to hack it.

> I think this is a good idea, but (1) I'm inclined not to restrict it to
> Windows, and (2) I think we should hold off applying it until we've seen
> a failure or two more, and can confirm whether d1b7d4877 does anything
> useful for the error messages.

OK, we now have failures from both bowerbird and jacana with the error
reporting patch applied:

and they both boil down to this:

pg_ctl: could not start server
Examine the log output.
# pg_ctl failed; logfile:
LOG:  could not bind IPv4 socket: Permission denied
HINT:  Is another postmaster already running on port 60200? If not, wait a few seconds and retry.
WARNING:  could not create listen socket for ""
FATAL:  could not create any TCP/IP sockets
LOG:  database system is shut down

So "permission denied" is certainly more useful than "no error", which
makes me feel that d1b7d4877+22989a8e3 are doing what they intended to
and should get back-patched --- any objections?

However, it's still not entirely clear what the root cause of the
failure is, or whether a patch along the discussed lines would prevent its
recurrence.  Looking at TranslateSocketError, it seems we must be seeing
an underlying error code of WSAEACCES.  A little googling says that
Windows might indeed return that, rather than the more expected
WSAEADDRINUSE, if someone else has the port open with SO_EXCLUSIVEADDRUSE:

        Another possible reason for the WSAEACCES error is that when the
        bind function is called (on Windows NT 4.0 with SP4 and later),
        another application, service, or kernel mode driver is bound to
        the same address with exclusive access. Such exclusive access is a
        new feature of Windows NT 4.0 with SP4 and later, and is
        implemented by using the SO_EXCLUSIVEADDRUSE option.

So theory A is that some other program is binding random high port numbers
with SO_EXCLUSIVEADDRUSE.  Theory B is that this is the handiwork of
Windows antivirus software doing what Windows antivirus software typically
does, i.e., inject random permissions failures depending on the phase of the
moon.  It's not very clear that a test along the lines described (that is,
attempt to connect to, not bind to, the target port) would pre-detect
either type of error.  Under theory A, a connect() test would recognize
the problem only if the other program were using the port to listen rather
than make an outbound connection; and the latter seems much more likely.
(Possibly we could detect the latter case by checking the error code
returned by connect(), but Michael's proposed patch does no such thing.)
Under theory B, we're pretty much screwed: we don't know what will happen.
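
To make the connect-versus-bind distinction concrete, here is a minimal
sketch in Python (not the Perl of the proposed patch). The names
connect_probe and bind_probe are made up for illustration, and the sketch
uses plain portable sockets, so it does not reproduce SO_EXCLUSIVEADDRUSE or
WSAEACCES. It only illustrates the general point: a port that someone has
bound without listening on it looks free to a connect() test, yet cannot be
bound.

```python
import socket


def connect_probe(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts a TCP connection on host:port.

    This is the style of test discussed above: it only notices a port
    that another process is actively listening on.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0


def bind_probe(port, host="127.0.0.1"):
    """Return (ok, errno). ok is True if we could bind host:port ourselves.

    This is the operation the postmaster actually needs to succeed.
    The probe socket is closed again before returning.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as e:
            return False, e.errno
        return True, None


if __name__ == "__main__":
    # Hold a port without listening on it, as a stand-in for "some other
    # program has the port open for a purpose other than listening".
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))        # port 0: let the OS pick one
    port = holder.getsockname()[1]

    print("connect probe sees a listener:", connect_probe(port))
    print("bind probe result:", bind_probe(port))
    holder.close()
```

A bind-style probe reports the errno it got, which is the kind of
information a connect-only test would need to inspect to tell the two
cases apart.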

I wonder what Andrew can tell us about what else is running on that
machine and whether either theory has any credibility.

BTW, if Windows *had* returned WSAEADDRINUSE, TranslateSocketError would
have failed to translate it --- surely that's an oversight?
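
For illustration only, the shape of that oversight can be sketched as a
lookup table with a missing entry. This is Python rather than the C of the
real function, the table is a deliberately tiny hypothetical subset and not
a copy of what TranslateSocketError contains, and returning None for an
unknown code is an assumption made for the sketch. The two Winsock
constants are the documented values.

```python
import errno

# Documented Winsock error code values.
WSAEACCES = 10013
WSAEADDRINUSE = 10048

# Hypothetical, illustrative subset of a Winsock-to-errno table.
# WSAEADDRINUSE is left out on purpose to mimic the gap described above.
TRANSLATION = {
    WSAEACCES: errno.EACCES,
}


def translate_socket_error(wsa_code, table=TRANSLATION):
    """Map a Winsock error code to an errno value, or None if unmapped."""
    return table.get(wsa_code)


# The fix would amount to adding the missing mapping.
FIXED_TRANSLATION = dict(TRANSLATION)
FIXED_TRANSLATION[WSAEADDRINUSE] = errno.EADDRINUSE

if __name__ == "__main__":
    print(translate_socket_error(WSAEACCES))
    print(translate_socket_error(WSAEADDRINUSE))
    print(translate_socket_error(WSAEADDRINUSE, FIXED_TRANSLATION))
```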

                        regards, tom lane

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)