On 8/3/19 6:42 PM, Tom Lane wrote: > I wrote: >> * kerberos/t/001_auth.pl just blithely assumes that it can pick >> any random port above 48K and that's guaranteed to be free. >> Maybe we should split out the code in get_new_node for finding >> a free TCP port, so we can call it here? > I've confirmed that the reason it's failing on my machine is exactly > that krb5kdc tries to bind to a socket that is still in TIME_WAIT state. > Also, it looks like the socket is typically one that was used by the > GSSAPI client side (no surprise, the test leaves a lot more of those > than the one server socket), so we'd have no record of it even if we > were somehow saving state from prior runs. > > So I propose the attached patch, which seems to fix this for me. > > The particular case I'm looking at (running these tests in a tight > loop) is of course not that interesting, but I argue that it's just > increasing the odds of failure enough that I can isolate the cause. > A buildfarm animal running both kerberos and ldap tests is almost > certainly at risk of such a failure with low probability. > > (Still don't know what actually happened in those two buildfarm > failures, though.) > >
Looks good. A couple of minor nits: . since we're exporting the name there's no need to document it as a class method. I'd remove the "PostgresNode->" from the couple of places you have it in the docco. You're not actually calling it that way anywhere, and indeed doing so ends up passing 'PostgresNode' as a useless parameter to the subroutine. This is different from calling it with a qualified name (PostgresNode::get_free_port()). . in the inner loop we should probably exit the loop if we set found to 0. There's no point testing other addresses in that case. Something like "last unless found;" would do the trick. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services