Stuart, > The range that Dmitry suggests is reasonable, though I note that the > actual ephemeral port range used by the kernel will differ from OS to > OS and even from system to system. I don't know if that's really > significant though. If you end up choosing a port outside the > ephemeral range for some system, does it really matter?
This range is assigned by IANA so it's a standard. -Dmitry On 2015-01-09 05:50, Stuart Marks wrote: > Hi Jaroslav, > > I'm distant enough from this code that I don't think I'm in a position > to say "no you can't check this in," and I'm mindful of the fact that > this bug is a high priority and you want to get a fix in. But having > said that I think there's a surprising amount of complexity here for > what it does. > > Implementing a retry-on-BindException policy seems to be fairly > sensible, since I assume it would be fairly invasive to modify the code > in the JVMs being forked to use an ephemeral port and send that > information back to the test. > > My conjecture is however that the open/close/reopen logic is the primary > cause of the BindExceptions that occur. If you're going to retry on > BindException, you might as well choose a random port number instead of > doing the open/close to get a port number from the system. > > The range that Dmitry suggests is reasonable, though I note that the > actual ephemeral port range used by the kernel will differ from OS to OS > and even from system to system. I don't know if that's really > significant though. If you end up choosing a port outside the ephemeral > range for some system, does it really matter? > > If you do decide to have PortAllocator open and close a ServerSocket (in > order to find a previously unused port) I'd suggest removing the logic > that leaves the socket open until the first call to get(). That logic > reduces the possibility that some other process will open the socket > after the close but before the reopen. In my experience that's not > what's causing the BindExceptions. It could still happen, though, but > you're protected by the retry logic anyway. Leaving the socket open > longer actually hurts, I think, because it increases the chance that the > kernel won't have cleaned up the port by the time the test wants to > reopen it. > > If you change PortAllocator to close the socket immediately, you can get > rid of the need to call release() in a finally-block of the caller. You > could also change the type of the functional interface to be > > int[] -> void > > since the PortAllocator doesn't hold onto any resources that need to be > cleaned up. It just calls the execute() method and passes an array of n > port numbers. > > It's probably necessary to have the socket close() call in a retry loop. > The socket is always closed immediately from the user process point of > view, so isClosed() will always return true immediately after the > close() call returns. You can verify this easily by looking in the > ServerSocket.java source code. I believe the state that prevents the > port from being reused immediately is private to the kernel and cannot > be observed from a user process, at least not without attempting to > reopen the socket. > > Side note: one of the jcmd() overloads says that parameter 'c' (a > Consumer) may be null. It doesn't look like this is handled. If you > really want to support this, I'd assign () -> { } to it if it's null so > that it can be called unconditionally. (Or just disallow null.) > > s'marks > > > On 1/6/15 2:00 PM, Dmitry Samersoff wrote: >> Jaroslav, >> >> It might be better to just choose a random digit between 49152–65535 >> and attempt to use it. >> >> -Dmitry >> >> >> On 2014-12-18 17:29, Jaroslav Bachorik wrote: >>> On 12/11/2014 03:43 PM, Dmitry Samersoff wrote: >>>> Jaroslav, >>>> >>>> You can set SO_LINGER to zero, in this case socket will be closed >>>> immediately without waiting in TIME_WAIT >>>> >>>> But there are no reliable way to predict whether you can take this port >>>> or not after you close it. >>>> >>>> So the only valid solution is to try to connect to a random port and if >>>> this attempt fails try another random port. Everything else will cause >>>> more or less frequent intermittent failures. >>> >>> Thanks for all the suggestions! >>> >>> http://cr.openjdk.java.net/~jbachorik/8066708/webrev.02 >>> >>> I've enhanced the original patch with the retry logic using different >>> random port if starting the JMX agent on the provided port fails with >>> BindException. >>> >>> I'm keeping there the changes for properly closing the ports opened for >>> the test purposes and also setting the SO_REUSEADDR - anyway, it does >>> not make sense to reuse the ephemeral test ports. >>> >>> I've split the original "test_06" test case in order to keep it readable >>> even with the new retry logic - and also to make each test case to test >>> just one scenario. >>> >>> Cheers, >>> >>> -JB- >>> >>>> >>>> -Dmitry >>>> >>>> >>>> On 2014-12-11 17:06, Jaroslav Bachorik wrote: >>>>> On 12/09/2014 01:25 PM, Jaroslav Bachorik wrote: >>>>>> On 12/09/2014 01:39 AM, Stuart Marks wrote: >>>>>>> On 12/8/14 12:35 PM, Jaroslav Bachorik wrote: >>>>>>>> Please, review the following test change >>>>>>>> >>>>>>>> Issue : https://bugs.openjdk.java.net/browse/JDK-8066708 >>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8066708/webrev.00 >>>>>>>> >>>>>>>> The test fails very intermittently when RMI registry is trying to >>>>>>>> bind >>>>>>>> to a port >>>>>>>> previously used in the test (via ServerSocket). >>>>>>>> >>>>>>>> This seems to be caused by the sockets created via `new >>>>>>>> ServerSocket(0)` and >>>>>>>> being in reusable mode. The fix attempts to prevent this by >>>>>>>> explicitly >>>>>>>> forbidding the reusable mode. >>>>>>> >>>>>>> Hi Jaroslav, >>>>>>> >>>>>>> I happened to see this fly by, and there are (I think) some similar >>>>>>> issues going on in the RMI tests. >>>>>>> >>>>>>> But first I'll note that I don't think setReuseAddress() will >>>>>>> have the >>>>>>> effect that you want. Typically it's set to true before binding a >>>>>>> socket, so that a subsequent bind operation will succeed even if the >>>>>>> address/port is already in use. ServerSockets created with new >>>>>>> ServerSocket(0) are already bound, and I'm not sure what calling >>>>>>> setReuseAddress(false) will do on such sockets. The spec says >>>>>>> behavior >>>>>>> is undefined, but my bet is that it does nothing. >>>>>>> >>>>>>> I guess it doesn't hurt to try this out to see if it makes a >>>>>>> difference, >>>>>>> but I don't have much confidence it will help. >>>>>>> >>>>>>> The potential similarity to the RMI tests is exemplified by >>>>>>> JDK-8049202 >>>>>>> (sorry, this bug report isn't open) but briefly this tests the RMI >>>>>>> registry as follows: >>>>>>> >>>>>>> 1. Opens port 1099 using new ServerSocket(1099) [1099 is the default >>>>>>> RMI registry port] in order to ensure that 1099 isn't in >>>>>>> use by >>>>>>> something else already; >>>>>>> >>>>>>> 2. If this succeeds, it immediately closes the ServerSocket. >>>>>>> >>>>>>> 3. Then it creates a new RMI registry on port 1099. >>>>>>> >>>>>>> In principle, this should succeed, yet it fails around 10% of the >>>>>>> time >>>>>>> on some systems. The error is "port already in use". My best >>>>>>> theory is >>>>>>> that even though the socket has just been closed by a user program, >>>>>>> the >>>>>>> kernel has to run the socket through some of the socket states >>>>>>> such as >>>>>>> FIN_WAIT_1, FIN_WAIT_2, or CLOSING before the socket is actually >>>>>>> closed >>>>>>> and is available for reuse. If a program -- even the same one -- >>>>>>> attempts to open a socket on the same port before the socket has >>>>>>> reached >>>>>>> its final state, it will get an "already in use error". >>>>>>> >>>>>>> If this is true I don't believe that setting SO_REUSEADDR will >>>>>>> work if >>>>>>> the socket is in one of these final states. (I remember reading this >>>>>>> somewhere but I'm not sure where at the moment. I can try to dig >>>>>>> it up >>>>>>> if there is interest.) >>>>>>> >>>>>>> I admit this is just a theory and I'm open to alternatives, and I'm >>>>>>> also >>>>>>> open to hearing about ways to deal with this problem. >>>>>>> >>>>>>> Could something similar be going on with this JMX test? >>>>>> >>>>>> Hm, this is exactly what happened with this test :( >>>>>> >>>>>> The problem is that the port is reported as available while it is >>>>>> still >>>>>> occupied and RMI registry attempts to start using that port. >>>>>> >>>>>> If setting SO_REUSEADDR does not work then the only solution would >>>>>> be to >>>>>> retry the test case when this exception occurs. >>>>> >>>>> Further investigation shows that the problem was rather the client >>>>> connecting to a socket being shut down. >>>>> >>>>> It sounds like setting SO_REUSEADDR to false should prevent this >>>>> failure. >>>>> >>>>> From the ServerSocket javadoc: >>>>> "When a TCP connection is closed the connection may remain in a >>>>> timeout >>>>> state for a period of time after the connection is closed (typically >>>>> known as the TIME_WAIT state or 2MSL wait state). For applications >>>>> using >>>>> a well known socket address or port it may not be possible to bind a >>>>> socket to the required SocketAddress if there is a connection in the >>>>> timeout state involving the socket address or port." >>>>> >>>>> It also turns out that the test does not close the server sockets >>>>> properly so there might be several sockets being opened or timed out >>>>> dangling around. >>>>> >>>>> I've updated the test so it is setting SO_REUSEADDR for all the new >>>>> ServerSockets instances + introduced the mechanism to run the test >>>>> code >>>>> while properly cleaning up any allocated ports. >>>>> >>>>> http://cr.openjdk.java.net/~jbachorik/8066708/webrev.01/ >>>>> >>>>> -JB- >>>>> >>>>>> >>>>>> -JB- >>>>>> >>>>>>> >>>>>>> s'marks >>>>>> >>>>> >>>> >>>> >>> >> >> -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the source code.