Hi Gary,

I'm rethinking if this is a good idea. If we are going to not count interrupted sleeps() towards the 20, then I decided at the very least we need a safeguard in case something was drastically wrong and we are getting interrupted on every sleep(). So I decided after 5 interrupted sleeps() I'd start ignoring the fact that the sleep() was interrupted and count the remaining sleeps towards the 20. But this just means if we are interrupted on every sleep(), then now we try 25 times instead of 20. That's not really helping any. With the current implementation, if the interrupted sleeps are frequent but somewhat sporadic, you'll still get a few real sleeps() in during the 20 attempts, and that should be good enough. If the interrupt happens every time, ignoring the first few is not really of any value.

Chris

On 8/3/18 11:47 AM, Chris Plummer wrote:
I can add that.

thanks,

Chris

On 8/3/18 11:41 AM, Gary Adams wrote:
Should you capture the return from sleep() to make sure the accumulated
timeout of 20 seconds is achieved? Worst case scenario - every sleep is
immediately interrupted.

On 8/3/18, 2:37 PM, Gary Adams wrote:
Hello,

Please review the following fix for JDK12:

http://cr.openjdk.java.net/~cjplummer/8199811/webrev.00

https://bugs.openjdk.java.net/browse/JDK-8199811

The root of the problem is that there is no code in the solaris or 
windows AttachListener support that ensures that the listener is done 
being initialized before attaching and  attempting to enqueue the first 
command. The enqueue operation fails when is sees that the listener is 
not attached yet.

I was able to force this failure to happen every time by adding a 10 
second sleep in attach_listener_thread_entry() just before the call to 
AttachListener::set_initialized(). This did not cause macosx or linux to 
fail, but did make solaris fail (failures had not been noted previously) 
and windows to fail (failures previously had been observed, but very 
rarely).

The proposed fix is to have the enqueue code sleep for up to 20 seconds 
(in 1 second intervals) waiting for initialization to be complete. I 
found this fixed the problem, even with the 10 second sleep in 
attach_listener_thread_entry() still in place. A shorter sleep is 
probably fine. I'm open to suggestions. Since this timing issue was so 
rare, my guess is that a single 1 second sleep is likely to always fix 
it, but since it is so hard to reproduce (without the 10 second sleep in 
place), I can't say for sure.

Another approach to fixing this would be to use some sort of 
synchronization between the init and enqueue code, like a condition 
variable. I think I know how to do this with pthread_cond_wait() and 
pthread_cond_signal(), although it gets to be a bit tricky since I'd 
probably have to make the enqueue code create the condvar if 
initialization is not yet complete, and then have the initialization 
code check for the existence of the condvar when initialization is 
complete, and signal on it if it exists. I'm pretty sure there's a 
potential for race condition in there. I haven't thought it through 
enough to say for sure. I also looked a bit at condition variable 
support on windows, and it looks like I could do something similar there 
too. However, I think the sleep approach I have implement is far more 
straight forward and less error prone, so I'd prefer to stick with it if 
others approve.

thanks,

Chris



Reply via email to