On 27.10.2014 3:36, David Holmes wrote:
On 27/10/2014 1:15 AM, Ivan Gerasimov wrote:
David, would you approve this fix?

Sorry Ivan I'm having trouble following the logic this time - could you add some comments about what we are checking at each step.

Yes, sure.

The main idea is to make the thread that ends the process wait for the threads that had finished so far.
Thus, we have an array for storing the thread handles.
Any thread that is on thread-exit path, first tries to remove the completed threads from the array (to keep the list smaller), and then adds its own handle to the end of the array. The thread that is on process-exit path, calls exit (or _exit), while still owning the critical section. This way we make sure, no other threads execute any exit-related code at the same time.

Here's a typical scenario:
1) First thread that decided to end itself calls exit_process_or_thread() -- let's assume it is on thread-exit path.
Initializes the critical section.
2) Grabs the ownership of the crit. section
3) The list of thread handles is initially empty, so the thread adds a duplicate of its handle to the array.
4) Releases the crit. section
5) Calls _endthreadex() to terminate itself

6) Another thread enters exit_process_or_thread() -- let it be on thread-exit path as well.
7) Grabs the ownership of the crit. section
8) In a loop checks if any previously ended thread has completed.
Here we call WaitForSingleObject with zero timeout, so we don't block.
All the handles of completed threads are closed.
9) If there's is a free slot in the array, the thread adds its handle to the end 10) If the array is full (which is very unlikely), the thread waits for ANY thread to complete, and then adds itself to the array.
11) Releases the crit. section
12) Calls _endthreadex() to terminate itself

13) Some thread enters exit_process_or_thread() in order to end the whole process.
14) Grabs the ownership of the crit. section
15) Waits on all the threads that have added their handles to the array (typically there will be only one such thread handle). Since the ownership of the critical section is held, no other threads will execute any exit-related code at this time. 16) Once all the threads from the list have completed, the thread closes the handles and calls exit() (or _exit()), holding the crit. section ownership.

We're done.

Error handling: in a case of errors, we report them, and proceed with exiting as usual. - If initialization of critical section fails, we'll just call the corresponding exit routine. - If we failed, waiting for an exiting thread to complete, close its handle as if it has completed. - If we failed, waiting for any thread to complete withing a time-out (array is full), close all the handles and continue as if there were no threads exited before. - If we couldn't duplicate the handle, ignore it (don't add it to the array), so no one will wait for it later. - If the thread on the process-exit path failed to wait for the threads to complete withing the time-out, proceed to the exit anyway.

All these errors should never happen during normal execution, but if they do, we still try to end threads/process in a way it's done now.
In this, later case, we are at risk of observing a race condition.
However, the chances of this happening are much lesser, and in addition we'll have a waring message to analyze.

Possible bottlenecks.
1) All the threads have to obtain the ownership of the critical section, which effectively serializes all the exiting threads. However, this doesn't appear to make things too much slower, as all the threads already do similar thing in _endthreadex().
2) Normally, the threads don't block having ownership of the crit. section.
The block can only happen if there's no free slot in the array of handles.
This can only happen if MAX_EXIT_HANDLES (== 16) threads have just called _endthreadex(), and none of them completed. 3) When the thread at process-exit path waits for all the exiting threads to complete, the time-out of 1 second is specified. If any of those threads do not complete, this can lead to that the application is delayed at the exit. However, we don't block forever, and the delay can only be observed upon a failure.


Also we seem to exit while still holding the critical section - how does that work?

Right.
We make the thread at the process-exit path call exit() from withing critical section block. This way it is ensured no other exit-related code is executed at the same moment, and a race is avoided.

Sincerely yours,
Ivan

Thanks,
David

Sincerely yours,
Ivan

On 26.10.2014 19:01, Daniel D. Daugherty wrote:
On 10/25/14 12:23 PM, Ivan Gerasimov wrote:

On 25.10.2014 3:06, Daniel D. Daugherty wrote:
On 10/1/14 3:07 AM, Ivan Gerasimov wrote:
Hello!

The tests that continue to fail with wrong exit codes suggest that
the fix for JDK-8057744 wasn't sufficient.
Here's another proposal, which expands the synchronized portion of
the code.
It is proposed to make the exiting process wait for the threads
that have already started exiting.
This should help to make sure that no thread is executing any
potentially racy code concurrently with the exiting process.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8059533
WEBREV: http://cr.openjdk.java.net/~igerasim/8059533/0/webrev/

Finally got a chance to look at the official version of fix.

Thumbs up!

src/os/windows/vm/os_windows.cpp
    No comments.

Thank you Daniel!

I assume the change needs the second hotspot reviewer?

Yes, HotSpot changes always need two reviewers. David Holmes
chimed in on this thread. You should ask him if he can be
counted as a reviewer.


What would be the best time for pushing this fix?

Let's go for Wednesday again so we have a full week of testing
to evaluate this latest tweak.

Dan



Sincerely yours,
Ivan

Dan

P.S.
We had another sighting of an exit_code == 60115 test failure
this past week so while your previous fix greatly reduced the
odds of this race, I'm looking forward to seeing this new
version in action...




Comments, suggestion are welcome!

Sincerely yours,
Ivan











Reply via email to