Re: [OpenAFS-devel] volserver hangs possible fix

Ted Anderson Mon, 18 Apr 2005 07:31:10 -0700

On 4/18/2005 08:58, Horst Birthelmer wrote:

The problem isn't whether cond_wait is atomic. It's what happens to the algorithm if it's not. Imagine the scenario where it's not atomic (and this was the part where I agreed with Tom) and you have the mutex locked in the cond_wait call, but the thread isn't in the queue yet. Now this thread gets interrupted by whatever event and you perform a ...cond_broadcast(). All the threads are woken up except that one not in there yet. You have a thread waiting on a cond_var you weren't aware of... actually you have that thread waiting on that condition variable while you already performed a broadcast. That's pretty weird for the algorithm.

Okay, but I don't agree that this situation would generate a problem in correctly written CV code. As long as the mutex is still held by the trying-to-sleep thread when the broadcast() occurs, then its *reason* for sleeping will still be true and hence there will eventually be another thread to come along and wake it up.
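
For reference, here is a minimal sketch of what "correctly written CV code" usually means, assuming POSIX threads. The names (lock, cv, ready, waiter, signal_ready, run_demo) are hypothetical and are not taken from the OpenAFS callback code. The waiter tests its predicate in a loop while holding the mutex, and the predicate is only ever changed under that same mutex, so a broadcast issued outside the mutex should not be lost as long as cond_wait() releases the mutex and enqueues the thread atomically, as POSIX requires.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state for illustration only. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static bool ready = false;          /* the "reason for sleeping" is !ready */

/* Waiter: re-check the predicate under the mutex on every wakeup. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!ready)                      /* loop also covers spurious wakeups */
        pthread_cond_wait(&cv, &lock);  /* releases lock and sleeps atomically */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Waker: change the predicate under the mutex, broadcast after unlocking. */
static void signal_ready(void)
{
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_mutex_unlock(&lock);
    pthread_cond_broadcast(&cv);        /* deliberately outside the mutex */
}

/* Runs one waiter against one waker; returns 1 once the waiter has exited. */
int run_demo(void)
{
    pthread_t t;
    ready = false;
    int rc = pthread_create(&t, NULL, waiter, NULL);
    assert(rc == 0);
    signal_ready();
    pthread_join(t, NULL);              /* returns only if the wakeup was not lost */
    return ready ? 1 : 0;
}
```

If the waiter checks the predicate before signal_ready() takes the mutex, it is already queued on the condition variable by the time the mutex becomes available to the waker; if it checks afterwards, it sees ready set and never sleeps. Either way run_demo() returns rather than hanging.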

However, I am concerned that you introduce this scenario with "where it's not atomic". Are there cases where cond_wait() is not atomic and it is necessary to write code to take that into account? So my question is still what these correct but non-atomic cond_wait() implementations are like, and how putting the broadcast() under the protection of the mutex would help.

I should also say that I have not looked at the particular callback code at issue here. Perhaps it is using broadcast() in some unusual fashion (i.e. not using the producer/consumer model) that affects this discussion.

Ted
