Just to make sure I understand this problem, let me try providing a specific scenario: thread A thread B -------- -------- 1: lock M; 2: check queue; 3: cond_wait(C,M); // queue is empty 4: -> kernel_scheduler { 5: unlock M; 6: lock M; ... 7: add item; ... 8: unlock M; ... 9: broadcast on C; ... // noop, nobody blocked on C 10: put B on C; 11: desched B; 12: DEADLOCK
Is this a problem that can occur with a corrently operating implementation of POSIX threads, i.e. an example of correct but non-deterministic scheduling? I've expanded the operation of cond_wait as I understand it. I observe that M correctly protects the queue itself. Treating "broadcast C" and "put B on C" as black boxes, however, it doesn't seem to help to move broadcast(C) inside M, i.e. by swapping 8 & 9. On the other hand, tt would seem a simple matter to reverse the sequence of 5 & 10, but perhaps that is inconvenient for some multi-processor schedulers.
As I understand your comment, you are saying that the broadcast must be protected by the mutex or race conditions are possible. What am I missing here?
What you're missing here is that cond_wait() is atomic. The lines you list as 5 and 10 happen at the same time, so it's not possible for thread A to acquire the lock before thread B is on the wait-list for the CV. This is why it is necessary to hold the lock through the broadcast in order to prevent a race -- otherwise, the broadcast could indeed occur between lines 2 and 10, and thread B would miss the wakeup.
-- Jeffrey T. Hutzelman (N3NHS) <[EMAIL PROTECTED]> Sr. Research Systems Programmer School of Computer Science - Research Computing Facility Carnegie Mellon University - Pittsburgh, PA
_______________________________________________ OpenAFS-devel mailing list [email protected] https://lists.openafs.org/mailman/listinfo/openafs-devel
