Team,

I found a problem in the condition variable code that causes a
deadlock with the new tickless callout implementation.

As you all know, condition variables are used together with a
mutex. For a timed wait, the sequence should be:

 (mutex is held at this point)

create a timeout for the wait
insert the current thread into a sleep queue
release the mutex

switch()

The thread comes out of the switch when
    - the handler for the timeout fires OR
    - someone signals and wakes up the thread

untimeout(timeout id)
reacquire the mutex

In two places, the code correctly calls untimeout() before
reacquiring the mutex. In a third place, the order is reversed.

The stock kernel does not hit this deadlock because its timeout
handler does not acquire any mutex; it only calls setrun().
However, the stock kernel has a different race: the timeout can
fire before the thread has been inserted into the sleep queue.
The wakeup is then lost, and the thread would never be woken up.
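A minimal, single-threaded C sketch of that lost wakeup, replaying
the two possible orderings of the insert and the stock handler.
All names are hypothetical, no other thread signals the waiter,
and it assumes setrun() has no effect on a thread that is not yet
on a sleep queue:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: only the thread state the race depends on. */
struct thread_state {
    bool on_sleepq;   /* inserted in the sleep queue */
    bool runnable;    /* made runnable by setrun() */
};

static void insert_in_sleepq(struct thread_state *t) {
    assert(!t->on_sleepq);          /* a thread sleeps on at most one queue */
    t->on_sleepq = true;
    t->runnable = false;
}

/* Stock handler: takes no mutex and only does setrun().
 * Assumption: setrun() does nothing for a thread that is not on a sleep queue. */
static void stock_timeout_handler(struct thread_state *t) {
    if (t->on_sleepq) {
        t->on_sleepq = false;
        t->runnable = true;
    }
}

/* Replays one ordering; returns true if the thread ends up woken. */
static bool replay(bool timeout_fires_first) {
    struct thread_state t = { .on_sleepq = false, .runnable = true };
    if (timeout_fires_first) {
        stock_timeout_handler(&t);  /* fires before the insert: no effect */
        insert_in_sleepq(&t);       /* the thread now sleeps with no wakeup pending */
    } else {
        insert_in_sleepq(&t);
        stock_timeout_handler(&t);  /* normal case: the timeout wakes the thread */
    }
    return t.runnable;
}
```

Under those assumptions, replay(false) returns true, while
replay(true) returns false: the handler fired too early and
nothing is left to wake the thread.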

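To fix that race, I made the timeout handler acquire and then
release the mutex. Because the waiting thread holds the mutex
until it is on the sleep queue, the handler can no longer run
ahead of the insertion. But this creates a deadlock with
untimeout(), which waits for a running handler to finish: if the
thread reacquires the mutex before calling untimeout() while the
handler is running, the handler blocks on the mutex and
untimeout() blocks on the handler.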
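A rough user-space sketch of the fixed sequence in C with POSIX
threads, for illustration only; it is not the kernel code, and all
names are hypothetical (untimeout() here only mimics the behavior
described above). It assumes each timeout is serviced by its own
thread (timeout_create()), that untimeout() cancels the timeout
and then joins that thread, so it waits for a handler that has
already started (and, as a simplification, also for the pending
delay), and that a private sq_lock plus a pthread condition
variable can stand in for the sleep queue and switch(). The
handler acquires and releases the caller's mutex, and
cv_timedwait() calls untimeout() before reacquiring the mutex:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

struct callout {                             /* hypothetical: one timeout, one thread */
    atomic_bool cancelled;
    void (*fn)(void *);
    void *arg;
    struct timespec delay;
    pthread_t thr;
};

static void *callout_thread(void *p) {
    struct callout *c = p;
    nanosleep(&c->delay, NULL);              /* let the timeout expire */
    if (!atomic_load(&c->cancelled))
        c->fn(c->arg);                       /* run the timeout handler */
    return NULL;
}

static void timeout_create(struct callout *c, void (*fn)(void *), void *arg, long ms) {
    atomic_init(&c->cancelled, false);
    c->fn = fn; c->arg = arg;
    c->delay = (struct timespec){ .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    int rc = pthread_create(&c->thr, NULL, callout_thread, c);
    assert(rc == 0); (void)rc;               /* rc is unused when NDEBUG is defined */
}

/* Cancel the timeout; if its handler has already started, wait for it to finish. */
static void untimeout(struct callout *c) {
    atomic_store(&c->cancelled, true);
    pthread_join(c->thr, NULL);              /* simplification: also waits out the delay */
}

/* Hypothetical condition variable: sq_lock and wake stand in for the sleep queue and switch(). */
struct cv {
    pthread_mutex_t *m;                      /* the mutex used with this condition variable */
    pthread_mutex_t sq_lock;
    pthread_cond_t wake;
    bool on_sleepq, timed_out;
};

/* Timeout handler with the fix: acquire and release the mutex around setrun(). */
static void cv_timeout_handler(void *arg) {
    struct cv *cv = arg;
    pthread_mutex_lock(cv->m);               /* blocks while the waiting thread holds it */
    pthread_mutex_lock(&cv->sq_lock);
    if (cv->on_sleepq) {                     /* setrun() */
        cv->on_sleepq = false; cv->timed_out = true;
        pthread_cond_signal(&cv->wake);
    }
    pthread_mutex_unlock(&cv->sq_lock);
    pthread_mutex_unlock(cv->m);
}

/* Timed wait; the caller holds cv->m. Returns true if the wait timed out. */
static bool cv_timedwait(struct cv *cv, long ms) {
    struct callout c;
    cv->timed_out = false;
    timeout_create(&c, cv_timeout_handler, cv, ms);  /* create a timeout for the wait */
    pthread_mutex_lock(&cv->sq_lock);
    cv->on_sleepq = true;                            /* insert in the sleep queue */
    pthread_mutex_unlock(cv->m);                     /* release the mutex */
    while (cv->on_sleepq)                            /* "switch()" until woken */
        pthread_cond_wait(&cv->wake, &cv->sq_lock);
    bool timed_out = cv->timed_out;
    pthread_mutex_unlock(&cv->sq_lock);
    untimeout(&c);                 /* first: the handler may be blocked on cv->m */
    pthread_mutex_lock(cv->m);                       /* then reacquire the mutex */
    return timed_out;
}

static void *signal_after_10ms(void *arg) {  /* another thread signals the waiter */
    struct cv *cv = arg;
    nanosleep(&(struct timespec){ .tv_nsec = 10000000L }, NULL);
    pthread_mutex_lock(cv->m);
    pthread_mutex_lock(&cv->sq_lock);
    if (cv->on_sleepq) { cv->on_sleepq = false; pthread_cond_signal(&cv->wake); }
    pthread_mutex_unlock(&cv->sq_lock);
    pthread_mutex_unlock(cv->m);
    return NULL;
}

/* Wait up to timeout_ms; if with_signal is true, another thread signals first. */
static bool demo_wait(long timeout_ms, bool with_signal) {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    struct cv cv = { &m, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, false };
    pthread_t t;
    pthread_mutex_lock(&m);                  /* held first, so the signal cannot be lost */
    if (with_signal) pthread_create(&t, NULL, signal_after_10ms, &cv);
    bool timed_out = cv_timedwait(&cv, timeout_ms);
    pthread_mutex_unlock(&m);
    if (with_signal) pthread_join(t, NULL);
    return timed_out;
}
```

Under these assumptions, demo_wait(20, false) times out and
returns true, and demo_wait(500, true) is woken by the other
thread and returns false. If the last two calls in cv_timedwait()
were swapped, a handler that had already started and was blocked
in pthread_mutex_lock() on the reacquired mutex would never
finish, so the pthread_join() inside untimeout() would never
return.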

I have fixed the order in the condition variable code, so
untimeout() is now called before the mutex is reacquired in all
three places. I will let you know the results of the tests I am
about to run on the fix.

I am letting you know so you don't spend time discovering this
problem during code review.

Madhavan
