On Tue 29 Jan 2008 at 12:10PM, Mark Martin wrote:
> I'd appreciate comments on my fix for bug 6409970.
> 
> The diffs are available at: http://cr.opensolaris.org/~devnull/6409970/
> 
> I have an extremely simple test harness available at:
> http://cr.opensolaris.org/~devnull/6409970/checkstartd.
> Please do not judge me by my lack of script-fu.
> 
> Thanks,
> Mark

Mark, I looked at the code; I think I hit this bug once so I was
curious.  I spotted a couple of concerns.

The first is that you're not examining (or marking (void)) the return
values of some function which do return values, like pthread_cond_wait
(which returns an int).  So, this code will not pass lint.

Second, there are cases in which pthread_cond_wait() can suffer
from a so called "spurious wakeup"-- where we wakeup even though
we were never signalled.  As the man page says:

     If a signal is delivered to a thread waiting for a condition
     variable,  upon  return  from  the signal handler the thread
     resumes waiting for the condition variable as if it was  not
     interrupted, or it returns 0 due to spurious wakeup.

A longer discussion on this with an example fix is here:

http://docsun.cites.uiuc.edu/sun_docs/C/solaris_9/SUNWdev/MTP/p28.html

Threads are never as simple as they seem to be :)


Finally, it seems to me that this might be more easily fixed by masking
the offending signal in all threads except for the dedicated
signal catcher-- in which case the signal will be forced to go to
that thread and only that thread; this could be done with some signal
manipulation in startd_thread_create().

        -dp

-- 
Daniel Price - Solaris Kernel Engineering - dp at eng.sun.com - blogs.sun.com/dp

Reply via email to