On Tue 29 Jan 2008 at 12:10PM, Mark Martin wrote: > I'd appreciate comments on my fix for bug 6409970. > > The diffs are available at: http://cr.opensolaris.org/~devnull/6409970/ > > I have an extremely simple test harness available at: > http://cr.opensolaris.org/~devnull/6409970/checkstartd. > Please do not judge me by my lack of script-fu. > > Thanks, > Mark
Mark, I looked at the code; I think I hit this bug once so I was curious. I spotted a couple of concerns. The first is that you're not examining (or marking (void)) the return values of some function which do return values, like pthread_cond_wait (which returns an int). So, this code will not pass lint. Second, there are cases in which pthread_cond_wait() can suffer from a so called "spurious wakeup"-- where we wakeup even though we were never signalled. As the man page says: If a signal is delivered to a thread waiting for a condition variable, upon return from the signal handler the thread resumes waiting for the condition variable as if it was not interrupted, or it returns 0 due to spurious wakeup. A longer discussion on this with an example fix is here: http://docsun.cites.uiuc.edu/sun_docs/C/solaris_9/SUNWdev/MTP/p28.html Threads are never as simple as they seem to be :) Finally, it seems to me that this might be more easily fixed by masking the offending signal in all threads except for the dedicated signal catcher-- in which case the signal will be forced to go to that thread and only that thread; this could be done with some signal manipulation in startd_thread_create(). -dp -- Daniel Price - Solaris Kernel Engineering - dp at eng.sun.com - blogs.sun.com/dp