Quoth Mark Martin on Thu, Nov 15, 2007 at 11:57:17AM -0600:
> http://bugs.opensolaris.org/view_bug.do?bug_id=6409970
> I wrote a quick little test harness (perl) that tries to tickle this
> defect. I'm unsuccessful at reproducing the defect on my little Blade 1000
> (dual 750MHz, snv_73). I'm wondering if there's a specific configuration
> you feel might reproduce the defect?

I filed the bug in response to
http://www.opensolaris.org/jive/message.jspa?messageID=31351#31351 .
Marcus was killing the process while it was pstop'ed. And indeed, on my
snv_69 laptop, killing startd while it's running seems to always work,
but killing it while it's pstop'ed seems never to work.

Which raises the question of whether processes are supposed to receive
signals when they're restarted, and unfortunately I don't know the
process model well enough to know that off the top of my head. From
Marcus' observations, though, it seems that there is at least some
inconsistency here. It might also be a race condition, in which case
multi-CPU vs single-CPU might make a difference.

> I'm unable to have this produce an error count over 1000's of iterations.
> Tuning the wait time down from 5 seconds to 2 or 3 sometimes produces
> results, but that just suggests to me that svc.startd is just slow to exit
> sometimes. I'm inclined to implement the "Fix" regardless, but I'd rather
> do that in the presence of a failing test harness.

I agree with you, though if we can't reproduce the bug, then I'm
inclined to close it as not reproducible.

> My development boxen are down until this weekend at earliest. If you'd
> like, I can post the test script this weekend for review or for you to run
> in other environments. Meanwhile, I'll inquire about using the test farm to
> see if I can reproduce it there.

Let's try to determine what the correct behavior with respect to signals
and prun is first. I'll consult my reference books when I return to the
office tomorrow, but hopefully someone else can offer advice.

David