On Mon, 16 Jun 2003, James Olin Oden wrote:

> On Mon, 16 Jun 2003, Bill Nottingham wrote:
> 
> > James Olin Oden ([EMAIL PROTECTED]) said: 
> > > On Mon, 16 Jun 2003, Bill Nottingham wrote:
> > > 
> > > > James Olin Oden ([EMAIL PROTECTED]) said: 
> > > > > and looked at things.  The last syscall I see  init in after 
> > > > > running the init 6, is:
> > > > > 
> > > > >       futex(0x4212f1f4, FUTEX_WAIT, -1, NULL
> > > > 
> > > > What glibc are you running?
> > > >
> > > I am running:
> > > 
> > >   glibc-2.3.2-27.9
> > > 
> > > I think this is the latest errata...I just downloaded all the errata (well
> > > what I did not have) today, and it was the most recent one.  BTW, I was 
> > > trying to recompile this version of glibc without stripping its symbols,
> > > and I get the following error:
> > 
> > Are you running the errata kernel as well?
> >
> I am now running with 2.4.20-18.9bigmem, and the problem is still 
> occurring.
>
Got it!  Here is what is happening when you run init 6 with the debug
output turned on in init:

        1) init reads the fifo when it gets around to it.
        2) It sees there is a request for a runlevel change (6),
           and begins killing the appropriate processes.
        3) One of those processes will inevitably be a getty.
           The getty goes away, and inevitably some children are
           left behind.  They are given to init by the kernel, and the
           kernel sends SIGCHLD to init.
        4) Meanwhile, back in init, it has been going through its
           init_main loop again, and is printing debug output to
           this effect and sending it to syslog.  When it sends the
           message via the syslog call, a futex is taken so that
           nobody else can do this until it's done.
        5) While it's in the glibc code, init receives the SIGCHLD,
           and in the child handler it calls log() again, set
           to send output to syslog and the console.
        6) When it tries to send the child handler's log message
           to syslog, it enters the glibc code that blocks waiting
           on the futex...and there it sits.
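
To make the sequence above concrete, here is a minimal sketch of the same re-entrancy problem.  It is an illustration only: log_lock_held, log_trylock(), chld_handler() and demo_reentrant_log() are hypothetical names, the flag merely stands in for the lock inside glibc's syslog(), and a non-blocking "try" is used so the demo returns instead of hanging the way init does.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the lock inside glibc's syslog(). */
static volatile sig_atomic_t log_lock_held = 0;
/* Set when the handler finds the lock already taken. */
static volatile sig_atomic_t handler_would_block = 0;

/* Take the lock without waiting: 1 on success, 0 if already held. */
static int log_trylock(void)
{
    if (log_lock_held)
        return 0;
    log_lock_held = 1;
    return 1;
}

static void log_unlock(void)
{
    log_lock_held = 0;
}

/* Child handler that tries to log, like init's handler in debug mode. */
static void chld_handler(int sig)
{
    (void)sig;
    if (!log_trylock()) {
        /* A real blocking lock would wait here forever: the owner is
           the very code this handler interrupted, so it can never
           run again to release the lock. */
        handler_would_block = 1;
        return;
    }
    log_unlock();
}

/* Returns 1 if the handler found the lock held by the code it
   interrupted, 0 if not, -1 if the handler could not be installed. */
int demo_reentrant_log(void)
{
    struct sigaction sa, old;

    sa.sa_handler = chld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old) != 0)
        return -1;

    handler_would_block = 0;
    log_trylock();      /* main loop enters the logging code   */
    raise(SIGCHLD);     /* SIGCHLD arrives while lock is held  */
    log_unlock();       /* main loop leaves the logging code   */

    sigaction(SIGCHLD, &old, NULL);
    return handler_would_block;
}
```

Calling demo_reentrant_log() from a single-threaded program shows the handler running in the middle of the "logging" section and finding the lock taken, which is the point where the real code sits in FUTEX_WAIT.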

I patched init to block all signals while talking to syslog,
and this seems to have fixed it.  I will submit a patch via bugzilla
in the morning.  This probably only seems to happen on our dual-processor
machines because there the SIGCHLD can truly be delivered asynchronously
with respect to init.  That is my theory anyway.
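
The idea of blocking signals around the syslog call can be sketched roughly as follows.  This is not the patch itself, just an assumption of its general shape: run_with_signals_blocked(), log_signals_blocked() and demo_deferred_delivery() are hypothetical names, and only sigfillset(), sigprocmask(), sigaction(), raise() and syslog() are real library calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <syslog.h>

typedef void (*critical_fn)(void *);

/* Run fn(arg) with every blockable signal held back, then restore
   the previous mask.  Signals that arrive meanwhile stay pending. */
static void run_with_signals_blocked(critical_fn fn, void *arg)
{
    sigset_t all, saved;

    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &saved);
    fn(arg);
    sigprocmask(SIG_SETMASK, &saved, NULL);
}

static void do_syslog(void *msg)
{
    syslog(LOG_DEBUG, "%s", (const char *)msg);
}

/* Hypothetical wrapper: log to syslog with signals blocked, so a
   handler cannot re-enter the logging code halfway through. */
void log_signals_blocked(const char *msg)
{
    run_with_signals_blocked(do_syslog, (void *)msg);
}

/* --- small demo that delivery really is deferred --- */

static volatile sig_atomic_t chld_count = 0;

static void count_chld(int sig)
{
    (void)sig;
    chld_count++;
}

static void raise_inside(void *seen_inside)
{
    raise(SIGCHLD);                     /* stays pending for now */
    *(int *)seen_inside = chld_count;   /* handler has not run   */
}

/* Returns 1 if SIGCHLD was held back inside the blocked section and
   delivered once the mask was restored, 0 otherwise, -1 on error. */
int demo_deferred_delivery(void)
{
    struct sigaction sa, old;
    int seen_inside = -1;
    int seen_after;

    sa.sa_handler = count_chld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old) != 0)
        return -1;

    chld_count = 0;
    run_with_signals_blocked(raise_inside, &seen_inside);
    seen_after = chld_count;

    sigaction(SIGCHLD, &old, NULL);
    return seen_inside == 0 && seen_after == 1;
}
```

The handler still runs, only later: the pending SIGCHLD is delivered when the old mask is restored, by which time the logging code has released its lock.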

This was problem number two, though, so I will be back on problem number
one soon (the internal buffer overflow).  I am pretty sure I know what is
happening in that scenario:

        1) init goes to print "Entering runlevel 4", only
           the runlevel data is munged, causing a segfault in
           syslog.
        2) The segv handler is kicked off and tries to log its message.
           It can't, because the lock in the syslog code is still
           held.
        3) init hangs waiting on the futex.

This corruption, though, is much more infrequent (sometimes requiring
hundreds of reboots), but with the patch I did, I expect to see it happen
again, only this time I should get a core.
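
As a side note on the segv case: one common way to keep a fault handler from hanging on a lock is to have it report with write(2) only and then let the default action kill the process.  The sketch below shows that pattern; it is not what init does and not part of my patch.  report_fd, segv_handler(), install_segv_handler() and demo_segv_in_child() are hypothetical names, and the demo uses raise(SIGSEGV) in a child as a stand-in for the real fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: descriptor the handler reports to. */
static int report_fd = STDERR_FILENO;

static void segv_handler(int sig)
{
    static const char msg[] = "init: segmentation violation\n";

    /* write(2) is async-signal-safe and takes no userspace lock,
       unlike syslog(). */
    ssize_t n = write(report_fd, msg, sizeof msg - 1);
    (void)n;

    /* SA_RESETHAND has already restored the default action, so
       re-raising ends the process with SIGSEGV (and a core, if
       core dumps are enabled). */
    raise(sig);
}

int install_segv_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Demo: fork a child that takes a SIGSEGV and return the signal
   that killed it, or -1 on any error. */
int demo_segv_in_child(void)
{
    int fds[2];
    char buf[64];
    ssize_t got;
    int status;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        if (install_segv_handler() != 0)
            _exit(1);
        raise(SIGSEGV);     /* stand-in for the real fault */
        _exit(0);           /* not reached */
    }

    close(fds[1]);
    got = read(fds[0], buf, sizeof buf);   /* handler's message */
    close(fds[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (got <= 0 || !WIFSIGNALED(status))
        return -1;
    return WTERMSIG(status);
}
```

In the demo the parent sees both the handler's message on the pipe and a child terminated by SIGSEGV, so the fault is reported without the handler ever touching the syslog lock.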

Cheers...james
 


_______________________________________________
Redhat-devel-list mailing list
[EMAIL PROTECTED]
https://www.redhat.com/mailman/listinfo/redhat-devel-list
