Re: Re: Sarge box not rebooting..

2004-12-30 Thread Robert Waldner

(not at the office today, I hope I got the reply right nonetheless)

Miquel van Smoorenburg wrote:
Well, it would be perfect if you could reproduce this. Try something like:

<snip code>

If this prints "Caught SIGCLD", then that is a severe kernel bug.
Try both `cc foo.c` and `cc foo.c -lpthread`, please.

Doesn't print anything with either 2.4.27-1-386 or 2.4.27-1-686-smp;
 -lpthread makes no difference.

Oh, and what is the output of `ldd /sbin/killall5`?

/sbin/killall5:
libc.so.6 => /lib/libc.so.6 (0x4001b000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x4000)
/sbin/killall5.orig:
libc.so.6 => /lib/libc.so.6 (0x4001b000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x4000)

(.orig is from the Debian package, the other one is the self-built binary
 with the extra printf()s.)

cheers,
rw
-- 
-- Freely adapted from Moore's law: the resources needed for one and
--  the same task quadruple every 18 months. Some day people will be
--  proud when they manage sleep(1) in real time.
-- - Andreas Riedel in d.a.s.r.






Re: Sarge box not rebooting..

2004-12-29 Thread Robert Waldner

On Tue, 28 Dec 2004 18:30:27 +0100, Robert Waldner writes:
No, kill(-1, SIGWHATEVER) is guaranteed to kill all processes
/except/ the caller; see `man 2 kill` on any Unix/Linux box. What kernel
are you using? This might be a kernel bug. Is this an i386 or
another architecture?

2.4.27, from kernel-image-2.4.27-1-386, i386 arch, straight Sarge
 from d-i RC2, `apt-get upgrade` up-to-date as of now.

I've put more info (`dpkg -l`, `ps auxwww`, cpuinfo, meminfo, lsmod) at
 http://www.waldner.priv.at/temp/machine.txt (it'd make for one long
 email otherwise).

I've now tested a couple of kernel images, and found that the
 problem does NOT manifest itself with an SMP kernel, e.g.
 2.4.26-1-686-smp and 2.4.27-1-686-smp are fine, but the default
 2.4.26/7-1-386 and their respective -686 siblings aren't.

What remains is the question whether I should file this as a (grave?) bug
 against kernel-image-2.4.27-1-386.

And, for people googling this up later: DO NOT run the standard Sarge
 kernel 2.4.27-1-386 on HP/Compaq DL380 machines; you won't be able to
 reboot or shut them down.

cheers,
rw
-- 
/ Ing. Robert Waldner | Security Engineer |  CoreTec IT-Security  \
\   [EMAIL PROTECTED]   | T +43 1 503 72 73 | F +43 1 503 72 73 x99 /






Re: Sarge box not rebooting..

2004-12-29 Thread Miquel van Smoorenburg
On 2004.12.29 16:20, Robert Waldner wrote:
On Tue, 28 Dec 2004 18:30:27 +0100, Robert Waldner writes:
No, kill(-1, SIGWHATEVER) is guaranteed to kill all processes
/except/ the caller; see `man 2 kill` on any Unix/Linux box. What kernel
are you using? This might be a kernel bug. Is this an i386 or
another architecture?
2.4.27, from kernel-image-2.4.27-1-386, i386 arch, straight Sarge
 from d-i RC2, `apt-get upgrade` up-to-date as of now.
I've put more info (`dpkg -l`, `ps auxwww`, cpuinfo, meminfo, lsmod) at
 http://www.waldner.priv.at/temp/machine.txt (it'd make for one long
 email otherwise).
I've now tested a couple of kernel images, and found that the
 problem does NOT manifest itself with an SMP kernel, e.g.
 2.4.26-1-686-smp and 2.4.27-1-686-smp are fine, but the default
 2.4.26/7-1-386 and their respective -686 siblings aren't.

What remains is the question whether I should file this as a (grave?) bug
 against kernel-image-2.4.27-1-386.

And, for people googling this up later: DO NOT run the standard Sarge
 kernel 2.4.27-1-386 on HP/Compaq DL380 machines; you won't be able to
 reboot or shut them down.
Well, it would be perfect if you could reproduce this. Try something like:
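#include <stdio.h>
#include <signal.h>
#include <unistd.h>	/* sleep() */

/* Runs only if the signal is delivered to ourselves. */
void sigcld(int sig)
{
	printf("Caught SIGCLD\n");
}

int main()
{
	signal(SIGCLD, sigcld);
	/* kill(-1, ...) must reach every process except the caller. */
	kill(-1, SIGCLD);
	sleep(2);
	return 0;
}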
If this prints "Caught SIGCLD", then that is a severe kernel bug.
Try both `cc foo.c` and `cc foo.c -lpthread`, please.
Oh, and what is the output of `ldd /sbin/killall5`?
Mike.


Re: Sarge box not rebooting..

2004-12-28 Thread Robert Waldner

On Mon, 27 Dec 2004 18:25:43 GMT, Miquel van Smoorenburg writes:
 I have a HP DL380 here with Sarge (current as of now) on it. Problem 
 is that it's not rebooting, e.g. if I call `reboot` or `telinit 6`, it
 starts sending out TERM and KILL signals, and everything is stopping,
 up to and including klogd and syslogd. Then, instead of writing
 "Rebooting..." (and actually rebooting), as I'd expect, it again
 writes "Sending all processes the TERM signal..." and does nothing
 more.

 I've checked further, and what's holding it up 
 is /etc/rc6.d/S20sendsigs, the `killall5 -15`. I strace'd it, and the 
 last thing I get is `rt_sigaction(-1, SIGSTOP` (note the missing ")").

Of course, by then the strace process is sigSTOPped too. Heisenbug.

D'oh!

 If I background both killall5 calls, it gets as far as "Saving random
 seed... done", so S30urandom finishes.

Hmm, can it be that killall5 doesn't actually manage to *not* kill 
 itself?

Of course it goes to great lengths to do exactly that: NOT
kill itself. It kills all processes _except_ itself and its
caller.

Any hints on what it _could_ be, or on what I can do to further narrow
 down the problem?

cheers+tia,
rw


Re: Sarge box not rebooting..

2004-12-28 Thread Robert Waldner

On Tue, 28 Dec 2004 08:40:48 +0100, Robert Waldner writes:
Hmm, can it be that killall5 doesn't actually manage to *not* kill
 itself?

Of course it goes to great lengths to do exactly that: NOT
kill itself. It kills all processes _except_ itself and its
caller.

Any hints on what it _could_ be, or on what I can do to further narrow
 down the problem?

Well, I expanded killall5.c with a couple of printf()s:

...
int main(int argc, char **argv)
{
...
signal(SIGTERM, SIG_IGN);
signal(SIGSTOP, SIG_IGN);
signal(SIGKILL, SIG_IGN);

/* Now stop all processes. */
// changes rw
printf("now doing kill(-1, SIGSTOP);\n");
kill(-1, SIGSTOP);
sent_sigstop = 1;
printf("done with kill(-1, SIGSTOP);\n");
...

 and the last thing I see on the console is the first printf.
 Screenshot (thanks to iLO) at
 http://www.waldner.priv.at/temp/killall5.jpg

So it seems to me that either signal(SIGSTOP, SIG_IGN); isn't
 honored and killall5 itself gets killed, or else it kills something else
 essential, but what could that be?

Plus, I've discovered 3 other boxen, various DL360/380, with the same 
 problem. Isn't there anyone else with Compaq/HP gear and this problem?

cheers,
rw


Re: Sarge box not rebooting..

2004-12-28 Thread Miquel van Smoorenburg
On 2004.12.28 17:42, Robert Waldner wrote:
On Tue, 28 Dec 2004 08:40:48 +0100, Robert Waldner writes:
Hmm, can it be that killall5 doesn't actually manage to *not* kill
 itself?
Of course it goes to great lengths to do exactly that: NOT
kill itself. It kills all processes _except_ itself and its
caller.
Any hints on what it _could_ be, or on what I can do to further narrow
 down the problem?
Well, I expanded killall5.c with a couple of printf()s:
...
int main(int argc, char **argv)
{
...
signal(SIGTERM, SIG_IGN);
signal(SIGSTOP, SIG_IGN);
signal(SIGKILL, SIG_IGN);
/* Now stop all processes. */
// changes rw
printf("now doing kill(-1, SIGSTOP);\n");
kill(-1, SIGSTOP);
sent_sigstop = 1;
printf("done with kill(-1, SIGSTOP);\n");
...
 and the last thing I see on the console is the first printf.
 Screenshot (thanks to iLO) at
 http://www.waldner.priv.at/temp/killall5.jpg
So it seems to me that either signal(SIGSTOP, SIG_IGN); isn't
 honored and killall5 itself gets killed, or else it kills something else
 essential, but what could that be?
No, kill(-1, SIGWHATEVER) is guaranteed to kill all processes
/except/ the caller; see `man 2 kill` on any Unix/Linux box. What kernel
are you using? This might be a kernel bug. Is this an i386 or
another architecture?
(You're not somehow running bootlogd at shutdown time, are you?)
Plus, I've discovered 3 other boxen, various DL360/380, with the same 
 problem. Isn't there anyone else with Compaq/HP gear and this problem?
I doubt it is Compaq-specific, but there must be something else
out of the ordinary here, or everybody would have this problem.
Mike.


Re: Sarge box not rebooting..

2004-12-28 Thread Robert Waldner

On Tue, 28 Dec 2004 17:02:16 GMT, Miquel van Smoorenburg writes:
 So it seems to me that either signal(SIGSTOP, SIG_IGN); isn't
  honored and killall5 itself gets killed, or else it kills something else
  essential, but what could that be?

No, kill(-1, SIGWHATEVER) is guaranteed to kill all processes
/except/ the caller; see `man 2 kill` on any Unix/Linux box. What kernel
are you using? This might be a kernel bug. Is this an i386 or
another architecture?

2.4.27, from kernel-image-2.4.27-1-386, i386 arch, straight Sarge
 from d-i RC2, `apt-get upgrade` up-to-date as of now.

(You're not somehow running bootlogd at shutdown time, are you?)

Nope, only klogd is still running (I've put a `ps ax | grep log`
 right before the first killall5 call into sendsigs).

 Plus, I've discovered 3 other boxen, various DL360/380, with the same
  problem. Isn't there anyone else with Compaq/HP gear and this problem?

I doubt it is Compaq-specific, but there must be something else
out of the ordinary here, or everybody would have this problem.

If only I had any idea on what it could be :(

I've put more info (`dpkg -l`, `ps auxwww`, cpuinfo, meminfo, lsmod) at
 http://www.waldner.priv.at/temp/machine.txt (it'd make for one long 
 email otherwise).

cheers,
r<grateful for *any* hint>w


Re: Sarge box not rebooting..

2004-12-27 Thread John Smith
On Mon, 2004-12-27 at 10:48 +0100, Robert Waldner wrote:
 I have a HP DL380 here with Sarge (current as of now) on it. Problem 

<rant> It must be one of the most mentioned boxes on these lists, a
distinction I wouldn't crave as its manufacturer... (Hey, HP, are you
listening? You are using Debian in-house, why don't you contribute a bit
more? We are convinced of the hardware quality!) </rant>

As it's not even displaying "Rebooting", did you check
the /etc/init.d/reboot permissions? How about calling it directly with
`sh -x`?

Sincerely,

Jan.

-- 
John Smith [EMAIL PROTECTED]


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] 
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: Sarge box not rebooting..

2004-12-27 Thread Robert Waldner

On Mon, 27 Dec 2004 12:25:08 +0100, John Smith writes:
On Mon, 2004-12-27 at 10:48 +0100, Robert Waldner wrote:
 I have a HP DL380 here with Sarge (current as of now) on it. Problem 

<rant> It must be one of the most mentioned boxes on these lists, a
distinction I wouldn't crave as its manufacturer... (Hey, HP, are you
listening? You are using Debian in-house, why don't you contribute a bit
more? We are convinced of the hardware quality!) </rant>

Tell me about it :( Unfortunately I have no control whatsoever over
 the choice of hardware.

As it's not even displaying "Rebooting", did you check
the /etc/init.d/reboot permissions? How about calling it directly with
`sh -x`?

It doesn't get that far. I've checked further, and what's holding it up
 is /etc/rc6.d/S20sendsigs, the `killall5 -15`. I strace'd it, and the
 last thing I get is `rt_sigaction(-1, SIGSTOP` (note the missing ")").
 If I background both killall5 calls, it gets as far as "Saving random
 seed... done", so S30urandom finishes.

Hmm, can it be that killall5 doesn't actually manage to *not* kill
 itself? That would be quite a grave bug.

cheers,
rw


Re: Sarge box not rebooting..

2004-12-27 Thread Miquel van Smoorenburg
In article [EMAIL PROTECTED],
Robert Waldner  [EMAIL PROTECTED] wrote:
On Mon, 27 Dec 2004 12:25:08 +0100, John Smith writes:
On Mon, 2004-12-27 at 10:48 +0100, Robert Waldner wrote:
 I have a HP DL380 here with Sarge (current as of now) on it. Problem 

<rant> It must be one of the most mentioned boxes on these lists, a
distinction I wouldn't crave as its manufacturer... (Hey, HP, are you
listening? You are using Debian in-house, why don't you contribute a bit
more? We are convinced of the hardware quality!) </rant>

Tell me about it :( Unfortunately I have no control whatsoever over
 the choice of hardware.

As it's not even displaying "Rebooting", did you check
the /etc/init.d/reboot permissions? How about calling it directly with
`sh -x`?

It doesn't get that far. I've checked further, and what's holding it up
 is /etc/rc6.d/S20sendsigs, the `killall5 -15`. I strace'd it, and the
 last thing I get is `rt_sigaction(-1, SIGSTOP` (note the missing ")").

Of course, by then the strace process is sigSTOPped too. Heisenbug.

 If I background both killall5 calls, it gets as far as "Saving random
 seed... done", so S30urandom finishes.

Hmm, can it be that killall5 doesn't actually manage to *not* kill 
 itself?

Of course it goes to great lengths to do exactly that: NOT
kill itself. It kills all processes _except_ itself and its
caller.

That would be quite a grave bug.

Ehm, no. Wrong conclusion. 

Mike.

