Re: [gentoo-user] Re: Emergency shutdown, how to?

2008-04-08 Thread Steven Lembark

 I agree that your script is nice and simple, and hence less prone to
 errors.  I coded mine in c++ because I use it not only for a machine
 type watchdog, but also a task based watchdog that reboots the machine
 based on certain tasks living or not.  Each task has to register with
 the watchdog server and continually tell the server they're alive, or
 reboot!  But that's a story for another thread...

#!/path/to/perl

use strict;
use warnings;

use IO::Handle;
use Sys::Syslog qw( :standard :macros );

open my $fh, '>', '/dev/watchdog'
    or die "/dev/watchdog: $!";

# unbuffered, so every poke reaches the device.

$fh->autoflush( 1 );

# if any of these go away we need to notice it.
# ok... you'll notice the first one anyway.
# names have to match what ps prints in the COMMAND column.

my @watchz
= qw
(
    init
    ntpd
    apache
    /opt/sybase/ASE-12_5/bin/dataserver
);

# wd timeout / 2, or 1 for minimum sleep
# (avoid usleep: too much overhead).

my $cycle   = 15;

# get the syslog handle. ident, options and facility were
# elided in the original post; these are placeholders.

openlog 'watchdog', 'pid', LOG_DAEMON
    or die 'Et tu, syslog?';

CYCLE:
for(;;)
{
    sleep ( $cycle - ( time % $cycle ) );

    # split and args vary by O/S, this works on linux:
    # field 4 of "ps ax" is the first word of COMMAND.

    my @procz   = grep { defined } map { ( split ' ', $_ )[4] } qx( ps ax );

    my %chechz  = ();

    @chechz{ @watchz }  = ();

    delete @chechz{ @procz };

    if( %chechz )
    {
        # oops, current proc's don't include the
        # list of processes being watched.
        #
        # this can happen twice in a w/d interval
        # before the system goes down.

        my $nastygram
        = join "\t", "Missing proc's:", keys %chechz;

        # facility is a placeholder (LOG_FOO in the original post).

        syslog( LOG_CRIT | LOG_DAEMON, '%s', $nastygram );

        next CYCLE;

        # alternative here is to close $fh here and
        # bounce the system immediately, the
        # approach of looping allows an
        # intentional restart of the service
        # (in less than 1 w/d cycle) w/o bouncing the box.
    }

    # if the proc check got this far then the w/d
    # file gets poked and we live for another loop.

    print $fh "\n";
}

# this isn't a module

0

__END__

-- 
Steven Lembark                                85-09 90th St.
Workhorse Computing                     Woodhaven, NY, 11421
[EMAIL PROTECTED]                            +1 888 359 3508
-- 
gentoo-user@lists.gentoo.org mailing list



Re: [gentoo-user] Re: Emergency shutdown, how to?

2008-04-07 Thread Steven Lembark
Iain Buchanan wrote:
 On Sat, 2008-04-05 at 12:42 -0400, Steven Lembark wrote:
   I tried ALT + SysRq + EISUB today on my MythTV backend server which
   has been crashing lately. Unfortunately it's crashing so badly that
   even at the server's keyboard this didn't work.

   I guess my weekend fate of building a new server is sealed...
  Have fun.

  Check out motherboards with watchdog capability
  and enable it in the kernel.

 watchdogs are nice, and linux makes them ultra-easy to program, but of
 course if your watchdog task dies, then the machine effectively hits the
 reset button for you - no nice shutdown whatsoever!  (Which is what you
 want in a hard lock-up, but not if your programming skills are the cause
 of the problem :)

- Have the system turn off the watchdog if the file is
  closed.

- After that just open it and poke a bit out now and
  then.

- Make a point of closing the file on exit.

#!/usr/bin/perl

use strict;
use warnings;

open my $fh, '>', '/path/to/watchdog/file'
    or die "Failed opening watchdog file: $!";

# watchdog is now watching...

# install the handlers before entering the loop,
# otherwise they are never reached.

my $graceful_exit
= sub
{
    close $fh;

    exit 0
};

for my $sig ( qw( TERM QUIT INT __DIE__ ) )
{
    $SIG{ $sig }    = $graceful_exit;
}

for my $sig ( qw( HUP ) )
{
    $SIG{ $sig }    = 'IGNORE';
}

select $fh;

$| = 1;     # unbuffered, so each print reaches the device

for(;;)
{
    print "\n";

    sleep 1;    # watchdog timeout / 2
}

__END__




Re: [gentoo-user] Re: Emergency shutdown, how to?

2008-04-07 Thread Iain Buchanan
On Mon, 2008-04-07 at 13:28 -0400, Steven Lembark wrote:
 Iain Buchanan wrote:

  watchdogs are nice, and linux makes them ultra-easy to program, but of
  course if your watchdog task dies, then the machine effectively hits the
  reset button for you - no nice shutdown whatsoever!  (Which is what you
  want in a hard lock-up, but not if your programming skills are the cause
  of the problem :)
 
 - Have the system turn off the watchdog if the file is
   closed.

Maybe, maybe not :)  I personally like setting CONFIG_WATCHDOG_NOWAYOUT
on systems with hardware watchdogs, especially remote unattended
systems.  Usually your watchdog task never dies on such a system, and
when it does (be it from a nice kill or not) you want the watchdog to
fire.  However, if this is a semi-used system (you ssh or log in to it
in any way to do stuff) you may not want this.
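
A minimal sketch of what a clean stop looks like from userspace when CONFIG_WATCHDOG_NOWAYOUT is not set. The Linux watchdog interface has a "magic close" convention: drivers that support it only disarm the timer if the character V was the last thing written before the file is closed. stop_watchdog() is a hypothetical helper name used for illustration, not something from the script quoted here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the script quoted in this thread.
# Takes an already-open, writable handle on the watchdog device.
sub stop_watchdog
{
    my ( $fh ) = @_;

    # 'V' is the "magic close" character: drivers that support magic
    # close disarm the timer only if this was the last thing written
    # before the close. With CONFIG_WATCHDOG_NOWAYOUT the timer cannot
    # be stopped this way and the machine still resets.
    print { $fh } 'V'
        or return;

    return close $fh;
}
```

Calling stop_watchdog( $fh ) from the exit handler instead of a bare close is the difference between "stop watching" and "let the timer run out", which is exactly the choice being argued about here.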

 - After that just open it and poke a bit out now and
   then.
 
 - Make a point of closing the file on exit.
 
 #!/usr/bin/perl
 
 use strict;
 
 open my $fh, '>', '/path/to/watchdog/file'

It's usually /dev/watchdog if you're using the Linux kernel interface.

I agree that your script is nice and simple, and hence less prone to
errors.  I coded mine in C++ because I use it not only as a machine-level
watchdog, but also as a task-based watchdog that reboots the machine
depending on whether certain tasks are still alive.  Each task has to
register with the watchdog server and continually tell the server it's
alive, or the machine reboots!  But that's a story for another thread...
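
For anyone who wants the gist without waiting for that thread, here is a rough sketch of the register-and-heartbeat idea in Perl. The C++ server itself is not shown anywhere in this thread, so every name below (register_task, heartbeat, stale_tasks) and the in-process hash are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical in-process registry: task name => time of last heartbeat.
my %last_seen;

# A task announces itself once...
sub register_task
{
    my ( $name, $now ) = @_;

    $last_seen{ $name } = $now;

    return;
}

# ...and then keeps telling the server it is alive.
sub heartbeat
{
    my ( $name, $now ) = @_;

    exists $last_seen{ $name }
        or die "unregistered task: $name\n";

    $last_seen{ $name } = $now;

    return;
}

# Names of tasks that have been silent for more than $timeout seconds.
sub stale_tasks
{
    my ( $now, $timeout ) = @_;

    my @stale = grep { $now - $last_seen{ $_ } > $timeout } sort keys %last_seen;

    return @stale;
}
```

The main loop would then poke the watchdog device only while stale_tasks( time, $timeout ) comes back empty; once a task goes quiet the pokes stop and the hardware does the rest.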
-- 
Iain Buchanan iaindb at netspace dot net dot au

Linux - Where do you want to fly today?
-- Unknown source




Re: [gentoo-user] Re: Emergency shutdown, how to?

2008-04-06 Thread Iain Buchanan
On Sat, 2008-04-05 at 12:42 -0400, Steven Lembark wrote:
  I tried ALT + SysRq + EISUB today on my MythTV backend server which
  has been crashing lately. Unfortunately it's crashing so badly that
  even at the server's keyboard this didn't work.
 
  I guess my weekend fate of building a new server is sealed...
 
 Have fun.
 
 Check out motherboards with watchdog capability
 and enable it in the kernel.

Watchdogs are nice, and Linux makes them ultra-easy to program, but of
course if your watchdog task dies, the machine effectively hits the
reset button for you: no nice shutdown whatsoever!  (Which is what you
want in a hard lock-up, but not if your programming skills are the cause
of the problem :)

cya,
-- 
Iain Buchanan iaindb at netspace dot net dot au

Evil is that which one believes of others.  It is a sin to believe evil
of others, but it is seldom a mistake.
-- H.L. Mencken




Re: [gentoo-user] Re: Emergency shutdown, how to?

2008-04-05 Thread Steven Lembark

 I tried ALT + SysRq + EISUB today on my MythTV backend server which
 has been crashing lately. Unfortunately it's crashing so badly that
 even at the server's keyboard this didn't work.

 I guess my weekend fate of building a new server is sealed...

Have fun.

Check out motherboards with watchdog capability
and enable it in the kernel.




Re: [gentoo-user] Re: Emergency shutdown, how to?

2008-04-04 Thread Mark Knecht
On Wed, Apr 2, 2008 at 10:48 AM, Neil Bothwick [EMAIL PROTECTED] wrote:
 On Wed, 02 Apr 2008 19:40:37 +0200, Michael Schmarck wrote:

    Neil even proposed ALT +
    SysRq + EISUB, to be sure everything is killed, sync'd and
    unmounted.
  
   Which might or might not work. But note that I was also talking
   about applications being in a corrupted state (the database example).

  E sends a SIGTERM to all applications. Any well behaved application
  should shut down cleanly on this. I sends a SIGKILL, but it only affects
  programs that were so locked up they ignored E, so you have nothing to
  lose by then.


I tried ALT + SysRq + EISUB today on my MythTV backend server which
has been crashing lately. Unfortunately it's crashing so badly that
even at the server's keyboard this didn't work.

I guess my weekend fate of building a new server is sealed...

Cheers,
Mark



Re: [gentoo-user] Re: Emergency shutdown, how to?

2008-04-04 Thread Neil Bothwick
On Fri, 4 Apr 2008 14:05:42 -0700, Mark Knecht wrote:

 I tried ALT + SysRq + EISUB today on my MythTV backend server which
 has been crashing lately. Unfortunately it's crashing so badly that
 even at the server's keyboard this didn't work.

Do you have CONFIG_MAGIC_SYSRQ=y in your kernel config?
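
One way to check, sketched in Perl: pull the option out of the kernel config text. config_value() is a hypothetical helper; where the config text comes from is also an assumption (/proc/config.gz only exists if the kernel was built with CONFIG_IKCONFIG_PROC, otherwise the .config in the kernel source tree is the usual place to look).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value of a kernel config option
# ('y', 'm', a number, ...) or undef if the option is absent or
# appears only as "# CONFIG_FOO is not set".
sub config_value
{
    my ( $config_text, $option ) = @_;

    return $config_text =~ /^\Q$option\E=(.+)$/m ? $1 : undef;
}
```

For example, config_value( scalar qx( zcat /proc/config.gz ), 'CONFIG_MAGIC_SYSRQ' ) should give back y on a kernel where the key combination can work at all.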


-- 
Neil Bothwick

A woman walked into a bar and asked the barman for a large double
entendre, so he gave her one.




Re: [gentoo-user] Re: Emergency shutdown, how to?

2008-04-04 Thread Mark Knecht
On Fri, Apr 4, 2008 at 2:50 PM, Neil Bothwick [EMAIL PROTECTED] wrote:
 On Fri, 4 Apr 2008 14:05:42 -0700, Mark Knecht wrote:

   I tried ALT + SysRq + EISUB today on my MythTV backend server which
   has been crashing lately. Unfortunately it's crashing so badly that
   even at the server's keyboard this didn't work.

  Do you have CONFIG_MAGIC_SYSRQ=y in your kernel config?


  --
  Neil Bothwick

No Neil, it turns out that on that one machine it isn't set. Thanks. I'll
make sure it's set on the new server.

Cheers,
Mark



[gentoo-user] Re: Emergency shutdown, how to?

2008-04-02 Thread Michael Schmarck
Liviu Andronic [EMAIL PROTECTED] wrote:

 Are there any potential harms to the hardware / system in case one
 tends to abuse (i.e. use more often than necessary) of this command?

You're not shutting down the system in a clean way. Because of
this, filesystems and/or application data might get corrupted (e.g.
think of a database that was in the middle of writing to some of
its tables).

 It's so often so tempting to shut down your system fast.

Yeah, it sure is :)

Michael




Re: [gentoo-user] Re: Emergency shutdown, how to?

2008-04-02 Thread Dirk Heinrichs
On Wednesday, 2 April 2008, ext Michael Schmarck wrote:

 You're not shutting down the system in a clean way.

You're not? I thought that's the purpose of the whole thing?

Bye...

Dirk
-- 
Dirk Heinrichs          | Tel:  +49 (0)162 234 3408
Configuration Manager   | Fax:  +49 (0)211 47068 111
Capgemini Deutschland   | Mail: [EMAIL PROTECTED]
Wanheimerstraße 68      | Web:  http://www.capgemini.com
D-40468 Düsseldorf      | ICQ#: 110037733
GPG Public Key C2E467BB | Keyserver: www.keyserver.net




[gentoo-user] Re: Emergency shutdown, how to?

2008-04-02 Thread Michael Schmarck
· Dirk Heinrichs [EMAIL PROTECTED]:

 On Wednesday, 2 April 2008, ext Michael Schmarck wrote:
 Dirk Heinrichs [EMAIL PROTECTED] wrote:
  On Wednesday, 2 April 2008, ext Michael Schmarck wrote:
  You're not shutting down the system in a clean way.
 
  You're not? I thought that's the purpose of the whole thing?

 It's more like pulling the plug, isn't it? At least none of
 the shutdown scripts is run.  And if you don't run ALT + SysRq + U,
 or if it just doesn't work (like hangs at some (remote) fs),
 
 But nobody proposed _not_ to run ALT + SysRq + U,

True, but if worst comes to worst, you've got to do an ALT+SysRq+B
or +O even before +U has completely returned. As I said, it can
happen that U(nmount) doesn't work, and then you'd need to shut down
anyway.

 Neil even proposed ALT +  
 SysRq + EISUB, to be sure everything is killed, sync'd and unmounted.

Which might or might not work. But note that I was also talking
about applications being in a corrupted state (the database example).

 filesystems aren't even unmounted and thus dirty and thus need
 a fsck run on next boot.
 
 XFS to the rescue :-)

Yep. Well, to be honest, I haven't had a fs die on me because
of an Alt+SysRq+B.

Michael Schmarck
-- 
Inspiration without perspiration is usually sterile.





Re: [gentoo-user] Re: Emergency shutdown, how to?

2008-04-02 Thread Neil Bothwick
On Wed, 02 Apr 2008 19:40:37 +0200, Michael Schmarck wrote:

  Neil even proposed ALT +  
  SysRq + EISUB, to be sure everything is killed, sync'd and
  unmounted.  
 
 Which might or might not work. But note that I was also talking
 about applications being in a corrupted state (the database example).

E sends a SIGTERM to all applications. Any well-behaved application
should shut down cleanly on this. I sends a SIGKILL, but by then it
only affects programs that were so locked up they ignored E, so you
have nothing to lose.
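
Spelled out as a lookup table, for reference. This is only a sketch: the descriptions paraphrase the kernel's SysRq documentation, and sysrq_plan() is a hypothetical helper name.

```perl
use strict;
use warnings;

# What each key in the EISUB sequence asks the kernel to do.
my %sysrq_action =
(
    e => 'send SIGTERM to all processes except init',
    i => 'send SIGKILL to all processes except init',
    s => 'sync all mounted filesystems',
    u => 'remount all filesystems read-only',
    b => 'reboot immediately, without syncing or unmounting',
);

# Hypothetical helper: expand a key sequence into one line per step,
# in the order the keys would be pressed.
sub sysrq_plan
{
    my ( $keys ) = @_;

    my @plan;

    for my $key ( split //, lc $keys )
    {
        exists $sysrq_action{ $key }
            or die "unknown SysRq key: $key\n";

        push @plan, $key . ': ' . $sysrq_action{ $key };
    }

    return @plan;
}
```

Running print "$_\n" for sysrq_plan( 'EISUB' ); lists the five steps in order. The order is the point: B on its own skips everything that S and U are there to do.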


-- 
Neil Bothwick

Weird enough for government work.

