Re: Unexpected shutdown

2007-11-23 Thread n j
Just to wrap this thread up, as I was out of town for the last couple of days:

 According to shutdown(8) there should be a message in the log stating
 when the system went down, who did it and why.

There should be... but there isn't :-). The only thing that went to
/var/log/messages is syslogd exiting on signal 15, which suggests
that it wasn't just a power loss: the system had enough time to send
a termination signal, so it was rather a software shutdown.
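
For anyone who wants to repeat this check, here is a minimal sketch in
Python. It scans log lines for the syslogd exit message and for any
note tagged "shutdown:". The two patterns and the sample lines are
assumptions made up for illustration, not copied from the machine in
question, so adjust them to what your own logs contain.

```python
import re

# Both patterns are assumptions about typical syslog lines; they are not
# an exhaustive list of what a FreeBSD system may log on the way down.
SYSLOGD_EXIT = re.compile(r"syslogd: exiting on signal (\d+)")
SHUTDOWN_NOTE = re.compile(r"shutdown: (.+)")


def classify_shutdown(lines):
    """Return (exit_signals, shutdown_notes) found in an iterable of log lines."""
    signals, notes = [], []
    for line in lines:
        m = SYSLOGD_EXIT.search(line)
        if m:
            signals.append(int(m.group(1)))
            continue
        m = SHUTDOWN_NOTE.search(line)
        if m:
            notes.append(m.group(1).strip())
    return signals, notes


# Hypothetical sample lines, invented for illustration only.
SAMPLE = [
    "Nov 18 03:00:12 host syslogd: exiting on signal 15",
    "Nov 18 05:41:03 host syslogd: kernel boot file is /boot/kernel/kernel",
]

if __name__ == "__main__":
    signals, notes = classify_shutdown(SAMPLE)
    if signals and not notes:
        print("syslogd got signal(s)", signals, "but no shutdown note was logged")
```

To run it against a real log, pass an open file instead of SAMPLE, for
example classify_shutdown(open("/var/log/messages")).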

The cause of the shutdown is and will remain mysterious. The machine has
been working fine ever since it was powered back on. Thanks to everyone who provided
ideas and suggestions. If I ever discover what really happened, I'll
let you know.

Regards,
-- 
Nino
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Unexpected shutdown

2007-11-19 Thread Roland Smith
On Mon, Nov 19, 2007 at 03:55:42AM +0100, n j wrote:
 Hello Randy, Roland, Gary,
 
  UPS drivers can shut the system down, but you seem to have ruled
  that out?
 
 The UPS is present, but I never set up or configured anything (no
 snmp or any other agents) that would give the UPS permission to
 shut down the machine. Besides, there are more machines on the same
 UPS that continued to work just fine, so I guess the UPS is ruled
 out, yes.
 
  It could be triggered by the acpi_thermal driver. Check system
  temperatures with sysctl or mbmon.
 
 This is actually what I was looking for, even if it turns out it is
 not the solution: a pointer to a useful port plus a pointer to reading
 the temperatures with sysctl. That kind of thing makes -questions
 an invaluable resource.

:-)

 That remark led me to discover the following:
 
 - kldstat shows acpi.ko loaded
 - sysctl has no acpi thermal variables whatsoever!

Whether it works depends on the mobo and the ACPI tables. It works on
my laptop but not on my desktop, for instance.

 which further led me to check for acpi thermal variables on another
 FreeBSD 6.2 (non-Dell) server and, sure enough, they were there. So it
 seems that acpi thermal is not working (is perhaps blacklisted, a term
 I noticed in the man page) on Dell PowerEdge (in this case PE 1750 as
 well as PE 750) servers. Can anyone verify this?

Well, if it's not the ups nor a thermal overload, I guess the obvious
solution is that some joker gave a shutdown command with a 3am time. :-)

According to shutdown(8) there should be a message in the log stating
when the system went down, who did it and why.

Or maybe there is a script that calls shutdown under some circumstances?

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)




Re: Unexpected shutdown

2007-11-18 Thread n j
 Did it happen before, or does it happen every day at 3 am, or is this the
 first time your box shut down without explanation?

No, this is the first time this has occurred, that is what makes it
completely unexpected.

 If this is the first time, I would say there are many possibilities. Say an
 accidental quick push on the power button or - humor me - the cleaning lady is
 with the conserve-energy movement and thought your box was just another
 forgotten-to-shutdown desktop; that alone could explain your mysterious
 shutdown incident.

The machine is located in a server room within a server rack with a
(detachable) panel on the front side of the machine (Dell PowerEdge)
that covers the power-off button. No cleaning lady enters
the room, especially at 3 AM. Given all the circumstances I have
described, I ruled out a (physical) human factor as the cause of the
shutdown.

The box has two independent AC power supplies, no hardware errors were
found in the RAC card logs, and no other server (in the same rack/room)
shut down at that time. That is what leads me to believe that the problem
is software-related.

I know there are many possibilities out there, but I have been pondering
this for the whole day and have ruled out everything that came to mind.
So, any other ideas - even humorous - are welcome.

Thanks for the input in any case.

Regards,
-- 
Nino


Re: Unexpected shutdown

2007-11-18 Thread Randy Pratt
On Sun, 18 Nov 2007 22:12:34 +0100
n j [EMAIL PROTECTED] wrote:

  Did it happen before, or does it happen every day at 3 am, or is this
  the first time your box shut down without explanation?
 
 No, this is the first time this has occurred, that is what makes it
 completely unexpected.
 
  If this is the first time, I would say there are many possibilities. Say an
  accidental quick push on the power button or - humor me - the cleaning lady is
  with the conserve-energy movement and thought your box was just another
  forgotten-to-shutdown desktop; that alone could explain your mysterious
  shutdown incident.
 
 The machine is located in a server room within a server rack with a
 (detachable) panel on the front side of the machine (Dell PowerEdge)
 that covers the power-off button. No cleaning lady enters
 the room, especially at 3 AM. Given all the circumstances I have
 described, I ruled out a (physical) human factor as the cause of the
 shutdown.
 
 The box has two independent AC power supplies, no hardware errors were
 found in the RAC card logs, and no other server (in the same rack/room)
 shut down at that time. That is what leads me to believe that the problem
 is software-related.
 
 I know there are many possibilities out there, but I have been pondering
 this for the whole day and have ruled out everything that came to mind.
 So, any other ideas - even humorous - are welcome.

A few months ago I started having random mysterious lockups, no
panics, no messages, no hints, no keyboard and no ssh.  It forced
me to power-cycle the machine to get the system back.

After playing the RAM swap game, updating sources, and other such
dead-ends, I felt the hard drives (Maxtor 7200RPM 250G type) and
they were quite warm.  I did a little hardware re-arranging so that the
hard drives got more air and I've not had a lockup since. I had also
been monitoring the temperature but didn't see any indication that it
was the CPU or motherboard components.

This is all anecdotal, since I don't have any hard evidence pointing
to exactly one thing: I also swapped out a fan and reinserted
connectors in the process.  My feeling is that it was hard
drive heat-related, so my suggestion is to do some poking around for hot
spots, clogged fan filters and any other factors affecting temperatures.

In any case, in the grand scheme of things, *all* hardware will
fail ... eventually ;-)

Randy


Re: Unexpected shutdown

2007-11-18 Thread Roland Smith
On Sun, Nov 18, 2007 at 10:12:34PM +0100, n j wrote:
 I know there are many possibilities out there, but I have been pondering
 this for the whole day and have ruled out everything that came to mind.
 So, any other ideas - even humorous - are welcome.

Since it was a regular shutdown as opposed to a panic, something must
have triggered that shutdown.

UPS drivers can shut the system down, but you seem to have ruled
that out?

It could be triggered by the acpi_thermal driver. Check system
temperatures with sysctl or mbmon.
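
As a rough illustration of the sysctl route, here is a minimal sketch
in Python that parses the text printed by "sysctl hw.acpi.thermal" and
picks out zones at or above a limit. The "<number>C" value format and
the sample values are assumptions for illustration; check what your
own sysctl prints. An empty result simply means no thermal zones were
reported.

```python
import re

# Matches lines such as "hw.acpi.thermal.tz0.temperature: 45.0C".
# The "<number>C" value format is an assumption; other releases may
# print the value differently.
TEMP_LINE = re.compile(
    r"^hw\.acpi\.thermal\.(tz\d+)\.temperature:\s*(-?\d+(?:\.\d+)?)C\s*$"
)


def parse_thermal(sysctl_output):
    """Map thermal zone name -> temperature in Celsius from sysctl text."""
    temps = {}
    for line in sysctl_output.splitlines():
        m = TEMP_LINE.match(line.strip())
        if m:
            temps[m.group(1)] = float(m.group(2))
    return temps


def hot_zones(temps, limit_c):
    """Return the sorted names of zones at or above limit_c."""
    return sorted(zone for zone, t in temps.items() if t >= limit_c)


# Hypothetical sample output; the values are invented for illustration.
SAMPLE = """hw.acpi.thermal.min_runtime: 0
hw.acpi.thermal.polling_rate: 10
hw.acpi.thermal.tz0.temperature: 45.0C
hw.acpi.thermal.tz1.temperature: 71.5C
"""

if __name__ == "__main__":
    temps = parse_thermal(SAMPLE)
    print(temps, hot_zones(temps, 70.0))
```

On a live system the text could come from
subprocess.run(["sysctl", "hw.acpi.thermal"], capture_output=True,
text=True).stdout instead of SAMPLE.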

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)




Re: Unexpected shutdown

2007-11-18 Thread Gary Kline
On Sun, Nov 18, 2007 at 11:58:49PM +0100, Roland Smith wrote:
 On Sun, Nov 18, 2007 at 10:12:34PM +0100, n j wrote:
  I know there are many possibilities out there, but I have been pondering
  this for the whole day and have ruled out everything that came to mind.
  So, any other ideas - even humorous - are welcome.
 
 Since it was a regular shutdown as opposed to a panic, something must
 have triggered that shutdown.
 
 UPS drivers can shut the system down, but you seem to have ruled
 that out?
 
 It could be triggered by the acpi_thermal driver. Check system
 temperatures with sysctl or mbmon.
 
 Roland


If the system both shut down *and* rebooted, I had the same
inexplicable thing happen to me many times.  It began happening
to my Dell 8200 (hmm?) say, three months ago, and I
believe I solved the problem about 6 weeks ago.

There was some unknown fs fault in my /var slice.  Just by sheer
chance, I watched my server abruptly power down when something
[maybe] tried to write to /var/* and failed.   At first I
thought it was bad memory; then, just-maybe, a bad drive.
(The drive is new, and 512MB of the DDR is also new.)  I
also thought it was a heat problem, and that I needed another
fan.   ... .

Long story short, I saved /var to /somewhere, then found
something I couldn't remove. chflags did no good.   Finally
I did a /bin/rm -rf /var.  After I added it back, newfs'd it,
and copied back the stuff, no-more-spontaneous-and-random
reboots.

gary

PS: it was fsck that couldn't fix the bad spot. The fault was
related to an inode allocation snafu, but I've never
hacked any fs code, so






-- 
  Gary Kline  [EMAIL PROTECTED]   www.thought.org  Public Service Unix
  http://jottings.thought.org   http://transfinite.thought.org



Re: Unexpected shutdown

2007-11-18 Thread n j
Hello Randy, Roland, Gary,

 This is all anecdotal, since I don't have any hard evidence pointing
 to exactly one thing: I also swapped out a fan and reinserted
 connectors in the process.  My feeling is that it was hard
 drive heat-related, so my suggestion is to do some poking around for hot
 spots, clogged fan filters and any other factors affecting temperatures.

I guess it is possible, if not even likely, that the shutdown was
temperature-related. I'll investigate the fan filters and clean out the
dust if they're clogged. The fact that the machine has a very small load
(0.1 - 0.2) most of the time and that disk activity at the time of the
shutdown was not intensive leads me to believe that this isn't the
case, but who knows?

 UPS drivers can shut the system down, but you seem to have ruled
 that out?

The UPS is present, but I never set up or configured anything (no
snmp or any other agents) that would give the UPS permission to
shut down the machine. Besides, there are more machines on the same
UPS that continued to work just fine, so I guess the UPS is ruled
out, yes.

 It could be triggered by the acpi_thermal driver. Check system
 temperatures with sysctl or mbmon.

This is actually what I was looking for, even if it turns out it is
not the solution: a pointer to a useful port plus a pointer to reading
the temperatures with sysctl. That kind of thing makes -questions
an invaluable resource.

That remark led me to discover the following:

- kldstat shows acpi.ko loaded
- sysctl has no acpi thermal variables whatsoever!

which further led me to check for acpi thermal variables on another
FreeBSD 6.2 (non-Dell) server and, sure enough, they were there. So it
seems that acpi thermal is not working (is perhaps blacklisted, a term
I noticed in the man page) on Dell PowerEdge (in this case PE 1750 as
well as PE 750) servers. Can anyone verify this?

 If the system both shut down *and* rebooted, I had the same inexplicable thing
 happen to me many times.

Actually, a small correction - the server shut down and stayed that
way until I turned it back on a couple of hours later. After that, the
server booted just fine and is up right now. It even survived 3 AM
tonight without shutting down.

Regards,
-- 
Nino

Fact of life: intermittent bugs are hardest to debug.