Re: Unexpected shutdown
Just to wrap this thread up, as I was out of town for the last couple of days:

> According to shutdown(8) there should be a message in the log stating when the system went down, who did it and why.

There should be... but there isn't :-). The only thing that went to /var/log/messages is syslogd exiting on signal 15. That suggests it wasn't just a power loss but rather a software shutdown, since the system had enough time to send a termination signal.

The cause of the shutdown is and will remain mysterious. The machine has been working fine ever since it was powered back on.

Thanks to everyone who provided ideas and suggestions. If I ever discover what really happened, I'll let you know.

Regards,
--
Nino
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
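For anyone arriving here with the same symptom, a minimal sketch of pulling shutdown-related lines out of a log file. The function name `find_shutdown_traces` is hypothetical, and the match patterns are assumptions about what shutdown(8), reboot(8), halt(8) and syslogd typically write; they are not taken from the poster's machine, so adjust them to your syslog configuration.

```shell
#!/bin/sh
# Print log lines that hint at an orderly (software) shutdown.
# $1: log file to scan; defaults to /var/log/messages.
# The patterns below are assumptions, not an exhaustive list.
find_shutdown_traces() {
    logfile="${1:-/var/log/messages}"
    grep -E 'syslogd: exiting on signal|shutdown:|(reboot|halt|power-down) by |(rebooted|halted) by ' "$logfile"
}
```

Calling `find_shutdown_traces /var/log/messages` prints the matching lines, if any. A lone "syslogd: exiting on signal 15" with no accompanying shutdown line is the situation described above: something terminated the daemons cleanly, but nothing recorded who asked for it. Rotated logs may need decompressing before they can be scanned.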
Re: Unexpected shutdown
On Mon, Nov 19, 2007 at 03:55:42AM +0100, n j wrote:
> Hello Randy, Roland, Gary,
>
> > UPS drivers can shut the system down, but you seemed to have ruled that out?
>
> The UPS is present, but I never set up or configured anything (no snmp or any other agents) that would give the UPS permission to shut down the machine. Besides, there are more machines on the same UPS that continued to work just fine, so I guess the UPS is ruled out, yes.
>
> > It could be triggered by the acpi_thermal driver. Check system temperatures with sysctl or mbmon.
>
> This is actually what I was looking for, even if it turns out it is not the solution: a pointer to a useful port plus a pointer to reading the temperatures with sysctl. That kind of thing makes -questions an invaluable resource.

:-)

> That remark led me to discover the following:
> - kldstat shows acpi.ko loaded
> - sysctl has no acpi thermal variables whatsoever!

Whether it works depends on the mobo and the ACPI tables. It works on my laptop but not on my desktop, for instance.

> which further led me to check for acpi thermal variables on another FreeBSD 6.2 (non-Dell) server, and sure enough they were there. So it seems that acpi thermal is not working (is perhaps blacklisted, a term I noticed in the man page) on Dell PowerEdge (in this case PE 1750 as well as PE 750) servers. Can anyone verify this?

Well, if it's neither the UPS nor a thermal overload, I guess the obvious explanation is that some joker gave a shutdown command with a 3am time. :-) According to shutdown(8) there should be a message in the log stating when the system went down, who did it and why.

Or maybe there is a script that calls shutdown under some circumstances?

Roland
--
R.F.Smith http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
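On the idea of a script calling shutdown: a small sketch that lists files mentioning shutdown, halt, reboot or poweroff under whatever paths it is given. The function name `find_shutdown_callers` is hypothetical, and the paths in the usage note are only the usual places to look on a FreeBSD box, assumed here for illustration.

```shell
#!/bin/sh
# List files under the given paths that mention a shutdown-type command.
# Matches whole words only, so "asphalt" or "rebooted" do not count.
# Expect false positives (comments, rc scripts); read each hit by hand.
find_shutdown_callers() {
    grep -rlE '(^|[^[:alnum:]_])(shutdown|halt|reboot|poweroff)([^[:alnum:]_]|$)' "$@" 2>/dev/null
}
```

Something like `find_shutdown_callers /etc/crontab /var/cron/tabs /etc/periodic /usr/local/etc/periodic` would be a starting point. An empty result does not prove much, since a job could reach shutdown indirectly, but a hit in a crontab scheduled around 3 AM would be worth a close look.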
Re: Unexpected shutdown
> Did it happen before, or does it happen every day at 3 am, or is this the first time your box shut down without explanation?

No, this is the first time this has occurred; that is what makes it completely unexpected.

> If this is the first time, I would say there are many possibilities. Say an accidental quick push on the power button or - humor me - the cleaning lady is with the conserve-energy movement and thought your box was just another forgotten-to-shut-down desktop; that alone could explain your mysterious shutdown incident.

The machine is located in a server room, in a server rack, with a (detachable) panel on the front of the machine (Dell PowerEdge) that covers the power button. No cleaning lady enters the room, especially at 3 AM. Given all the circumstances I have described, I ruled out the (physical) human factor as the cause of the shutdown.

The box has two independent AC power supplies, no hardware error is found in the RAC card logs, and no other server (in the same rack/room) shut down at that time. That is what leads me to believe that the problem is software-related.

I know there are many possibilities out there, but I have been pondering this for the whole day and have ruled out everything that came to mind. So, any other ideas - even humorous - are welcome.

Thanks for the input in any case.

Regards,
--
Nino
Re: Unexpected shutdown
On Sun, 18 Nov 2007 22:12:34 +0100 n j [EMAIL PROTECTED] wrote:
> > Did it happen before, or does it happen every day at 3 am, or is this the first time your box shut down without explanation?
>
> No, this is the first time this has occurred; that is what makes it completely unexpected.
>
> > If this is the first time, I would say there are many possibilities. Say an accidental quick push on the power button or - humor me - the cleaning lady is with the conserve-energy movement and thought your box was just another forgotten-to-shut-down desktop; that alone could explain your mysterious shutdown incident.
>
> The machine is located in a server room, in a server rack, with a (detachable) panel on the front of the machine (Dell PowerEdge) that covers the power button. No cleaning lady enters the room, especially at 3 AM. Given all the circumstances I have described, I ruled out the (physical) human factor as the cause of the shutdown.
>
> The box has two independent AC power supplies, no hardware error is found in the RAC card logs, and no other server (in the same rack/room) shut down at that time. That is what leads me to believe that the problem is software-related.
>
> I know there are many possibilities out there, but I have been pondering this for the whole day and have ruled out everything that came to mind. So, any other ideas - even humorous - are welcome.

A few months ago I started having random mysterious lockups: no panics, no messages, no hints, no keyboard and no ssh. I had to power-cycle the machine to get the system back. After playing the RAM swap game, updating sources, and other such dead ends, I felt the hard drives (Maxtor 7200RPM 250G type) and they were quite warm. I did a little hardware rearranging so that the hard drives got more air, and I've not had a lockup since. I had also been monitoring the temperature but didn't see any indication that it was the CPU or motherboard components.

This is all anecdotal: I don't have any hard evidence pointing to exactly one thing, since I also swapped out a fan and reseated connectors in the process. My feeling is that it was hard-drive heat, so my suggestion is to do some poking around for hot spots, clogged fan filters and any other factors affecting temperatures.

In any case, in the grand scheme of things, *all* hardware will fail ... eventually ;-)

Randy
Re: Unexpected shutdown
On Sun, Nov 18, 2007 at 10:12:34PM +0100, n j wrote:
> I know there are many possibilities out there, but I am pondering this for the whole day and ruled out everything that came to mind. So, any other ideas - even humorous - are welcome.

Since it was a regular shutdown as opposed to a panic, something must have triggered that shutdown.

UPS drivers can shut the system down, but you seem to have ruled that out?

It could be triggered by the acpi_thermal driver. Check system temperatures with sysctl or mbmon.

Roland
--
R.F.Smith http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
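A minimal sketch of the sysctl check suggested above. It assumes the ACPI thermal zones appear as `hw.acpi.thermal.tzN.temperature` and that the value prints in a form like `45.0C`; output formats differ between releases, and on some boards the OIDs are missing altogether. The function names and the 70 degree default limit are illustrative, not from the thread.

```shell
#!/bin/sh
# Succeed (exit 0) when a reading such as "71.5C" is above the limit.
# $1: reading with a trailing "C" (assumed format), $2: limit in Celsius.
temp_exceeds() {
    reading="${1%C}"    # drop the trailing unit letter
    awk -v t="$reading" -v l="$2" 'BEGIN { exit !(t + 0 > l + 0) }'
}

# Walk every thermal-zone temperature OID and warn about hot ones.
# $1: limit in Celsius; defaults to 70 (an arbitrary example value).
check_thermal_zones() {
    limit="${1:-70}"
    sysctl -N hw.acpi.thermal 2>/dev/null | grep '\.temperature$' |
    while read -r oid; do
        value=$(sysctl -n "$oid")
        if temp_exceeds "$value" "$limit"; then
            echo "WARNING: $oid = $value (limit ${limit}C)"
        fi
    done
}
```

Running `check_thermal_zones 65` from cron every few minutes and mailing the output would at least leave a trail if heat is involved. If it prints nothing even on a warm machine, the zones may simply not be exposed, in which case the mbmon port is the other route mentioned.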
Re: Unexpected shutdown
On Sun, Nov 18, 2007 at 11:58:49PM +0100, Roland Smith wrote:
> On Sun, Nov 18, 2007 at 10:12:34PM +0100, n j wrote:
> > I know there are many possibilities out there, but I am pondering this for the whole day and ruled out everything that came to mind. So, any other ideas - even humorous - are welcome.
>
> Since it was a regular shutdown as opposed to a panic, something must have triggered that shutdown.
>
> UPS drivers can shut the system down, but you seemed to have ruled that out?
>
> It could be triggered by the acpi_thermal driver. Check system temperatures with sysctl or mbmon.
>
> Roland

If the system both shut down *and* rebooted, I had the same inexplicable thing happen to me many times. It began happening to my Dell 8200 (hmm?) about three months ago, and I believe I solved the problem about 6 weeks ago. There was some unknown fs fault in my /var slice. Just by sheer chance, I watched my server abruptly power down when something [maybe] tried to write to /var/* and failed.

At first I thought it was bad memory; then, just maybe, a bad drive. (The drive is new, and 512MB of the DDR is also new.) I also thought it was a heat problem, and that I needed another fan. ...

Long story short, I saved /var to /somewhere, then found something I couldn't remove. chflags did no good. Finally I did a /bin/rm -rf /var. After I added it back, newfs'd it, and copied back the stuff, no more spontaneous and random reboots.

gary

PS: it was fsck that couldn't fix the bad spot. The fault was related to an inode allocation snafu. but i've never hacked any fs code, so

--
Gary Kline [EMAIL PROTECTED] www.thought.org Public Service Unix
http://jottings.thought.org http://transfinite.thought.org
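The save-and-copy-back part of the procedure described above could look roughly like this. It is a sketch only: the function names and the archive location are hypothetical, and the newfs step is deliberately left out because the device name depends on the machine. Something like this is best done from single-user mode, with nothing writing to /var, and with the archive stored outside the tree being saved.

```shell
#!/bin/sh
# Archive a directory tree and confirm the archive is readable.
# $1: source directory, $2: archive file (must live outside $1).
backup_tree() {
    tar -cf "$2" -C "$1" . && tar -tf "$2" > /dev/null
}

# Unpack a previously saved tree, keeping file permissions.
# $1: archive file, $2: destination directory (created if missing).
restore_tree() {
    mkdir -p "$2" && tar -xpf "$1" -C "$2"
}
```

In the scenario above that would be `backup_tree /var /somewhere/var.tar` before touching the slice, then `restore_tree /somewhere/var.tar /var` once the fresh filesystem is mounted. Files that cannot be read because of the filesystem fault will make tar complain, which is itself a useful hint about where the damage is.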
Re: Unexpected shutdown
Hello Randy, Roland, Gary,

> This is all anecdotal since I don't have any hard evidence to point to exactly one thing since I also swapped out a fan and reinserted connectors in the process. My feeling is that it was hard drive heat-related so my suggestion is to do some poking around for hot spots, clogged fan filters and any other factors affecting temperatures.

I guess it is possible, if not even likely, that the shutdown was temperature-related. I'll investigate the fan filters and clean out some dust if they're clogged. The fact that the machine has a very small load (0.1 - 0.2) most of the time and that disk activity at the time of the shutdown was not intensive leads me to believe that this isn't the case, but who knows?

> UPS drivers can shut the system down, but you seemed to have ruled that out?

The UPS is present, but I never set up or configured anything (no snmp or any other agents) that would give the UPS permission to shut down the machine. Besides, there are more machines on the same UPS that continued to work just fine, so I guess the UPS is ruled out, yes.

> It could be triggered by the acpi_thermal driver. Check system temperatures with sysctl or mbmon.

This is actually what I was looking for, even if it turns out it is not the solution: a pointer to a useful port plus a pointer to reading the temperatures with sysctl. That kind of thing makes -questions an invaluable resource.

That remark led me to discover the following:
- kldstat shows acpi.ko loaded
- sysctl has no acpi thermal variables whatsoever!

which further led me to check for acpi thermal variables on another FreeBSD 6.2 (non-Dell) server, and sure enough they were there. So it seems that acpi thermal is not working (is perhaps blacklisted, a term I noticed in the man page) on Dell PowerEdge (in this case PE 1750 as well as PE 750) servers. Can anyone verify this?

> If the system both shutdown *and* rebooted, I had the same inexplicable thing happen to me many times.

Actually, a small correction - the server shut down and stayed that way until I turned it back on a couple of hours later. After that, the server booted just fine and is up right now. It even survived 3 AM tonight without shutting down.

Regards,
--
Nino
Fact of life: intermittent bugs are hardest to debug.
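For comparing machines the way it was done above (thermal variables present on one server, absent on the Dells), a rough sketch that counts the thermal-zone temperature OIDs a box exposes. The helper names are hypothetical, and the OID pattern `hw.acpi.thermal.tzN.temperature` is an assumption about how acpi_thermal names its zones.

```shell
#!/bin/sh
# Read sysctl OID names on stdin and print how many of them are
# thermal-zone temperature readings (prints 0 when there are none).
count_thermal_oids() {
    grep -c -E '^hw\.acpi\.thermal\.tz[0-9]+\.temperature$'
}

# Report whether this machine exposes any ACPI thermal zones.
acpi_thermal_report() {
    n=$(sysctl -N hw.acpi 2>/dev/null | count_thermal_oids) || true
    if [ "${n:-0}" -gt 0 ]; then
        echo "acpi thermal: $n zone(s) exposed"
    else
        echo "acpi thermal: no temperature OIDs found"
    fi
}
```

Running `acpi_thermal_report` on each server gives a one-line answer that is easy to collect across a rack. A "no temperature OIDs found" result only says the kernel is not exposing zones; it does not say why, so a blacklist entry, missing ACPI tables and a disabled driver all look the same from here.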