[gentoo-user] spontaneous reboots.. what to look for

2009-02-15 Thread Harry Putnam
I've been experiencing spontaneous reboots on one gentoo machine
lately.  Looking thru /var/log/messages... I see the restarts but
looking above that... I'm not seeing anything I recognize as being a
culprit. 

It's been happening for a few weeks... but I've been busy and am only now
digging into it (the machine is no kind of server).

It appears to only happen in X (I'm using xfce4) and I've only noticed
it since I started running 2.6.28 kernels.  Although I couldn't say
that it seemed to be directly related.

I mean I didn't boot into 2.6.28 and suddenly notice spontaneous
rebooting. 

It does not appear to be heat related... but I am only now using
lm_sensors to keep an accurate record and see if there appears to be a
relationship. 

I've had two today so either it's happening more often or I'm just
spending more time on that machine.

It may also be only the first or second time it's happened while I was
actually right at the keyboard.

I'm sorry to be so vague about it, but in truth, I've been pretty lazy
about it... since no real harm comes of an unexpected reboot on that
machine (so far anyway).  But clearly it's something that has to be
figured out.

The only things I've checked so far... 
1) browsing thru /var/log/messages (having trouble recognizing
   anything that looks suspicious).

   I have noticed what appears to be a time/date anomaly where the
   progression of time is suddenly irregular.  That is, an earlier
   time shows up amongst some later times.

   It appears to have been me sudoing to visudo.  And apparently
   having /etc/sudoers open long enough for the closing of it to be
   earlier than other events taking place.
   
   Again ... I'm not real sure exactly what happened there but it
   does not appear to coincide with a reboot anyway.
 
2) checking how hot the cpu is getting (doesn't appear to be a
   problem).  But I'm now running a cron job recording temperatures every
   10 minutes, so that may turn up something.

3) checking for overfilled disks.  (none show in df -h)
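
For item 1, one way to narrow the search is to print only the lines logged just before each boot. This is a minimal sketch, assuming the kernel's "Linux version" banner shows up in /var/log/messages at every boot and that grep supports -B (GNU grep does). before_boots is a made-up helper name, not an existing tool.

```sh
#!/bin/sh
# Print the last few lines logged before each boot.
# Assumption: each boot writes the kernel banner ("Linux version ...")
# to the log, so the lines just above it end the previous session.
# before_boots is a hypothetical helper name, not an existing tool.
before_boots() {
    logfile=${1:-/var/log/messages}   # log file to search
    context=${2:-20}                  # lines to show before each banner
    grep -B "$context" 'Linux version' "$logfile"
}
```

Something like `before_boots /var/log/messages 40 | less` then shows one block per boot, separated by `--`. If the lines before a banner show no sign of services being stopped, the machine most likely went down without a clean shutdown.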




Re: [gentoo-user] spontaneous reboots.. what to look for

2009-02-15 Thread Mark Knecht
On Sun, Feb 15, 2009 at 3:42 PM, Harry Putnam rea...@newsguy.com wrote:
> I've been experiencing spontaneous reboots on one gentoo machine
> lately.  Looking thru /var/log/messages... I see the restarts but
> looking above that... I'm not seeing anything I recognize as being a
> culprit.


Reseat memory and PCI cards, etc. Consider removing for a period of
time any hardware not absolutely necessary to debug the problem (e.g.
a second video card, extra disk drives, extra network adapters, etc.).
Run memtest86 for a few days if you can spare the machine. Run
spinrite, etc., to look for drive problems. Open the box up and place
a fan blowing extra air for additional cooling.

good luck,
Mark



Re: [gentoo-user] spontaneous reboots.. what to look for

2009-02-15 Thread Volker Armin Hemmann
So the problem started recently.

That means it is either:
a cap going bad.
oxidized contacts.
dust clogging the fans.
PSU is going bad.
something obscure.

Do the easy thing first. Clean your case, reseat all cards and memory modules
and check all caps while doing so. Are any of them deformed? Is the top bulging?
Is there strange stuff around the base? Congratulations, you need new hardware.

If you don't find a bad cap and the problem persists, get a new PSU. A good
one. Not a big one (most PSUs are oversized), but good quality. Anandtech has
something about PSUs, so does tomshardware (most of their tests are rubbish,
but their PSU tests are ok). If the problem goes away, congratulations!
If not, well, then report back ;)



Re: [gentoo-user] spontaneous reboots.. what to look for

2009-02-15 Thread Saphirus Sage
On Feb 15, 2009, at 7:16 PM, Volker Armin Hemmann volkerar...@googlemail.com wrote:

> So the problem started recently.


I had a similar issue even when not running X. To be honest, I can't
say I have a concrete idea of exactly what caused it. I simply went
security-nuts and began wondering if it wasn't someone just toying
with me, so I hardened my sshd config and installed denyhosts to monitor
failed logins. That was a month ago and my uptime has been perfect since,
with no restarts.



Re: [gentoo-user] spontaneous reboots.. what to look for

2009-02-15 Thread Dale
Mark Knecht wrote:

> Reseat memory and PCI cards, etc. Consider removing for a period of
> time any hardware not absolutely necessary to debug the problem (e.g.
> a second video card, extra disk drives, extra network adapters, etc.).
> Run memtest86 for a few days if you can spare the machine. Run
> spinrite, etc., to look for drive problems. Open the box up and place
> a fan blowing extra air for additional cooling.
>
> good luck,
> Mark

To add another test.  I had this issue once before and it was a faulty
driver for my hard drives.  I ran a command like this to test mine:

hdparm -Tt /dev/hda
hdparm -Tt /dev/hda
hdparm -Tt /dev/hda
hdparm -Tt /dev/hda
hdparm -Tt /dev/hda

If it can pass that then it should be all right and you can look
elsewhere.  Mine would only fail when the drives were very busy, and that
test should keep them pretty busy.
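
The repeated runs can also be written as a loop. This is a minimal sketch, assuming the runs go one after another and that the test should stop at the first run that fails. stress_reads and the HDPARM override are illustrative names only, not part of hdparm.

```sh
#!/bin/sh
# Run "hdparm -Tt" on one drive several times in a row.
# -T times cached reads, -t times buffered reads from the device.
# stress_reads is a hypothetical helper name; HDPARM lets another binary
# be substituted (an assumption added here so the loop can be tried dry).
stress_reads() {
    dev=${1:?usage: stress_reads DEVICE [PASSES]}
    passes=${2:-5}
    i=1
    while [ "$i" -le "$passes" ]; do
        echo "pass $i of $passes"
        ${HDPARM:-hdparm} -Tt "$dev" || return 1   # stop at the first failure
        i=$((i + 1))
    done
}
```

As root, `stress_reads /dev/hda 5` would match the five runs above. If the box reboots part way through, the last "pass" line on the screen shows roughly how far it got.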

Hope that helps.

Dale

:-)  :-) 



Re: [gentoo-user] spontaneous reboots.. what to look for

2009-02-15 Thread Neil Bothwick
On Sun, 15 Feb 2009 17:42:44 -0600, Harry Putnam wrote:

> 2) checking how hot the cpu is getting (Doesn't appear to be a
>    problem) But now running a cron job recording temperatures every 10
>    minutes. So that may turn up something.

You could also check disk temperatures with app-admin/hddtemp. I've had
random crashes due to an overheating drive before. I'd also run smartctl
(emerge smartmontools) over the drive, just to be sure.
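
Those drive checks could be strung together roughly like this. It is only a sketch, assuming hddtemp and smartmontools are installed and that it is run as root. disk_checks, run and DRY_RUN are made-up names for illustration.

```sh
#!/bin/sh
# Run the drive checks mentioned above against one device.
# Assumptions: hddtemp and smartmontools are installed and the script is
# run as root. disk_checks, run and DRY_RUN are illustrative names.
run() {
    # With DRY_RUN set, only print the command instead of running it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disk_checks() {
    dev=${1:?usage: disk_checks DEVICE}
    run hddtemp "$dev"              # current drive temperature
    run smartctl -H "$dev"          # overall SMART health assessment
    run smartctl -A "$dev"          # vendor attributes (reallocated sectors etc.)
    run smartctl -l error "$dev"    # errors the drive itself has logged
}
```

Setting `DRY_RUN=1` before calling `disk_checks /dev/hda` just lists the commands. For a deeper check, `smartctl -t long DEVICE` starts an extended self-test and `smartctl -l selftest DEVICE` shows the result once it has finished.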

memtest is a must, since bad RAM can easily cause crashes. And take
Volker's advice on PSUs.


-- 
Neil Bothwick

What if there were no hypothetical situations?

