Re: [gentoo-user] Computer turn itself off

2015-05-23 Thread Zhu Sha Zang

On 05/23/2015 05:24 PM, Joseph wrote:
> I have a box in a remote location (8-core CPU) and it turns itself off
> during compiling.
>
> The box is connected to a UPS.  Is it the power supply?



Maybe. I had a problem like that when running heavy simulations with
nvidia-cuda: the power supply protection was unable to maintain a safe
energy level, so the system shut off.


But if the failure happens during compilation, it could be a heat
problem. Install lm_sensors and run something like: watch -n 1
sensors
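
Because the box is remote and powers off on its own, a live watch display is lost at the moment that matters. A minimal sketch of logging the readings instead, assuming the sensors output contains lines with "RPM" or "°C" as in this thread; the function name stamp_readings and the log path are hypothetical:

```shell
#!/bin/sh
# Keep only fan/temperature lines from `sensors` output (read on stdin)
# and prefix each with a timestamp, so a log file retains the last
# readings written before a power-off.
# $1 (optional): timestamp string to use; defaults to the current time.
stamp_readings() {
    ts=${1:-$(date '+%Y-%m-%d %H:%M:%S')}
    grep -E 'RPM|°C' | while IFS= read -r line; do
        printf '%s %s\n' "$ts" "$line"
    done
}

# Hypothetical usage on the remote box (the log path is an assumption):
#   while true; do
#       sensors | stamp_readings >> /var/log/sensors-watch.log
#       sync    # flush to disk so the entry survives a sudden shutdown
#       sleep 1
#   done
```

After the next unexpected shutdown, the tail of the log shows how hot the box was in its last seconds.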


If not, that is, if the temperature stays at safe levels, maybe you have
RAM corruption. In that case, you'll need to use memtest86+ to check.


Good Luck



Re: [gentoo-user] Computer turn itself off

2015-05-23 Thread Joseph



Thank you for the feedback. Checking the sensors, here is what I get:

fan1:        0 RPM  (min =   10 RPM)  ALARM
fan2:        0 RPM  (min =    0 RPM)
fan3:        0 RPM  (min =    0 RPM)
fan5:        0 RPM  (min =    0 RPM)
temp1:     +45.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:     +98.0°C  (low  = +127.0°C, high = +70.0°C)  sensor = thermal diode
temp3:     +98.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
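
In the readings above, temp2 is the only line whose value exceeds its own "high" limit (+98.0°C against +70.0°C). A rough sketch of picking out such lines automatically, assuming the "label: value (..., high = limit)" layout shown above; over_high is a hypothetical helper name, and the check is only as good as the values the driver reports:

```shell
#!/bin/sh
# Read `sensors` output on stdin and print every line whose current
# value is greater than the "high =" limit on the same line.
over_high() {
    awk '
    /high *= *[+-]?[0-9]/ {
        label = $0; sub(/:.*/, "", label)        # text before the first colon
        cur = $0;   sub(/^[^:]*: */, "", cur)    # text after the label
        sub(/^[+]/, "", cur)                     # drop leading plus sign
        hi = $0;    sub(/.*high *= */, "", hi)   # text after "high ="
        sub(/^[+]/, "", hi)
        # Adding 0 makes awk use the leading numeric part of each string.
        if (cur + 0 > hi + 0)
            printf "%s: %.1f above high limit %.1f\n", label, cur + 0, hi + 0
    }'
}

# Hypothetical usage:
#   sensors | over_high
```

Lines without a "high =" field, such as the fan lines, are skipped.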


--
Joseph



Re: [gentoo-user] Computer turn itself off

2015-05-23 Thread Joseph


I tried to read lm-sensors again and the computer crashed with these
readings:

fan1:        0 RPM  (min =   10 RPM)  ALARM
fan2:        0 RPM  (min =    0 RPM)
fan3:        0 RPM  (min =    0 RPM)
fan5:        0 RPM  (min =    0 RPM)
temp1:     +47.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:    +106.0°C  (low  = +127.0°C, high = +70.0°C)  sensor = thermal diode
temp3:    +106.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
cpu0_vid: +1.250 V

I suspect it is the power supply.



--
Joseph



[gentoo-user] Computer turn itself off

2015-05-23 Thread Joseph

I have a box in a remote location (8-core CPU) and it turns itself off during
compiling.

The box is connected to a UPS.  Is it the power supply?

--
Joseph



Re: [gentoo-user] Computer turn itself off

2015-05-23 Thread Mick

I wouldn't trust these numbers.  You probably need a different/correct driver 
in your kernel to measure your CPU and MoBo chipset readings and/or a later 
BIOS firmware.

Whenever I had such problems they were down to bad memory (some PCs are rather 
particular in only accepting matching memory modules) and also down to a sick 
power supply.

Memtest86+ should tell you after some hours if something is amiss.

The power supply problem will require opening it up and checking for domed 
capacitors.  A few cents later and with a soldering iron in hand you should be 
able to fix any cheap capacitor induced failure.

Of course if the machine is hundreds of miles away, attending to it is more of 
a problem.
-- 
Regards,
Mick




Re: [gentoo-user] Computer turn itself off

2015-05-23 Thread Zhu Sha Zang




Hey, did you run sensors-detect and start /etc/init.d/lm_sensors as root
before using sensors?


As was said, maybe you're using the wrong kernel modules.

Regards
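
One way to see which hardware-monitoring drivers the kernel has actually bound is to read the name files under /sys/class/hwmon. A small sketch, assuming a kernel that exposes the hwmon sysfs interface (some older kernels keep the file under device/name, so both locations are tried); list_hwmon_chips is a hypothetical helper name:

```shell
#!/bin/sh
# Print the chip name reported by each hwmon device.
# $1 (optional): directory to scan; defaults to /sys/class/hwmon.
list_hwmon_chips() {
    root=${1:-/sys/class/hwmon}
    for dir in "$root"/hwmon*; do
        # Newer kernels: hwmonN/name; some older ones: hwmonN/device/name.
        for f in "$dir/name" "$dir/device/name"; do
            if [ -r "$f" ]; then
                printf '%s: %s\n' "${dir##*/}" "$(cat "$f")"
                break
            fi
        done
    done
}

# The sequence asked about above, to be run as root:
#   sensors-detect
#   /etc/init.d/lm_sensors start
#   sensors
```

If the function prints nothing, no monitoring driver is loaded, and running sensors-detect as root is the natural next step.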