Re: [gentoo-user] Computer turn itself off
On 05/23/2015 05:24 PM, Joseph wrote:
> I have a box in a remote location (8-core CPU) and it turns itself off
> during compiling. The box is connected to a UPS. Is it the power supply?

Maybe. I had a problem like that when running heavy simulations with
nvidia-cuda: the power supply protection was unable to keep a safe energy
level, and then the system went off.

But if the failure happens during compilation, it can be a heat problem.
Install lm_sensors and use something like: watch -n 1 sensors

If the temperature stays at safe levels, maybe you have RAM corruption.
In that case, you'll need to use memtest86+ to check.

Good luck
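Since the box is remote and powers off on its own, watching the readings live may not help much: the last sample disappears with the terminal. A minimal sketch, assuming Python 3.7+ and that the `sensors` command from lm_sensors is on the PATH, which appends each reading to a file and forces it to disk. The function names and the log path are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import time
from datetime import datetime


def read_sensors():
    """Return the text printed by the lm_sensors `sensors` command."""
    result = subprocess.run(
        ["sensors"], capture_output=True, text=True, check=True
    )
    return result.stdout


def log_once(path, reader=read_sensors):
    """Append one timestamped reading to `path` and flush it to disk."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as log:
        log.write(f"--- {stamp}\n{reader().rstrip()}\n")
        log.flush()
        # fsync so the sample is more likely to survive a sudden power-off
        os.fsync(log.fileno())


def log_forever(path, interval=1.0, reader=read_sensors):
    """Poll every `interval` seconds, similar to `watch -n 1 sensors`."""
    while True:
        log_once(path, reader)
        time.sleep(interval)
```

Calling `log_forever("sensors.log")` in a second shell before starting the compile would leave the last readings before the shutdown in the file. The `reader` parameter exists only so the logging can be tried without real hardware.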
Re: [gentoo-user] Computer turn itself off
On 05/23/15 18:08, Zhu Sha Zang wrote:
> On 05/23/2015 05:24 PM, Joseph wrote:
>> I have a box in a remote location (8-core CPU) and it turns itself off
>> during compiling. The box is connected to a UPS. Is it the power supply?
>
> Maybe. I had a problem like that when running heavy simulations with
> nvidia-cuda: the power supply protection was unable to keep a safe energy
> level, and then the system went off.
>
> But if the failure happens during compilation, it can be a heat problem.
> Install lm_sensors and use something like: watch -n 1 sensors
>
> If the temperature stays at safe levels, maybe you have RAM corruption.
> In that case, you'll need to use memtest86+ to check.
>
> Good luck

Thank you for the feedback. Checking the sensors, here is what I get:

fan1:  0 RPM (min = 10 RPM) ALARM
fan2:  0 RPM (min = 0 RPM)
fan3:  0 RPM (min = 0 RPM)
fan5:  0 RPM (min = 0 RPM)
temp1: +45.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
temp2: +98.0°C (low = +127.0°C, high = +70.0°C) sensor = thermal diode
temp3: +98.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor

--
Joseph
Re: [gentoo-user] Computer turn itself off
On 05/23/15 18:08, Zhu Sha Zang wrote:
> On 05/23/2015 05:24 PM, Joseph wrote:
>> I have a box in a remote location (8-core CPU) and it turns itself off
>> during compiling. The box is connected to a UPS. Is it the power supply?
>
> Maybe. I had a problem like that when running heavy simulations with
> nvidia-cuda: the power supply protection was unable to keep a safe energy
> level, and then the system went off.
>
> But if the failure happens during compilation, it can be a heat problem.
> Install lm_sensors and use something like: watch -n 1 sensors
>
> If the temperature stays at safe levels, maybe you have RAM corruption.
> In that case, you'll need to use memtest86+ to check.
>
> Good luck

I tried to read lm-sensors again and the computer crashed with these
readings:

fan1:  0 RPM (min = 10 RPM) ALARM
fan2:  0 RPM (min = 0 RPM)
fan3:  0 RPM (min = 0 RPM)
fan5:  0 RPM (min = 0 RPM)
temp1: +47.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
temp2: +106.0°C (low = +127.0°C, high = +70.0°C) sensor = thermal diode
temp3: +106.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
cpu0_vid: +1.250 V

I suspect it is the power supply.

--
Joseph
[gentoo-user] Computer turn itself off
I have a box in a remote location (8-core CPU) and it turns itself off
during compiling. The box is connected to a UPS. Is it the power supply?

--
Joseph
Re: [gentoo-user] Computer turn itself off
On Saturday 23 May 2015 23:53:32 Joseph wrote:
> On 05/23/15 18:08, Zhu Sha Zang wrote:
>> On 05/23/2015 05:24 PM, Joseph wrote:
>>> I have a box in a remote location (8-core CPU) and it turns itself off
>>> during compiling. The box is connected to a UPS. Is it the power supply?
>>
>> Maybe. I had a problem like that when running heavy simulations with
>> nvidia-cuda: the power supply protection was unable to keep a safe energy
>> level, and then the system went off.
>>
>> But if the failure happens during compilation, it can be a heat problem.
>> Install lm_sensors and use something like: watch -n 1 sensors
>>
>> If the temperature stays at safe levels, maybe you have RAM corruption.
>> In that case, you'll need to use memtest86+ to check.
>>
>> Good luck
>
> I tried to read lm-sensors again and the computer crashed with these
> readings:
>
> fan1:  0 RPM (min = 10 RPM) ALARM
> fan2:  0 RPM (min = 0 RPM)
> fan3:  0 RPM (min = 0 RPM)
> fan5:  0 RPM (min = 0 RPM)
> temp1: +47.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
> temp2: +106.0°C (low = +127.0°C, high = +70.0°C) sensor = thermal diode
> temp3: +106.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
> cpu0_vid: +1.250 V
>
> I suspect it is the power supply.

I wouldn't trust these numbers. You probably need a different/correct
driver in your kernel to measure your CPU and MoBo chipset readings,
and/or a later BIOS firmware.

Whenever I had such problems they were down to bad memory (some PCs are
rather particular in only accepting matching memory modules) or to a sick
power supply. Memtest86+ should tell you after some hours if something is
amiss. The power supply problem will require opening it up and checking
for domed capacitors. A few cents later, and with a soldering iron in
hand, you should be able to fix any failure caused by cheap capacitors.

Of course, if the machine is hundreds of miles away, attending to it is
more of a problem.

--
Regards,
Mick
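One way to see why the quoted numbers look untrustworthy is to check them for internal consistency. A rough sketch, assuming the output format shown in the thread; the two rules used here (a low limit above the high limit, and identical low and high limits treated as "probably never configured") are assumptions for illustration, not anything lm_sensors itself reports. The function name and regular expressions are hypothetical.

```python
import re

# Patterns tolerant of the spacing variations seen in the pasted output.
TEMP_RE = re.compile(
    r"(temp\d+):\s*\+?-?[\d.]+°C\s*"
    r"\(low\s*=\s*\+?(-?[\d.]+)°C,\s*high\s*=\s*\+?(-?[\d.]+)°C\)"
)
FAN_RE = re.compile(r"(fan\d+):\s*(\d+) RPM\s*\(min\s*=\s*(\d+) RPM\)")


def check_readings(text):
    """Return (label, problem) pairs for values that look implausible."""
    problems = []
    for label, low, high in TEMP_RE.findall(text):
        low, high = float(low), float(high)
        if low > high:
            problems.append((label, "low limit above high limit"))
        elif low == high:
            # Assumption: identical limits suggest unconfigured defaults.
            problems.append((label, "limits look unconfigured"))
    for label, rpm, minimum in FAN_RE.findall(text):
        if int(rpm) < int(minimum):
            problems.append((label, "below minimum RPM"))
    return problems


SAMPLE = """\
fan1: 0 RPM (min = 10 RPM) ALARM
fan2: 0 RPM (min = 0 RPM)
temp1: +47.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
temp2: +106.0°C (low = +127.0°C, high = +70.0°C) sensor = thermal diode
temp3: +106.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
"""

for label, problem in check_readings(SAMPLE):
    print(f"{label}: {problem}")
```

Every temperature line in the sample trips one of the two rules, which is consistent with a wrong or unconfigured driver, although it does not rule out a real thermal problem.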
Re: [gentoo-user] Computer turn itself off
On 05/23/2015 06:53 PM, Joseph wrote:
> On 05/23/15 18:08, Zhu Sha Zang wrote:
>> On 05/23/2015 05:24 PM, Joseph wrote:
>>> I have a box in a remote location (8-core CPU) and it turns itself off
>>> during compiling. The box is connected to a UPS. Is it the power supply?
>>
>> Maybe. I had a problem like that when running heavy simulations with
>> nvidia-cuda: the power supply protection was unable to keep a safe energy
>> level, and then the system went off.
>>
>> But if the failure happens during compilation, it can be a heat problem.
>> Install lm_sensors and use something like: watch -n 1 sensors
>>
>> If the temperature stays at safe levels, maybe you have RAM corruption.
>> In that case, you'll need to use memtest86+ to check.
>>
>> Good luck
>
> I tried to read lm-sensors again and the computer crashed with these
> readings:
>
> fan1:  0 RPM (min = 10 RPM) ALARM
> fan2:  0 RPM (min = 0 RPM)
> fan3:  0 RPM (min = 0 RPM)
> fan5:  0 RPM (min = 0 RPM)
> temp1: +47.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
> temp2: +106.0°C (low = +127.0°C, high = +70.0°C) sensor = thermal diode
> temp3: +106.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
> cpu0_vid: +1.250 V
>
> I suspect it is the power supply.

Hey, did you run sensors-detect and start /etc/init.d/lm_sensors as root
before using sensors? As was said, maybe you're using the wrong kernel
modules.

Regards
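To see which monitoring drivers the kernel has actually loaded, the hwmon entries in sysfs can be listed and compared with what sensors-detect suggests. A small sketch, assuming a Linux system that exposes chip names at /sys/class/hwmon/hwmon*/name; on some older kernels the name file may sit under a device/ subdirectory instead, which this sketch does not handle. The function name is hypothetical.

```python
from pathlib import Path


def loaded_hwmon_drivers(root="/sys/class/hwmon"):
    """Return the chip names the kernel currently exposes under `root`."""
    names = []
    for name_file in sorted(Path(root).glob("hwmon*/name")):
        names.append(name_file.read_text().strip())
    return names


print(loaded_hwmon_drivers())
```

An empty list, or a list missing the chip that sensors-detect reported, would point toward a missing kernel module rather than a hardware fault.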