Re: [CentOS] Kernel:[Hardware Error]: use of vacuum

2017-08-13 Thread Fred Smith
On Sun, Aug 13, 2017 at 08:18:24AM -0400, ken wrote:
> On 08/12/2017 07:24 PM, Fred Smith wrote:
> >Well, overheating is possible... we don't live in the cleanest possible
> >house, AND we have cats. So, in general, I open up this box twice a year
> >and vacuum out the house dirt and cat fuzzies. I'm probably overdue for
> >this task.
> 
> Cleaning is a good thing to do, but not with a vacuum... the vacuum
> could loosen components, even make them disappear.  Much better
> would be to use a blower or bellows of some kind.

thanks for the reminder.

I don't actually use a vacuum, I was just being, er, loose with my
terminology. I use a can of compressed "air" where possible, remove
fans on heatsinks and blow or wipe/brush out the clogs, remove the
inlet filters and wash 'em. I get amazing amounts of cat fur.

-- 
---
Under no circumstances will I ever purchase anything offered to me as
the result of an unsolicited e-mail message. Nor will I forward chain
letters, petitions, mass mailings, or virus warnings to large numbers
of others. This is my contribution to the survival of the online
community.
 --Roger Ebert, December, 1996
- The Boulder Pledge -
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Kernel:[Hardware Error]: use of vacuum

2017-08-13 Thread Gordon Messmer

On 08/13/2017 05:18 AM, ken wrote:
> Also, cowboys scoff, but I always wear a grounded wrist strap when
> handling electronics.



It's a good idea, especially in low-humidity climates.  Also noteworthy: 
the air moving through a hose can cause a vacuum's hose or attachment to 
build up a static charge, which is another reason it can be a bad idea 
to use a vacuum in a computer.




Re: [CentOS] Kernel:[Hardware Error]: use of vacuum

2017-08-13 Thread ken

On 08/12/2017 07:24 PM, Fred Smith wrote:

> Well, overheating is possible... we don't live in the cleanest possible
> house, AND we have cats. So, in general, I open up this box twice a year
> and vacuum out the house dirt and cat fuzzies. I'm probably overdue for
> this task.


Cleaning is a good thing to do, but not with a vacuum... the vacuum 
could loosen components, even make them disappear.  Much better would be 
to use a blower or bellows of some kind.


Also, cowboys scoff, but I always wear a grounded wrist strap when 
handling electronics.





Re: [CentOS] Kernel:[Hardware Error]:

2017-08-12 Thread Fred Smith
On Sat, Aug 12, 2017 at 05:51:33PM -0400, Steven Tardy wrote:
> 
> > On Aug 12, 2017, at 3:50 PM, Fred Smith wrote:
> > 
> > I had a series of kernel hardware error reports today while I was away 
> > from my computer:
> > 
> > Message from syslogd@fcshome at Aug 12 10:12:24 ...
> > kernel:[Hardware Error]: MC2 Error: VB Data ECC or parity error.
> > 
> > Message from syslogd@fcshome at Aug 12 10:12:24 ...
> > kernel:[Hardware Error]: Error Status: Corrected error, no action required.
> > 
> > Message from syslogd@fcshome at Aug 12 10:12:24 ...
> > kernel:[Hardware Error]: CPU:2 (15:2:0) 
> > MC2_STATUS[-|CE|MiscV|-|-|-|-|CECC]: 0x9844410c0176
> > 
> > Message from syslogd@fcshome at Aug 12 10:12:24 ...
> > kernel:[Hardware Error]: cache level: L2, tx: DATA, mem-tx: EV
> > 
> > never saw anything like that before.
> > 
> > cpu is:
> > 
> >$ cat /proc/cpuinfo
> >processor: 0
> >vendor_id: AuthenticAMD
> >cpu family: 21
> >model: 2
> >model name: AMD FX(tm)-6300 Six-Core Processor
> >stepping: 0
> >microcode: 0x600084f
> >cpu MHz: 1400.000
> >cache size: 2048 KB
> >physical id: 0
> >siblings: 6
> >core id: 0
> >cpu cores: 3
> >apicid: 16
> >initial apicid: 0
> >fpu: yes
> >fpu_exception: yes
> >cpuid level: 13
> >wp: yes
> >flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> > cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
> > pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid 
> > aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes 
> > xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a 
> > misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr 
> > tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock 
> > nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter 
> > pfthreshold bmi1
> >bogomips: 7023.90
> >TLB size: 1536 4K pages
> >clflush size: 64
> >cache_alignment: 64
> >address sizes: 48 bits physical, 48 bits virtual
> >power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro
> > 
> > 
> > six core AMD, above is one of the cores.
> > 
> > Any clues to figure out the errors, and/or mitigate?
> > 
> > thanks!
> > 
> > Fred
> 
> MC == Machine check exception.
> The important part of an MC report is the "status" code.
> One can use the Intel "64 and IA-32 Architectures Software Developer's
> Manual" to decode this (a 4000-page .pdf).
> I'm unsure, but it looks like AMD uses similar MC codes.
> Luckily Linux does some of the heavy lifting and decodes it to "cache
> hierarchy error L2 data eviction".
> The next most important part is the "corrected" bit.
> 
> Now what does that really mean?
> *shrug*, could be 
> firmware/drivers/overheating/poor-CPU-seating/DIMM-seating/faulty-motherboard/faulty-CPU/faulty-DIMM.

Well, overheating is possible... we don't live in the cleanest possible
house, AND we have cats. So, in general, I open up this box twice a year
and vacuum out the house dirt and cat fuzzies. I'm probably overdue for
this task.

This is the first one of these I've had. Hope it's the last, but a
little PM (preventive maintenance) is in order either way.

thanks for the reply.

Fred
> 
> Hope that doesn't confuse too much. (:

-- 
 Fred Smith -- fre...@fcshome.stoneham.ma.us -
The Lord detests the way of the wicked 
  but he loves those who pursue righteousness.
- Proverbs 15:9 (niv) -


Re: [CentOS] Kernel:[Hardware Error]:

2017-08-12 Thread Chris Murphy
On Sat, Aug 12, 2017 at 1:50 PM, Fred Smith wrote:
> I had a series of kernel hardware error reports today while I was away
> from my computer:
>
> Message from syslogd@fcshome at Aug 12 10:12:24 ...
>  kernel:[Hardware Error]: MC2 Error: VB Data ECC or parity error.
>
> Message from syslogd@fcshome at Aug 12 10:12:24 ...
>  kernel:[Hardware Error]: Error Status: Corrected error, no action required.


Cosmic ray corrupted data in RAM, and ECC detected and corrected it?
Whatever it was, working as intended.


-- 
Chris Murphy


Re: [CentOS] Kernel:[Hardware Error]:

2017-08-12 Thread Steven Tardy

> On Aug 12, 2017, at 3:50 PM, Fred Smith wrote:
> 
> I had a series of kernel hardware error reports today while I was away 
> from my computer:
> 
> Message from syslogd@fcshome at Aug 12 10:12:24 ...
> kernel:[Hardware Error]: MC2 Error: VB Data ECC or parity error.
> 
> Message from syslogd@fcshome at Aug 12 10:12:24 ...
> kernel:[Hardware Error]: Error Status: Corrected error, no action required.
> 
> Message from syslogd@fcshome at Aug 12 10:12:24 ...
> kernel:[Hardware Error]: CPU:2 (15:2:0) MC2_STATUS[-|CE|MiscV|-|-|-|-|CECC]: 
> 0x9844410c0176
> 
> Message from syslogd@fcshome at Aug 12 10:12:24 ...
> kernel:[Hardware Error]: cache level: L2, tx: DATA, mem-tx: EV
> 
> never saw anything like that before.
> 
> cpu is:
> 
>$ cat /proc/cpuinfo
>processor: 0
>vendor_id: AuthenticAMD
>cpu family: 21
>model: 2
>model name: AMD FX(tm)-6300 Six-Core Processor
>stepping: 0
>microcode: 0x600084f
>cpu MHz: 1400.000
>cache size: 2048 KB
>physical id: 0
>siblings: 6
>core id: 0
>cpu cores: 3
>apicid: 16
>initial apicid: 0
>fpu: yes
>fpu_exception: yes
>cpuid level: 13
>wp: yes
>flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
> pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid 
> aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes 
> xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a 
> misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm 
> topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock 
> nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter 
> pfthreshold bmi1
>bogomips: 7023.90
>TLB size: 1536 4K pages
>clflush size: 64
>cache_alignment: 64
>address sizes: 48 bits physical, 48 bits virtual
>power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro
> 
> 
> six core AMD, above is one of the cores.
> 
> Any clues to figure out the errors, and/or mitigate?
> 
> thanks!
> 
> Fred

MC == Machine check exception.
The important part of an MC report is the "status" code.
One can use the Intel "64 and IA-32 Architectures Software Developer's
Manual" to decode this (a 4000-page .pdf).
I'm unsure, but it looks like AMD uses similar MC codes.
Luckily Linux does some of the heavy lifting and decodes it to "cache
hierarchy error L2 data eviction".
The next most important part is the "corrected" bit.

Now what does that really mean?
*shrug*, could be 
firmware/drivers/overheating/poor-CPU-seating/DIMM-seating/faulty-motherboard/faulty-CPU/faulty-DIMM.
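
For anyone curious how that decode works, here is a minimal sketch (not the
kernel's actual code) that picks apart the low 16 bits of the status value
from the report. It assumes the usual cache hierarchy error-code layout,
0000 0001 RRRR TTLL, and reuses the mnemonics the kernel printed (L2, DATA,
EV). The function name and the lookup tables are made up for illustration,
and the upper flag bits of the status word are deliberately not decoded.

```python
# Illustrative decoder for the low 16 bits of an MCi_STATUS value.
# Assumed layout for cache hierarchy errors: 0000 0001 RRRR TTLL.
# The tables and decode_cache_error() are hypothetical, not a kernel API.

LEVELS = ["RESV", "L1", "L2", "L3/GEN"]      # LL, bits 1:0 (cache level)
TX_TYPES = ["INSN", "DATA", "GEN", "RESV"]   # TT, bits 3:2 (transaction type)
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR",   # RRRR, bits 7:4 (memory transaction)
          "IRD", "PRF", "EV", "SNP"]


def decode_cache_error(status: int) -> dict:
    """Decode the error-code field of a status word as a cache hierarchy error."""
    code = status & 0xFFFF                   # keep only the error-code field
    if code >> 8 != 0x01:                    # bits 15:8 must be 0000 0001
        raise ValueError(f"0x{code:04x} is not a cache hierarchy error code")
    rrrr = (code >> 4) & 0xF
    return {
        "level": LEVELS[code & 0x3],
        "tx": TX_TYPES[(code >> 2) & 0x3],
        "mem_tx": MEM_TX[rrrr] if rrrr < len(MEM_TX) else "RESV",
    }


if __name__ == "__main__":
    # Status value exactly as it appears in the report above.
    print(decode_cache_error(0x9844410c0176))
```

The low 16 bits of the reported value are 0x0176, which this sketch maps to
level L2, tx DATA, mem-tx EV, the same fields as the kernel's own
"cache level: L2, tx: DATA, mem-tx: EV" line.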

Hope that doesn't confuse too much. (:


[CentOS] Kernel:[Hardware Error]:

2017-08-12 Thread Fred Smith
I had a series of kernel hardware error reports today while I was away 
from my computer:

Message from syslogd@fcshome at Aug 12 10:12:24 ...
 kernel:[Hardware Error]: MC2 Error: VB Data ECC or parity error.

Message from syslogd@fcshome at Aug 12 10:12:24 ...
 kernel:[Hardware Error]: Error Status: Corrected error, no action required.

Message from syslogd@fcshome at Aug 12 10:12:24 ...
 kernel:[Hardware Error]: CPU:2 (15:2:0) MC2_STATUS[-|CE|MiscV|-|-|-|-|CECC]: 0x9844410c0176

Message from syslogd@fcshome at Aug 12 10:12:24 ...
 kernel:[Hardware Error]: cache level: L2, tx: DATA, mem-tx: EV

never saw anything like that before.

cpu is:

$ cat /proc/cpuinfo
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 21
model   : 2
model name  : AMD FX(tm)-6300 Six-Core Processor
stepping: 0
microcode   : 0x600084f
cpu MHz : 1400.000
cache size  : 2048 KB
physical id : 0
siblings: 6
core id : 0
cpu cores   : 3
apicid  : 16
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid 
aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave 
avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext 
perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips: 7023.90
TLB size: 1536 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro


six core AMD, above is one of the cores.

Any clues to figure out the errors, and/or mitigate?

thanks!

Fred
-- 
---
 .Fred Smith   /  
( /__  ,__.   __   __ /  __   : / 
 //  /   /__) /  /  /__) .+'   Home: fre...@fcshome.stoneham.ma.us 
//  (__ (___ (__(_ (___ / :__ 781-438-5471 
 Jude 1:24,25 -