Re: hardware errors

2014-06-11 Thread Richard Hector

On 10/06/14 23:24, Ralf Mardorf wrote:

On Tue, 2014-06-10 at 23:07 +1200, Richard Hector wrote:

On 10/06/14 23:04, Ralf Mardorf wrote:

On Tue, 2014-06-10 at 21:24 +1200, Richard Hector wrote:

On 09/06/14 11:35, B wrote:

On Mon, 09 Jun 2014 11:22:25 +1200 Richard Hector
rich...@walnut.gen.nz wrote:


I assume the RAM needs replacing - is it possible to figure
out which DIMM(s)?


Install memtest86+ and boot on it, then leave at least 3
complete cycles to run.



Thanks.

Have created a memtest86+ CD and will try it tomorrow evening
(need a scheduled time to take it down).

Interestingly, there are no more errors logged for the last day
and a half ...

Any guesses as to how long these 3 complete cycles will take?
It's a Sun Fire X2100 M2 (dual core opteron 1218, 2600MHz) with
4G of RAM. I haven't run memtest for ages ...


IIRC one complete standard test with my dual-core Athlon 2.1 GHz 4
GiB RAM takes more than 1 hour. I guess in 1 day it does around 8
complete tests, perhaps I run it just during the night in half of a
day. I might be mistaken, but you should expect that you need to
run it for several hours.


Thanks. I'm not sure how long we can afford to leave the machine down;
hopefully the error will show up promptly. BTW - it will show an error
even if ECC corrects it, right?


No ECC here. I don't know.

I used StartPage and searched for "memtest ECC". It seems that
memtest isn't good for testing ECC. The current version seems to support
very limited hardware, seemingly Intel only.


Yep. Halfway through the third pass; no errors yet. I'm not holding my
breath.
Any ideas on where to read up on those error messages, to figure out
what they actually mean?

Richard


--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org

Archive: https://lists.debian.org/53981f2d.9070...@walnut.gen.nz



Re: hardware errors

2014-06-11 Thread rob

On 11/06/14 10:19, Richard Hector wrote:
 On 10/06/14 23:24, Ralf Mardorf wrote:
 On Tue, 2014-06-10 at 23:07 +1200, Richard Hector wrote:
 On 10/06/14 23:04, Ralf Mardorf wrote:
 On Tue, 2014-06-10 at 21:24 +1200, Richard Hector wrote:
 On 09/06/14 11:35, B wrote:
 On Mon, 09 Jun 2014 11:22:25 +1200 Richard Hector
 rich...@walnut.gen.nz wrote:
 I assume the RAM needs replacing - is it possible to figure
 out which DIMM(s)?

 Install memtest86+ and boot on it, then leave at least 3
 complete cycles to run.

 Thanks.

 Have created a memtest86+ CD and will try it tomorrow evening
 (need a scheduled time to take it down).

 Interestingly, there are no more errors logged for the last day
 and a half ...

 Any guesses as to how long these 3 complete cycles will take?
 It's a Sun Fire X2100 M2 (dual core opteron 1218, 2600MHz) with
 4G of RAM. I haven't run memtest for ages ...

 IIRC one complete standard test with my dual-core Athlon 2.1 GHz 4
 GiB RAM takes more than 1 hour. I guess in 1 day it does around 8
 complete tests, perhaps I run it just during the night in half of a
 day. I might be mistaken, but you should expect that you need to
 run it for several hours.

 Thanks. I'm not sure how long we can afford to leave the machine down;
 hopefully the error will show up promptly. BTW - it will show an error
 even if ECC corrects it, right?

 No ECC here. I don't know.

 I used StartPage and searched for "memtest ECC". It seems that
 memtest isn't good for testing ECC. The current version seems to support
 very limited hardware, seemingly Intel only.

 Yep. Halfway through the third pass; no errors yet. I'm not holding my
 breath.
 Any ideas on where to read up on those error messages, to figure out
 what they actually mean?

 Richard

Is it the CPU cache rather than the RAM?

https://bugzilla.kernel.org/show_bug.cgi?id=43205
https://bbs.archlinux.org/viewtopic.php?id=112113

rob



Archive: https://lists.debian.org/53982de7.6050...@rektau.ukfsn.org



Re: hardware errors

2014-06-11 Thread Joel Rees
On Wed, Jun 11, 2014 at 6:19 PM, Richard Hector rich...@walnut.gen.nz wrote:
 [...]
 Yep. Halfway through the third pass; no errors yet. I'm not holding my
 breath.
 Any ideas on where to read up on those error messages, to figure out
 what they actually mean?

 Richard

Don't know about other people, but when the memory subsystem starts
giving me grief, I generally vacuum around the motherboard and other
internal stuff, re-seat the cable connectors, and pop the memory and
I/O boards out, clean the contacts, and re-seat them, too.

-- 
Joel Rees

Be careful where you see conspiracy.
Look first in your own heart.


Archive: https://lists.debian.org/caar43imnq7h94dlwdrzj3kbqiyonsry+zc4hc464rdtacom...@mail.gmail.com



Re: hardware errors

2014-06-11 Thread Ralf Mardorf
On Wed, 2014-06-11 at 21:01 +0900, Joel Rees wrote:
 On Wed, Jun 11, 2014 at 6:19 PM, Richard Hector rich...@walnut.gen.nz wrote:
  [...]
  Yep. Halfway through the third pass; no errors yet. I'm not holding my
  breath.
  Any ideas on where to read up on those error messages, to figure out
  what they actually mean?
 
  Richard
 
 Don't know about other people, but when the memory subsystem starts
 giving me grief, I generally vacuum around the motherboard and other
 internal stuff, re-seat the cable connectors, and pop the memory and
 I/O boards out, clean the contacts, and re-seat them, too.

I tend to use compressed air instead of a vacuum, but the effect is the
same ;). Unmounting and remounting is good advice; cleaning usually
isn't needed.


Archive: https://lists.debian.org/1402492271.11529.36.camel@archlinux



Debian memtest package faulty? (was ... Re: hardware errors)

2014-06-10 Thread Chris Bannister
On Mon, Jun 09, 2014 at 01:45:53AM +0200, Ralf Mardorf wrote:
 On Mon, 2014-06-09 at 01:35 +0200, B wrote:
  On Mon, 09 Jun 2014 11:22:25 +1200
  Richard Hector rich...@walnut.gen.nz wrote:
  
   I assume the RAM needs replacing - is it possible to figure out
   which DIMM(s)?
  
  Install memtest86+ and boot on it, then leave at least
  3 complete cycles to run.
 
 I would use the memtest live media instead, so you can be sure that you
 always get the current version from upstream. On my machine, memtest from
 Debian and Ubuntu fails, while the same versions from the live media
 don't.

So you are saying the Debian and Ubuntu versions are buggy? Which live
media version works for you?

Have you filed a bug?

Does the memtest86+ package work from the grub menu, for you?

-- 
If you're not careful, the newspapers will have you hating the people
who are being oppressed, and loving the people who are doing the 
oppressing. --- Malcolm X


Archive: https://lists.debian.org/20140610090654.GH3560@tal



Re: hardware errors

2014-06-10 Thread Richard Hector
On 09/06/14 11:35, B wrote:
 On Mon, 09 Jun 2014 11:22:25 +1200 Richard Hector
 rich...@walnut.gen.nz wrote:
 
 I assume the RAM needs replacing - is it possible to figure out 
 which DIMM(s)?
 
 Install memtest86+ and boot on it, then leave at least 3 complete
 cycles to run.
 

Thanks.

Have created a memtest86+ CD and will try it tomorrow evening (need a
scheduled time to take it down).

Interestingly, there are no more errors logged for the last day and a
half ...

Any guesses as to how long these 3 complete cycles will take? It's a
Sun Fire X2100 M2 (dual core opteron 1218, 2600MHz) with 4G of RAM. I
haven't run memtest for ages ...

Richard


Archive: https://lists.debian.org/5396cec8.7010...@walnut.gen.nz



Re: Debian memtest package faulty? (was ... Re: hardware errors)

2014-06-10 Thread Ralf Mardorf
On Tue, 2014-06-10 at 21:06 +1200, Chris Bannister wrote:
 So you are saying the Debian and Ubuntu versions are buggy? Which live
 media version works for you?
 
 Have you filed a bug?
 
 Does the memtest86+ package work from the grub menu, for you?

I can't say whether it would work from the GRUB menu now, but it didn't on
older Debian and Ubuntu installs. No, I didn't file a bug report, but I
reported it to at least one *buntu devel mailing list. By the way, I don't
know whether memtest from Arch Linux would work; I simply never installed it
on my Debians/*buntus and other installs anymore. There's live media from
memtest. I'm too lazy to search for the link. I got false positives: a
perfectly working machine got RAM errors as soon as the test started,
but when using the memtest from the memtest live media, there were no
errors when running the test several times, each day for around a day.
This was repeatable.


Archive: https://lists.debian.org/1402398009.2813.29.camel@archlinux



Re: hardware errors

2014-06-10 Thread Ralf Mardorf
On Tue, 2014-06-10 at 21:24 +1200, Richard Hector wrote:
 On 09/06/14 11:35, B wrote:
  On Mon, 09 Jun 2014 11:22:25 +1200 Richard Hector
  rich...@walnut.gen.nz wrote:
  
  I assume the RAM needs replacing - is it possible to figure out 
  which DIMM(s)?
  
  Install memtest86+ and boot on it, then leave at least 3 complete
  cycles to run.
  
 
 Thanks.
 
 Have created a memtest86+ CD and will try it tomorrow evening (need a
 scheduled time to take it down).
 
 Interestingly, there are no more errors logged for the last day and a
 half ...
 
 Any guesses as to how long these 3 complete cycles will take? It's a
 Sun Fire X2100 M2 (dual core opteron 1218, 2600MHz) with 4G of RAM. I
 haven't run memtest for ages ...

IIRC one complete standard test on my dual-core Athlon (2.1 GHz, 4 GiB
RAM) takes more than 1 hour. I guess it does around 8 complete tests in
1 day, though perhaps I only ran it overnight, for half a day. I might
be mistaken, but you should expect to need to run it for several
hours.



Archive: https://lists.debian.org/1402398283.2813.33.camel@archlinux
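
As a rough sanity check on the figures above, here is a small sketch that
turns a per-pass time into a downtime estimate. The function name and the
sample per-pass durations are illustrative assumptions; the only numbers
taken from the thread are "more than 1 hour" per pass, "around 8" passes
per day, and the suggested 3 passes. Real times depend on the machine and
the memtest86+ version.

```python
def estimate_downtime_hours(hours_per_pass: float, passes: int = 3) -> float:
    """Return the estimated hours needed for `passes` complete memtest passes."""
    if hours_per_pass <= 0 or passes <= 0:
        raise ValueError("hours_per_pass and passes must be positive")
    return hours_per_pass * passes


if __name__ == "__main__":
    # Lower bound: "more than 1 hour" per pass, treated here as exactly 1 hour.
    print(estimate_downtime_hours(1.0))
    # Other reading: about 8 passes in 24 hours, i.e. 3 hours per pass.
    print(estimate_downtime_hours(24 / 8))
```

Under those two readings, three passes on comparable hardware would
plausibly need somewhere between about 3 and 9 hours.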



Re: hardware errors

2014-06-10 Thread Richard Hector
On 10/06/14 23:04, Ralf Mardorf wrote:
 On Tue, 2014-06-10 at 21:24 +1200, Richard Hector wrote:
 On 09/06/14 11:35, B wrote:
 On Mon, 09 Jun 2014 11:22:25 +1200 Richard Hector 
 rich...@walnut.gen.nz wrote:
 
 I assume the RAM needs replacing - is it possible to figure
 out which DIMM(s)?
 
 Install memtest86+ and boot on it, then leave at least 3
 complete cycles to run.
 
 
 Thanks.
 
 Have created a memtest86+ CD and will try it tomorrow evening
 (need a scheduled time to take it down).
 
 Interestingly, there are no more errors logged for the last day
 and a half ...
 
 Any guesses as to how long these 3 complete cycles will take?
 It's a Sun Fire X2100 M2 (dual core opteron 1218, 2600MHz) with
 4G of RAM. I haven't run memtest for ages ...
 
 IIRC one complete standard test with my dual-core Athlon 2.1 GHz 4
 GiB RAM takes more than 1 hour. I guess in 1 day it does around 8
 complete tests, perhaps I run it just during the night in half of a
 day. I might be mistaken, but you should expect that you need to
 run it for several hours.

Thanks. I'm not sure how long we can afford to leave the machine down;
hopefully the error will show up promptly. BTW - it will show an error
even if ECC corrects it, right?

Richard


Archive: https://lists.debian.org/5396e6eb.4080...@walnut.gen.nz



Re: Debian memtest package faulty? (was ... Re: hardware errors)

2014-06-10 Thread Ralf Mardorf
On Tue, 2014-06-10 at 13:00 +0200, Ralf Mardorf wrote:
 On Tue, 2014-06-10 at 21:06 +1200, Chris Bannister wrote:
  So you are saying the Debian and Ubuntu versions are buggy? Which live
  media version works for you?
  
  Have you filed a bug?
  
  Does the memtest86+ package work from the grub menu, for you?
 
 I can't say whether it would work from the GRUB menu now, but it didn't on
 older Debian and Ubuntu installs. No, I didn't file a bug report, but I
 reported it to at least one *buntu devel mailing list. By the way, I don't
 know whether memtest from Arch Linux would work; I simply never installed it
 on my Debians/*buntus and other installs anymore. There's live media from
 memtest. I'm too lazy to search for the link. I got false positives: a
 perfectly working machine got RAM errors as soon as the test started,
 but when using the memtest from the memtest live media, there were no
 errors when running the test several times, each day for around a day.
 This was repeatable.
   
  ^^^ time for around a day



Archive: https://lists.debian.org/1402398650.2813.37.camel@archlinux



Re: hardware errors

2014-06-10 Thread Ralf Mardorf
On Tue, 2014-06-10 at 23:07 +1200, Richard Hector wrote:
 On 10/06/14 23:04, Ralf Mardorf wrote:
  On Tue, 2014-06-10 at 21:24 +1200, Richard Hector wrote:
  On 09/06/14 11:35, B wrote:
  On Mon, 09 Jun 2014 11:22:25 +1200 Richard Hector 
  rich...@walnut.gen.nz wrote:
  
  I assume the RAM needs replacing - is it possible to figure
  out which DIMM(s)?
  
  Install memtest86+ and boot on it, then leave at least 3
  complete cycles to run.
  
  
  Thanks.
  
  Have created a memtest86+ CD and will try it tomorrow evening
  (need a scheduled time to take it down).
  
  Interestingly, there are no more errors logged for the last day
  and a half ...
  
  Any guesses as to how long these 3 complete cycles will take?
  It's a Sun Fire X2100 M2 (dual core opteron 1218, 2600MHz) with
  4G of RAM. I haven't run memtest for ages ...
  
  IIRC one complete standard test with my dual-core Athlon 2.1 GHz 4
  GiB RAM takes more than 1 hour. I guess in 1 day it does around 8
  complete tests, perhaps I run it just during the night in half of a
  day. I might be mistaken, but you should expect that you need to
  run it for several hours.
 
 Thanks. I'm not sure how long we can afford to leave the machine down;
 hopefully the error will show up promptly. BTW - it will show an error
 even if ECC corrects it, right?

No ECC here. I don't know.

I used StartPage and searched for "memtest ECC". It seems that
memtest isn't good for testing ECC. The current version seems to support
very limited hardware, seemingly Intel only.


Archive: https://lists.debian.org/1402399473.2813.44.camel@archlinux



hardware errors

2014-06-08 Thread Richard Hector
Hi all,

I'm seeing this kind of thing in kern.log:

http://paste.debian.net/104039/

I've never seen these messages before IIRC, so I'm not entirely sure
if I'm interpreting them correctly.

It looks like some messages are telling me about RAM ECC errors, and
others perhaps about cache? On the CPU? Or is it all RAM errors,
detected at different places?

I assume the RAM needs replacing - is it possible to figure out which
DIMM(s)?

Thanks,
Richard


Archive: https://lists.debian.org/5394f031.6000...@walnut.gen.nz
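
One possible way to narrow down which DIMM is reporting ECC errors,
sketched here under the assumption that the kernel's EDAC driver for the
memory controller is loaded and exposes per-csrow counters under
/sys/devices/system/edac/mc (the csrow-based sysfs layout). The function
name is made up for illustration, and mapping a csrow to a physical DIMM
slot is board-specific, so the vendor documentation is still needed for
that last step.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {(controller, csrow): (ce_count, ue_count)} found under `root`.

    Assumes the csrow-based sysfs layout (mcN/csrowM/ce_count and ue_count).
    Rows whose counter files are missing or unreadable are skipped.
    """
    counts = {}
    for csrow in sorted(Path(root).glob("mc*/csrow*")):
        try:
            ce = int((csrow / "ce_count").read_text().strip())
            ue = int((csrow / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue
        counts[(csrow.parent.name, csrow.name)] = (ce, ue)
    return counts


if __name__ == "__main__":
    for (mc, csrow), (ce, ue) in read_edac_counts().items():
        print(f"{mc}/{csrow}: corrected={ce} uncorrected={ue}")
```

A csrow whose corrected-error count keeps climbing while the others stay
at zero is a reasonable first suspect.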



Re: hardware errors

2014-06-08 Thread Bzzzz
On Mon, 09 Jun 2014 11:22:25 +1200
Richard Hector rich...@walnut.gen.nz wrote:

 I assume the RAM needs replacing - is it possible to figure out
 which DIMM(s)?

Install memtest86+ and boot on it, then leave at least
3 complete cycles to run.

-- 
Pierre : pfff, look at the window, kids spending their life
 biking outside…
tom : they don't have a pc or what ?




Re: hardware errors

2014-06-08 Thread Ralf Mardorf
On Mon, 2014-06-09 at 01:35 +0200, B wrote:
 On Mon, 09 Jun 2014 11:22:25 +1200
 Richard Hector rich...@walnut.gen.nz wrote:
 
  I assume the RAM needs replacing - is it possible to figure out
  which DIMM(s)?
 
 Install memtest86+ and boot on it, then leave at least
 3 complete cycles to run.

I would use the memtest live media instead, so you can be sure that you
always get the current version from upstream. On my machine, memtest from
Debian and Ubuntu fails, while the same versions from the live media
don't.



Archive: https://lists.debian.org/1402271153.8886.4.camel@archlinux