Re: gmirror not synced

2012-01-07 Thread Gareth de Vaux
On Thu 2012-01-05 (09:56), Matthew Seaman wrote:
> drive is actually generating errors.)  Also try a few passes of
> memtest86 to try and spot problems with RAM.

Yes, that was the problem. I've gotten rid of a faulty DIMM and
everything is looking a lot saner, thanx :)
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: gmirror not synced

2012-01-06 Thread Boris Kochergin

On 01/04/12 14:43, Gareth de Vaux wrote:

> Hi all, I've noticed that the md5 hashes of a couple of files on
> a gmirror change when I recalculate the hashes. The output usually
> cycles between 2 hashes per file.
>
> I'm guessing this is because each calculation reads the file
> randomly from 1 of 2 component drives, and the files in question
> had a few bit flips during their original sync. I also assume
> this's something you have to live with for gmirror? Is removing
> and completely rebuilding the secondary drive the only thing you
> can do (which might fix these bit flips but incur others elsewhere)?

Hi.

Bit-flipping is unlikely, but you can test this hypothesis by having the
mirror only ever read from one disk. Use gmirror configure -b to change the
balancing algorithm of the array to priority and gmirror configure -p
to change the priority of one of the members. Then, repeat the test for
the other member.
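
The repeated-hash test itself can be scripted. What follows is only a minimal sketch, not anything posted in the thread: the function names and the script name are made up for illustration, and it assumes the mirror has already been pinned to a single member as described above (the balance algorithm appears in gmirror(8) as "prefer", if memory serves; check the man page). It hashes one file several times and reports how many distinct digests came back.

```python
import hashlib
import sys
from collections import Counter


def file_digest(path, algo="md5", chunk_size=1 << 20):
    """Hash one full read of the file, streaming it in 1 MiB chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_counts(path, passes=10, algo="md5"):
    """Hash the file `passes` times and count how often each digest appears."""
    return Counter(file_digest(path, algo) for _ in range(passes))


if __name__ == "__main__":
    # Hypothetical usage: python rehash.py FILE [PASSES]
    if len(sys.argv) < 2:
        sys.exit("usage: rehash.py FILE [PASSES]")
    passes = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    counts = digest_counts(sys.argv[1], passes)
    for digest, n in counts.most_common():
        print(f"{digest}  seen {n} time(s)")
    if len(counts) > 1:
        print("unstable: the same file produced more than one digest")
```

One caveat: repeated reads of a small file may be served from the buffer cache rather than from the disk, so a stable result on a small file does not by itself clear the drive. If the digests still vary while reads are pinned to one member, that would point away from the disks and towards RAM or CPU cache.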


What I would say is more likely is that you've got bad memory or CPU 
cache in the machine. I've had this happen to me and that turned out to 
be the case.


And, as the other reply said, it's a good idea to make sure you've got a 
good backup at this point.


-Boris


Re: gmirror not synced

2012-01-06 Thread Jeremy Chadwick
On Thu, Jan 05, 2012 at 09:56:02AM +, Matthew Seaman wrote:
> On 04/01/2012 19:43, Gareth de Vaux wrote:
> > Hi all, I've noticed that the md5 hashes of a couple of files on
> > a gmirror change when I recalculate the hashes. The output usually
> > cycles between 2 hashes per file.
> >
> > I'm guessing this is because each calculation reads the file
> > randomly from 1 of 2 component drives, and the files in question
> > had a few bit flips during their original sync. I also assume
> > this's something you have to live with for gmirror? Is removing
> > and completely rebuilding the secondary drive the only thing you
> > can do (which might fix these bit flips but incur others elsewhere)?
>
> No, that's not something acceptable at all.  Randomly flipping bits in
> files is a really nasty failure mode.
>
> What does 'gmirror list' tell you about the state of the gmirror?  Is
> there any possibility that your hardware is failing?  Check the SMART
> attributes of the disk in the first instance (it isn't brilliant for
> picking up impending failure, but it should be pretty accurate once the
> drive is actually generating errors.)  Also try a few passes of
> memtest86 to try and spot problems with RAM.  Cleaning dust out of air
> vents and heatsinks and generally making sure the machine is not
> overheating is a good idea too.

Another possibility is a disk with intermittently faulty cache, or a
drive that has basically given up (firmware bug, design flaw, etc.)
honouring ECC[1][2] when reading/writing sectors.

For the former point, SMART statistics from the drives could help
determine if this is the case, but I stress the word could.  Cache
errors are usually reported in Attribute 184 (End-to-End_Error), but
that attribute is not available on very many drives.

Gareth, please install ports/sysutils/smartmontools (make sure it's
version 5.42 or newer) and provide output from smartctl -x /dev/disk
and I'll review it for you.
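
For anyone who wants to check Attribute 184 before posting the full report, here is a rough sketch. It is not part of smartmontools; the function and script names are made up, and it assumes only that each attribute row in the smartctl report starts with the numeric attribute ID followed by the attribute name. It returns the first row whose leading field is the requested ID.

```python
import sys


def find_attribute(report_text, attr_id):
    """Return the whitespace-split fields of the first row whose leading
    field is the numeric ID `attr_id`, or None if no such row exists.

    Assumption: attribute rows begin with the ID. Other sections of a long
    report could in principle contain a line starting with the same number.
    """
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and int(fields[0]) == attr_id:
            return fields
    return None


if __name__ == "__main__":
    # Hypothetical usage: smartctl -x /dev/disk | python find_attr.py 184
    attr_id = int(sys.argv[1]) if len(sys.argv) > 1 else 184
    row = find_attribute(sys.stdin.read(), attr_id)
    print(" ".join(row) if row else f"attribute {attr_id} not reported")
```

A "not reported" result only means the drive does not expose the attribute, which is common; it says nothing about whether the cache is healthy.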

[1]: http://www.storagereview.com/guide/error.html
 (read all subsections too)
[2]: http://www.dewassoc.com/kbase/hard_drives/hard_disk_sector_structures.htm

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



Re: gmirror not synced

2012-01-05 Thread Matthew Seaman
On 04/01/2012 19:43, Gareth de Vaux wrote:
> Hi all, I've noticed that the md5 hashes of a couple of files on
> a gmirror change when I recalculate the hashes. The output usually
> cycles between 2 hashes per file.
>
> I'm guessing this is because each calculation reads the file
> randomly from 1 of 2 component drives, and the files in question
> had a few bit flips during their original sync. I also assume
> this's something you have to live with for gmirror? Is removing
> and completely rebuilding the secondary drive the only thing you
> can do (which might fix these bit flips but incur others elsewhere)?

No, that's not something acceptable at all.  Randomly flipping bits in
files is a really nasty failure mode.

What does 'gmirror list' tell you about the state of the gmirror?  Is
there any possibility that your hardware is failing?  Check the SMART
attributes of the disk in the first instance (it isn't brilliant for
picking up impending failure, but it should be pretty accurate once the
drive is actually generating errors.)  Also try a few passes of
memtest86 to try and spot problems with RAM.  Cleaning dust out of air
vents and heatsinks and generally making sure the machine is not
overheating is a good idea too.

Actually, first thing to do is make sure you have really good backups.
Bonus points if you have been backing everything up religiously, and can
extract a known good copy of the files in question from some of the
older ones.
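
Once a known good copy has been pulled from a backup, comparing it byte by byte against the suspect file shows where the damage is and whether it really looks like single flipped bits. The following is a minimal sketch under that assumption; the function and script names are made up for illustration.

```python
import sys


def diff_offsets(path_a, path_b, chunk_size=1 << 20, limit=100):
    """Return up to `limit` tuples (offset, byte_a, byte_b) where the two
    files differ. Only the common prefix is compared if the lengths differ."""
    diffs = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while len(diffs) < limit:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both files exhausted
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    diffs.append((offset + i, x, y))
                    if len(diffs) >= limit:
                        break
            if len(a) != len(b):
                break  # one file ended early; stop at the shorter one
            offset += len(a)
    return diffs


if __name__ == "__main__":
    # Hypothetical usage: python bytediff.py KNOWN_GOOD SUSPECT
    if len(sys.argv) != 3:
        sys.exit("usage: bytediff.py KNOWN_GOOD SUSPECT")
    for off, x, y in diff_offsets(sys.argv[1], sys.argv[2]):
        # A single 1 in the XOR column means exactly one bit differs.
        print(f"{off:#x}: {x:02x} vs {y:02x} (xor {x ^ y:08b})")
```

If the reported offsets move around between runs on the same pair of files, the corruption is happening on the read path (RAM, cache, controller) rather than sitting on the platters.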

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
JID: matt...@infracaninophile.co.uk               Kent, CT11 9PW





gmirror not synced

2012-01-04 Thread Gareth de Vaux
Hi all, I've noticed that the md5 hashes of a couple of files on
a gmirror change when I recalculate the hashes. The output usually
cycles between 2 hashes per file.

I'm guessing this is because each calculation reads the file
randomly from 1 of 2 component drives, and the files in question
had a few bit flips during their original sync. I also assume
this's something you have to live with for gmirror? Is removing
and completely rebuilding the secondary drive the only thing you
can do (which might fix these bit flips but incur others elsewhere)?