RE: ZFS and DMA read error

2009-09-07 Thread Daniel Eriksson
Mark Stapper wrote:

 Yeah, I did the long SMART selftest three times now, and each time it
 failed at the same LBA address.

I assume 'smartctl -a /dev/adX' reports that the read test failed at LBA
XXX something?

 Why would I want to clear my drive before I run these tests?

In this case the point is not really to clear the drive but to write to
every sector. If you have a failed sector (which you do), writing to it
will force the drive firmware to remap the sector. As far as I know, most
drives will not remap an unreadable sector until it is written to.
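A toy Python model may make the point about reads versus writes concrete. It is a sketch under stated assumptions, not real firmware behaviour: the `ToyDrive` class and its fields are hypothetical, a read of a pending sector simply fails, and the first write to that sector retires it to a spare. Real drives may also clear a pending sector without reallocating it if the rewritten sector verifies cleanly.

```python
class ToyDrive:
    """Hypothetical model of pending-sector remapping (not real firmware)."""

    SECTOR = 512  # assumed logical sector size in bytes

    def __init__(self, sectors: int, spares: int) -> None:
        self.data = [bytes(self.SECTOR) for _ in range(sectors)]
        self.pending = set()   # LBAs whose contents can no longer be read
        self.remapped = {}     # LBA -> index of the spare sector now used
        self.spares_left = spares

    def read(self, lba: int) -> bytes:
        # A read cannot repair anything: the firmware has no good copy
        # of the data, so it just reports the error again.
        if lba in self.pending:
            raise OSError(f"UNCORRECTABLE LBA={lba}")
        return self.data[lba]

    def write(self, lba: int, block: bytes) -> None:
        # A write supplies fresh data, so the firmware can retire the bad
        # sector and point the LBA at a spare before storing the block.
        if lba in self.pending:
            if self.spares_left == 0:
                raise OSError(f"no spare sectors left for LBA={lba}")
            self.spares_left -= 1
            self.remapped[lba] = len(self.remapped)
            self.pending.discard(lba)
        self.data[lba] = block


if __name__ == "__main__":
    drive = ToyDrive(sectors=16, spares=2)
    drive.pending.add(7)                      # pretend LBA 7 went bad
    try:
        drive.read(7)
    except OSError as err:
        print("read failed:", err)
    drive.write(7, bytes(ToyDrive.SECTOR))    # like dd if=/dev/zero would
    print("after write:", len(drive.read(7)), drive.remapped)
```

Repeated reads leave the sector pending forever in this model; only the write changes its state, which is why rewriting the whole disk is suggested.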

/Daniel Eriksson
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: ZFS and DMA read error

2009-09-07 Thread Mark Stapper
Daniel Eriksson wrote:
 Mark Stapper wrote:

 Yeah, I did the long SMART selftest three times now, and each time it
 failed at the same LBA address.

 I assume 'smartctl -a /dev/adX' reports that the read test failed at LBA
 XXX something?

Indeed it does. Always at the same LBA (sector address).

 Why would I want to clear my drive before I run these tests?

 In this case the point is not really to clear the drive but to write to
 every sector. If you have a failed sector (which you do), writing to it
 will force the drive firmware to remap the sector. As far as I know, most
 drives will not remap an unreadable sector until it is written to.

So I see. Could this be why I haven't had any more read errors?
(After the zpool scrub, that is.)
 /Daniel Eriksson


RE: ZFS and DMA read error

2009-09-07 Thread Daniel Eriksson
Mark Stapper wrote:

 So I see. Could this be why I haven't had any more read errors?
 (After the zpool scrub, that is.)

Possibly, but in that case the SMART selftest should also pass. Have you
tried a selftest after you did the scrub?
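One way to answer that question is to read the drive's self-test log (`smartctl -l selftest /dev/adX`) after the scrub and check whether the newest entry passed. The parser below is a sketch: it assumes the usual ATA self-test log layout, where entry `# 1` is the most recent, and the sample lines are made up for illustration, not Mark's actual output.

```python
import re
from typing import List, NamedTuple, Optional


class SelfTest(NamedTuple):
    num: int
    description: str
    status: str
    first_error_lba: Optional[int]  # None when smartctl prints "-"


# Matches rows such as:
# "# 1  Extended offline    Completed: read failure   90%   100   123456789"
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+\d+%\s+\d+\s+(\S+)\s*$")


def parse_selftest_log(text: str) -> List[SelfTest]:
    tests = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            num, desc, status, lba = m.groups()
            tests.append(SelfTest(int(num), desc, status,
                                  None if lba == "-" else int(lba)))
    return tests


def latest_passed(tests: List[SelfTest]) -> bool:
    # The lowest entry number ("# 1") is the newest self-test in the log.
    newest = min(tests, key=lambda t: t.num)
    return newest.status.startswith("Completed without error")


# Made-up sample in smartctl's layout (values are illustrative only).
SAMPLE = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       100         123456789
# 2  Short offline       Completed without error       00%        98         -
"""

if __name__ == "__main__":
    tests = parse_selftest_log(SAMPLE)
    for t in tests:
        print(t)
    print("newest test passed:", latest_passed(tests))
```

If the sector had been remapped by the scrub's rewrites, the newest long test would be expected to report "Completed without error" instead of a read failure.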

/Daniel Eriksson


Re: ZFS and DMA read error

2009-09-07 Thread Mark Stapper
Daniel Eriksson wrote:
 Mark Stapper wrote:

   
 So I see. Could this be why I haven't had any more read errors?
 (After the zpool scrub, that is.)

 Possibly, but in that case the SMART selftest should also pass. Have you
 tried a selftest after you did the scrub?

 /Daniel Eriksson

Multiple times.


RE: ZFS and DMA read error

2009-09-03 Thread Daniel Eriksson
Mark Stapper wrote:

 People are REALLY pushing SpinRite lately... I did get it, though, just
 to try it.

SpinRite is OK but it hasn't been updated in ages. It does not work on
large drives. 250GB works, 1TB does not. Haven't tried it on 500GB
drives.

If I were you I would 'zpool offline ...' the offending drive, rewrite
the entire drive with 'dd if=/dev/zero ...' and then run a SMART
selftest on it using smartmontools ('smartctl -t long /dev/adX'). When
you 'zpool online ...' the drive, ZFS will resilver it for you. After
doing all of this I would then run a 'zpool scrub ...'. If the scrub
finishes without checksum errors and without any ATA-related errors, the
drive is probably in good enough condition to keep using, but watch out
for more ATA errors. If the drive is dying, it won't be long before it
starts to generate more ATA errors.
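The sequence above can be written down as a dry-run plan before touching the disk. This Python sketch only builds and prints the command lines; the pool name `data` and device `ad6` come from Mark's first message, and `bs=1m` is an assumed FreeBSD `dd` block size. It assumes the commands are run by hand, one at a time, waiting for the long self-test to finish before reading its log. Zeroing the disk also wipes its ZFS labels, so if `zpool online` does not accept the blank disk, the one-device `zpool replace` form Arthur describes later in the thread may be needed instead.

```python
import shlex
from typing import List


def rewrite_and_verify_plan(pool: str, dev: str) -> List[List[str]]:
    """Dry-run plan of the offline / rewrite / test / online / scrub steps."""
    path = f"/dev/{dev}"
    return [
        ["zpool", "offline", pool, dev],                # take the disk out of the raidz
        ["dd", "if=/dev/zero", f"of={path}", "bs=1m"],  # DESTRUCTIVE: write every sector
        ["smartctl", "-t", "long", path],               # start the long self-test
        ["smartctl", "-l", "selftest", path],           # read the result once it is done
        ["zpool", "online", pool, dev],                 # ZFS resilvers the disk
        ["zpool", "scrub", pool],                       # then verify all data
        ["zpool", "status", "-v", pool],                # look for checksum/read errors
    ]


if __name__ == "__main__":
    # Print only; nothing here executes the commands.
    for argv in rewrite_and_verify_plan("data", "ad6"):
        print(shlex.join(argv))
```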

/Daniel Eriksson


Re: ZFS and DMA read error

2009-09-03 Thread Mark Stapper

 SpinRite is OK but it hasn't been updated in ages. It does not work on
 large drives. 250GB works, 1TB does not. Haven't tried it on 500GB
 drives.
   
So it will be useless in... well, in this case it IS useless...
 If I were you I would 'zpool offline ...' the offending drive, rewrite
 the entire drive with 'dd if=/dev/zero ...' and then run a SMART
 selftest on it using smartmontools ('smartctl -t long /dev/adX'). When
 you 'zpool online ...' the drive, ZFS will resilver it for you. After
 doing all of this I would then run a 'zpool scrub ...'. If the scrub
 finishes without checksum errors and without any ATA-related errors, the
 drive is probably in good enough condition to keep using, but watch out
 for more ATA errors. If the drive is dying, it won't be long before it
 starts to generate more ATA errors.

Yeah, I did the long SMART selftest three times now, and each time it
failed at the same LBA address.
Did the scrub as well, took two hours, and no DMA errors were reported.
Why would I want to clear my drive before I run these tests?
I ordered a spare drive so I'll wait until it arrives, replace the
faulty drive with this one by dd-ing data from one to the other (I have
only 4 SATA ports so I can't do zpool replace).
Or maybe I'll just swap them out and do zpool scrub.
I'm uncomfortable doing this though, because if any of the other drives
fails/crashes/flips me off I'll have to restore from my backup which
took two days to make... (which is the drawback of a gzipped zfs backup
partition)
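The dd-copy plan could use a command like the one built below. This is a sketch with assumptions: the replacement disk's name (`ad12`) is hypothetical, and `conv=noerror,sync` is used so that `dd` keeps going past unreadable blocks and pads them with NULs instead of stopping or shifting later data. With `sync`, a failed read blanks the whole input block, so a smaller `bs` loses less data around the bad sector; in a raidz1 pool, a later scrub should detect the zeroed blocks by checksum and rebuild them from parity.

```python
import shlex
from typing import List


def clone_argv(src: str, dst: str, bs: str = "64k") -> List[str]:
    """Build (not run) a dd command that copies disk src onto disk dst."""
    if src == dst:
        raise ValueError("source and destination must differ")
    return [
        "dd",
        f"if=/dev/{src}",
        f"of=/dev/{dst}",
        f"bs={bs}",
        # noerror: keep going after read errors.
        # sync: pad a failed input block with NULs so later data keeps its offset.
        "conv=noerror,sync",
    ]


if __name__ == "__main__":
    print(shlex.join(clone_argv("ad6", "ad12")))  # ad12 is a hypothetical new disk
```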
Once I've replaced the drive I'll run Hitachi's Drive Fitness Test on
the (presumably) failing drive. Even if it doesn't generate any ATA
errors during everyday use, the error it gave before, combined with the
failing SMART selftest, disturbs me. I bought the drive 2 months ago, so
my faith has gone.
However, if it passes Hitachi's DFT my faith will be restored :-).
Greetz,
Mark


Re: ZFS and DMA read error

2009-09-03 Thread Arthur Chance

Mark Stapper wrote:
[snip]

I ordered a spare drive so I'll wait until it arrives, replace the
faulty drive with this one by dd-ing data from one to the other (I have
only 4 SATA ports so I can't do zpool replace).


zpool replace has two forms

zpool replace pool old-device new-device

and

zpool replace pool device

The latter is for when you pull the old drive and put the new one on the
same {S,P}ATA port because you've no free ports. I did that a couple of
weeks ago when one of my raidz drives fried (in its warranty period!)
and it worked like a dream. I did a zpool replace and then a zpool scrub
to make sure everything was OK because of this section of the zpool
man page:


Scrubbing and resilvering are very similar operations. The difference
is that resilvering only examines data that ZFS knows to be out of date
(for example, when attaching a new device to a mirror or replacing an
existing device), whereas scrubbing examines all data to discover
silent errors due to hardware faults or disk failure.
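The two forms of zpool replace differ only in whether a second device is named. A small Python sketch that builds (but does not run) the commands for the in-place swap followed by the recommended scrub; the `zpool status` step is an assumed way to watch the resilver finish before scrubbing. Pool and device names are Mark's from earlier in the thread, and `ad12` is a hypothetical second disk.

```python
from typing import List, Optional


def replace_argv(pool: str, old: str, new: Optional[str] = None) -> List[str]:
    # zpool replace pool old-device new-device  -> new disk on another port
    # zpool replace pool device                 -> new disk in the old disk's place
    argv = ["zpool", "replace", pool, old]
    if new is not None:
        argv.append(new)
    return argv


def in_place_swap_plan(pool: str, dev: str) -> List[List[str]]:
    return [
        replace_argv(pool, dev),      # after physically swapping the disk
        ["zpool", "status", pool],    # wait until the resilver completes
        ["zpool", "scrub", pool],     # then check all data, per the man page
    ]


if __name__ == "__main__":
    print(" ".join(replace_argv("data", "ad6", "ad12")))  # two-device form
    for argv in in_place_swap_plan("data", "ad6"):
        print(" ".join(argv))
```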



Re: ZFS and DMA read error

2009-09-03 Thread Mark Stapper

Arthur Chance wrote:

[snip]

Thanks for the tip. I'll be sure to try that.


Re: ZFS and DMA read error

2009-09-01 Thread Mark Stapper

 [snip 9 identical messages, based on the uncorrectable LBA error]

 Since it's all throwing errors at the same LBA, I'd run SMART
 diagnostics on the drive (I think the port is sysutils/smartmontools)
 and see if it's showing errors too.  Looks like a failing/failed drive
 and I would recommend replacing it.  I doubt (but you can try)
 SpinRite will help you when you get to this point.

Thought about that; I will do that after running zpool scrub.
The weird thing is that ZFS hasn't shown any data/checksum errors. Does this
mean successive reads were successful?
 SpinRite's website is at grc.com

People are REALLY pushing SpinRite lately... I did get it, though, just
to try it.

 Hope you have backups or redundancy.  No fun replacing data.
   
I have both :-).


Re: ZFS and DMA read error

2009-08-31 Thread Tim Judd
On 8/31/09, Mark Stapper st...@mapper.nl wrote:
 Good day to you,

 I'm having a bit of trouble with one of the disks in my zfs raidz1 pool.
 It's giving me DMA read errors, and zpool is reporting READ failures.
 However, data integrity is OK :-)
 Unfortunately I was in the middle of rearranging my backup media, so I'm
 backing up everything as we speak.
 I will be testing the failing drive in another computer soon; however,
 before I return it I'd like to know if this could be caused by something
 other than hardware failing.
 Below the output of zpool status and a snippet of /var/log/messages
 showing the DMA errors.
 Thanks for the input.
 Greetz,
 Mark


   pool: data
  state: ONLINE
 status: One or more devices has experienced an unrecoverable error.  An
         attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
         using 'zpool clear' or replace the device with 'zpool replace'.
    see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: none requested
 config:

         NAME        STATE     READ WRITE CKSUM
         data        ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             ad4     ONLINE       0     0     0
             ad6     ONLINE      21     0     0
             ad8     ONLINE       0     0     0
             ad10    ONLINE       0     0     0

 errors: No known data errors

 Aug 31 03:04:35 yoshi kernel: ad6: FAILURE - READ_DMA48
 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
 Aug 31 03:04:35 yoshi root: ZFS: vdev I/O failure, zpool=data
 path=/dev/ad6 offset=477204905984 size=65536 error=5
 Aug 31 03:04:35 yoshi root: ZFS: vdev I/O failure, zpool=data
 path=/dev/ad6 offset=477204925440 size=2560 error=5
[snip 9 identical messages, based on the uncorrectable LBA error]
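The READ column of the zpool status output above is where the 21 errors on ad6 show up. A hedged Python sketch that pulls out devices with non-zero READ/WRITE/CKSUM counters from zpool status text; it assumes the column layout shown above and keeps counters as strings, since some ZFS versions may abbreviate large counts. The sample is an abridged copy of Mark's output.

```python
from typing import Dict, Tuple

HEADER = ["NAME", "STATE", "READ", "WRITE", "CKSUM"]


def devices_with_errors(status_text: str) -> Dict[str, Tuple[str, str, str]]:
    """Map vdev name -> (READ, WRITE, CKSUM) for rows with any non-zero counter."""
    found = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == HEADER:
            in_config = True          # rows of the config table follow
            continue
        if not in_config:
            continue
        if not fields or fields[0] == "errors:":
            break                     # end of the config table
        if len(fields) < 5:
            continue                  # e.g. a bare section heading
        name, _state, read, write, cksum = fields[:5]
        if (read, write, cksum) != ("0", "0", "0"):
            found[name] = (read, write, cksum)
    return found


STATUS = """\
  pool: data
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE      21     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    print(devices_with_errors(STATUS))  # {'ad6': ('21', '0', '0')}
```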



Since it's all throwing errors at the same LBA, I'd run SMART
diagnostics on the drive (I think the port is sysutils/smartmontools)
and see if it's showing errors too.  Looks like a failing/failed drive
and I would recommend replacing it.  I doubt (but you can try)
SpinRite will help you when you get to this point.
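The observation that the errors keep hitting the same LBA can be cross-checked against the ZFS messages in the log. Assuming 512-byte sectors, LBA 932040832 times 512 is exactly the first failing offset (477204905984), and the second failing read starts 38 sectors later, inside the first read's range. The helpers below are a sketch of that arithmetic; the kernel may log the first LBA of the failed request rather than the exact bad sector, so the fault could be anywhere in that 128-sector range.

```python
from typing import Tuple

SECTOR = 512  # assumption: 512-byte logical sectors


def offset_to_lba(offset: int) -> int:
    if offset % SECTOR:
        raise ValueError(f"offset {offset} is not sector-aligned")
    return offset // SECTOR


def sector_range(offset: int, size: int) -> Tuple[int, int]:
    """First and last LBA touched by an I/O of `size` bytes at byte `offset`."""
    first = offset_to_lba(offset)
    count = -(-size // SECTOR)  # round up to whole sectors
    return first, first + count - 1


if __name__ == "__main__":
    # Offsets and sizes from the ZFS "vdev I/O failure" lines in the log.
    print(sector_range(477204905984, 65536))  # (932040832, 932040959)
    print(sector_range(477204925440, 2560))   # (932040870, 932040874)
```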

SpinRite's website is at grc.com


Hope you have backups or redundancy.  No fun replacing data.


--TJ