Re: ATA troubles

2011-07-25 Thread Andrea Venturoli

On 07/25/11 02:45, Jerome Herman wrote:


At the beginning of June, I installed two WD 1TB Caviar Green SATA



Just a shot in the dark: are your drives of the green kind, such as
Western Digital Caviar Green?


Exactly.
I disabled the idle timer though.




Also, since they are ATA drives, make sure you are using 80-conductor
ribbon cables and that DMA is properly enabled in the BIOS.


They are SATA drives.




You can also try reducing the DMA level; it is probably at UDMA5 by
default, so try UDMA4 (aka UDMA/66) or UDMA3.


Does this apply to SATA?
How would I do that?



 bye  Thanks
av.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: ATA troubles

2011-07-25 Thread perryh
Jerome Herman jher...@dichotomia.fr wrote:

  Jul 24 23:48:36 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671887488
  Jul 24 23:48:36 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1712012836864, length=131072)]error = 5
 ... since they are ATA drives, make sure you are using 80-conductor
 ribbon cables and that DMA is properly enabled in the BIOS.

 You can also try reducing the DMA level; it is probably at UDMA5 by
 default, so try UDMA4 (aka UDMA/66) or UDMA3.

I fixed a similar problem -- involving a VIA 6421 controller --
a while back, by using atacontrol(8) to reduce the DMA speed
from UDMA133 to UDMA100.  Evidently it is possible, under some
circumstances, for a device and controller to negotiate a speed
that's too high to actually work :(
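
The thread mixes two naming schemes for the same thing: mode numbers
(UDMA4, UDMA5) and rate labels (UDMA66, UDMA100, UDMA133). As a rough
aid, here is a minimal sketch that maps Ultra DMA mode numbers to their
nominal peak rates. The helper names are hypothetical and exist only for
illustration.

```python
# Nominal peak transfer rates (MB/s) for ATA Ultra DMA modes 0..6.
UDMA_RATES_MBPS = {
    0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.0,
}


def describe(mode: int) -> str:
    """Return a human-readable description of a UDMA mode number."""
    return f"UDMA mode {mode} (~{UDMA_RATES_MBPS[mode]:g} MB/s)"


def step_down(mode: int) -> int:
    """Return the next slower UDMA mode, never going below mode 0."""
    return max(mode - 1, 0)


if __name__ == "__main__":
    # Going from "UDMA133" to "UDMA100" is a step from mode 6 to mode 5.
    print(describe(6), "->", describe(step_down(6)))
```

In these terms, "UDMA 4 (aka UDMA/66)" is mode 4 and "UDMA100" is mode 5.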


Re: ATA troubles

2011-07-25 Thread Jerome Herman

On 25/07/2011 08:33, Andrea Venturoli wrote:

On 07/25/11 02:45, Jerome Herman wrote:


At the beginning of June, I installed two WD 1TB Caviar Green SATA



Just a shot in the dark : are your drives of the green kind ? Such as
Western Digital Caviar Green ?


Exactly.
I disabled the idle timer though.




Also, since they are ATA drives, make sure you are using 80-conductor
ribbon cables and that DMA is properly enabled in the BIOS.


They are SATA drives.


Ok I must have been way more tired than I thought when I answered...

A few things, though.
WD Green drives have always been very problematic, in FreeBSD and
elsewhere. FreeBSD is just very, very touchy when it comes to ATA errors.
The problem you are encountering is not new; cf.
http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
Some people even think there is a cryptic bug somewhere in the ATA driver.
I had my share of strange errors, but with gvinum rather than
geom_stripe. I now avoid WD Caviar Green completely. As for SMART tests,
I would not trust them: SATA drives tend to silently remap bad
blocks, leaving the SMART counters untouched.


A long time ago Scott Long offered to help track down this problem; you
might want to contact him and see whether he found something.







You can also try reducing the DMA level; it is probably at UDMA5 by
default, so try UDMA4 (aka UDMA/66) or UDMA3.


Does this apply to SATA?
How would I do that?



 bye  Thanks
av.


Re: ATA troubles

2011-07-25 Thread Andrea Venturoli

On 07/25/11 12:01, Jerome Herman wrote:


Ok I must have been way more tired than I thought when I answered...


:-)




A few things though,
WD Green have always been very problematic, in FreeBSD and elsewhere.


I acknowledge that.
However, I think I can live with the occasional glitch; what I'm
experiencing seems far too much to me: such a drive might slow things
down, but that should not result in a kernel panic.







The problem you are encountering is not new, cf


I know: however I only found a lot of reports, with no solutions.



 http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

I've only looked briefly into this, but I'll read it carefully.
Thanks.




Some people even think there is a cryptic bug somewhere in the ATA driver.


That's what I'm starting to think.
In case anyone is interested, now that I have a test box up and running, 
I'm willing to try anything fancy.





I now avoid WD caviar green completely.


That's what I'll do in the future.
However, I've seen posts reporting this kind of trouble with other
brands, too.





As for SMART tests,
I would not trust them: SATA drives tend to silently remap bad
blocks, leaving the SMART counters untouched.


I know; I'll run WD's diag tools from time to time.





A long time ago Scott Long offered to help track this problem, you might
want to contact him and see whether he found something.


I'm CCing him.



 bye  Thanks
av.

P.S. I tried updating from 7.3 to 8.2 to see if anything changed, but I
still get the same problems.



Re: ATA troubles

2011-07-25 Thread Andrea Venturoli

On 07/25/11 16:03, per...@pluto.rain.com wrote:


I fixed a similar problem -- involving a VIA 6421 controller --
a while back, by using atacontrol(8) to reduce the DMA speed
from UDMA133 to UDMA100.  Evidently it is possible, under some
circumstances, for a device and controller to negotiate a speed
that's too high to actually work :(


I checked and it was already set at UDMA100; I tried UDMA66, but nothing 
changed.

I don't know if this is really effective with SATA...

 bye  Thanks
av.


Re: ATA troubles

2011-07-25 Thread perryh
Andrea Venturoli m...@netfence.it wrote:
 On 07/25/11 16:03, per...@pluto.rain.com wrote:
  I fixed a similar problem -- involving a VIA 6421 controller --
  a while back, by using atacontrol(8) to reduce the DMA speed
  from UDMA133 to UDMA100.
 ...
 I don't know if this is really effective with SATA...

Nor do I; my problem involved a PATA device.

Dmesg reports for SATA devices include a UDMAxx notation in addition
to the SATA speed notation, but I don't know its significance.

ad0: 305245MB Hitachi HDT725032VLAT80 V54OA4NA at ata0-master UDMA66
ad1: 32253MB MAXTOR 6L040L2 A93.0500 at ata0-slave UDMA66
ad4: 61136MB PATRIOT MEMORY 64GB SSD 02.10104 at ata2-master UDMA100 SATA 1.5Gb/s
acd1: DVDR PIONEER DVD-RW DVR-212D/1.24 at ata3-master UDMA66 SATA 1.5Gb/s
ad8: 305245MB Hitachi HDT725032VLAT80 V54OA4NA at ata4-master UDMA133
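
To compare the UDMA and SATA notations across many devices, the probe
lines can be split mechanically. This is a minimal sketch: the regular
expression is an assumption derived only from the sample lines quoted
here, not an official grammar for dmesg output, and treating a missing
SATA field as "no SATA speed reported" is likewise an assumption.

```python
import re

# Pattern inferred from the sample probe lines; hypothetical, not official.
PROBE_RE = re.compile(
    r"^(?P<dev>[a-z]+\d+): .* at (?P<chan>ata\d+-(?:master|slave))"
    r"\s+(?P<mode>[A-Z]+\d+)(?:\s+SATA\s+(?P<sata>\S+))?\s*$"
)


def parse_probe(line):
    """Return a dict with dev, chan, mode and sata (or None), or None if no match."""
    match = PROBE_RE.match(line)
    return match.groupdict() if match else None


SAMPLE = [
    "ad0: 305245MB Hitachi HDT725032VLAT80 V54OA4NA at ata0-master UDMA66",
    "ad4: 61136MB PATRIOT MEMORY 64GB SSD 02.10104 at ata2-master UDMA100 SATA 1.5Gb/s",
]

if __name__ == "__main__":
    for line in SAMPLE:
        info = parse_probe(line)
        link = f"SATA {info['sata']}" if info["sata"] else "no SATA speed reported"
        print(info["dev"], info["chan"], info["mode"], link)
```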


ATA troubles

2011-07-24 Thread Andrea Venturoli

Hello everyone.

For those interested, this post is a sequel to:
http://www.mailinglistarchive.com/html/freebsd-questions%40freebsd.org/2011-06/msg00018.html
However, I'll summarize.



At the beginning of June, I installed two WD 1TB Caviar Green SATA
drives into an Intel-S5000-based production box of mine and it was hell!
This server runs FreeBSD 7.3/i386 off a SAS RAID, and the two new drives
were meant to be combined with gstripe into secondary storage.

I started getting:

ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SMART taskqueue timeout - completing request directly
ad8: WARNING - SMART taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly

and the box would reboot within minutes.
This also prevented me from running tests with smartctl.
Note that the box previously had a single SATA drive, which worked
perfectly.

It was suggested that I run wdidle.exe from DOS to prevent the drives
from spinning down, and it helped: I was at least able to fsck the
stripe and copy something onto it.
Still, I kept getting the above messages; the drives would also
occasionally hang and then restart. Uptime rose to a few hours, but the
box would still reboot.


In the meantime the drives went bad (as confirmed by smartd, the BIOS
and WD's tools) and I had them replaced.


When they came back, I decided to set up a test box: the hardware is
completely different from the production box, but FreeBSD still runs
from a SCSI drive and the two WD drives make up an additional stripe.
First I ran WD's tools to check the drives, and they passed every test
(including the long one).


So I installed FreeBSD 7.3/i386 and smartctl, and verified the disks again.

I created the stripe, fscked it, and copied about 420GB of data via
rsync over NFS. It seemed to work fine, but after about 15 hours the
box rebooted, after logging:

ad6: FAILURE - device detached
g_vfs_done():stripe/backup[WRITE(offset=1709926940672, length=131072)]error = 6
/mnt/local: got error 6 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error


Subsequent retries always gave the same results, until I disabled 
softupdates on the stripe. I then was able to complete the rsync.


Still not satisfied, I made a local-to-local copy and started getting a lot of:

Jul 24 18:54:28 mydavid kernel: ad4: WARNING - READ_DMA48 UDMA ICRC error (retrying request) LBA=1620416000
Jul 24 18:54:28 mydavid kernel: ad4: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1620416000
Jul 24 18:54:28 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1659305967616, length=131072)]error = 5
Jul 24 18:54:42 mydavid kernel: ad6: WARNING - READ_DMA48 UDMA ICRC error (retrying request) LBA=1621920384
Jul 24 18:54:42 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1621920384
Jul 24 18:54:42 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1660846522368, length=131072)]error = 5

I ran smartctl's short test on both drives and they were OK; I tried the
offline test, but on both drives it was interrupted (???).
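
The kernel messages pair a per-disk LBA with a byte offset into the
stripe. A minimal sketch of how the two relate in a plain two-disk
RAID0 layout follows. The 64 KiB stripe size, the 512-byte sector size
and the disk order (ad4 first, ad6 second) are assumptions, not values
stated anywhere in this thread, and the function names are hypothetical.

```python
SECTOR = 512             # bytes per logical sector (assumed)
STRIPE_SIZE = 64 * 1024  # bytes per stripe chunk (assumed, not from the thread)


def stripe_to_disk(offset, ndisks=2, stripe_size=STRIPE_SIZE):
    """Map a byte offset in the stripe to (disk index, LBA on that disk)."""
    stripe_no, within = divmod(offset, stripe_size)
    disk = stripe_no % ndisks
    disk_offset = (stripe_no // ndisks) * stripe_size + within
    return disk, disk_offset // SECTOR


def request_chunks(offset, length, ndisks=2, stripe_size=STRIPE_SIZE):
    """List (disk, starting LBA) for every stripe chunk a request touches."""
    end = offset + length
    result = []
    while offset < end:
        result.append(stripe_to_disk(offset, ndisks, stripe_size))
        # Jump to the start of the next stripe chunk.
        offset = (offset // stripe_size + 1) * stripe_size
    return result


if __name__ == "__main__":
    # The failing read reported by g_vfs_done() at 18:54:28.
    for disk, lba in request_chunks(1659305967616, 131072):
        print(f"disk {disk}: LBA {lba}")
```

Under these assumed parameters, one chunk of that read lands on disk 0
at LBA 1620416000, the same LBA the ad4 message reports. That is
consistent with the assumed geometry but does not prove it.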

In spite of the messages above, it looked like it was working...

However, I was logged in via ssh and had to turn off the client, so I
stopped the copy, went to the console and started it again.

Now it looks like one drive is no longer working properly...

Jul 24 23:48:36 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671887488
Jul 24 23:48:36 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1712012836864, length=131072)]error = 5
Jul 24 23:48:39 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671897856
Jul 24 23:48:39 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1712023420928, length=131072)]error = 5
Jul 24 23:48:41 mydavid kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=1671897888
Jul 24 23:48:41 mydavid kernel: g_vfs_done():stripe/backup[READ(offset=1712023486464, length=131072)]error = 5
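
The status= and error= values in these messages are hexadecimal
bitmasks; the names printed after them are simply the bits that are
set. A minimal sketch of that decoding follows. The names READY, DSC,
ERROR, ICRC, UNCORRECTABLE and NID_NOT_FOUND come from the messages in
this thread; the names given to the remaining bits follow the usual ATA
register layout and are illustrative only.

```python
# (mask, name) pairs, highest bit first, for the ATA status register.
STATUS_BITS = [
    (0x80, "BUSY"), (0x40, "READY"), (0x20, "DMA_READY"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORRECTABLE"), (0x02, "INDEX"), (0x01, "ERROR"),
]

# (mask, name) pairs, highest bit first, for the ATA error register.
ERROR_BITS = [
    (0x80, "ICRC"), (0x40, "UNCORRECTABLE"), (0x20, "MEDIA_CHANGED"),
    (0x10, "NID_NOT_FOUND"), (0x08, "MEDIA_CHANGE_REQUEST"),
    (0x04, "ABORTED"), (0x02, "NO_MEDIA"), (0x01, "ILLEGAL_LENGTH"),
]


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    print("status=51:", ",".join(decode(0x51, STATUS_BITS)))
    print("error=10: ", ",".join(decode(0x10, ERROR_BITS)))
    print("error=40: ", ",".join(decode(0x40, ERROR_BITS)))
```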

Also, smartd is complaining:

Jul 24 23:41:59 mydavid smartd[2630]: Device: /dev/ad6, 38 Currently unreadable (pending) sectors
Jul 24 23:50:56 mydavid smartd[538]: Device: /dev/ad6, 39 Currently unreadable (pending) sectors
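
To see whether the pending-sector count keeps growing, the smartd lines
can be reduced to (device, count) pairs. This is a minimal sketch; the
regular expression is an assumption based only on the two log lines
above, and the function name is hypothetical.

```python
import re

# Pattern inferred from the two smartd messages in this post.
PENDING_RE = re.compile(
    r"Device: (?P<dev>\S+), (?P<count>\d+) Currently unreadable \(pending\) sectors"
)


def pending_counts(lines):
    """Return a list of (device, pending sector count) in log order."""
    found = []
    for line in lines:
        match = PENDING_RE.search(line)
        if match:
            found.append((match.group("dev"), int(match.group("count"))))
    return found


LOG = [
    "Jul 24 23:41:59 mydavid smartd[2630]: Device: /dev/ad6, 38 Currently unreadable (pending) sectors",
    "Jul 24 23:50:56 mydavid smartd[538]: Device: /dev/ad6, 39 Currently unreadable (pending) sectors",
]

if __name__ == "__main__":
    counts = pending_counts(LOG)
    for dev, count in counts:
        print(dev, count)
    if counts and counts[-1][1] > counts[0][1]:
        print("pending sector count is increasing")
```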




ATA troubles

2011-07-24 Thread Andrea Venturoli
(Sorry for the previous post; I accidentally hit send while the
message was still unfinished.)




After a reboot, I'm back to the NID_NOT_FOUND errors...




While I'm still conducting other tests, does anyone have any hints on this?


 bye  Thanks
av.

Re: ATA troubles

2011-07-24 Thread Jerome Herman

On 25/07/2011 01:58, Andrea Venturoli wrote:
(Sorry for the previous post; I accidentally hit send while the
message was still unfinished.)



While I'm still conducting other tests, has anyone any hint on this?


Just a shot in the dark : are your drives of the