Re: problems with sata disks (taskqueue timeout)

2009-05-08 Thread UBM
On Sun, 29 Mar 2009 11:01:53 +0200
Marc UBM Bocklet u...@u-boot-man.de wrote:

 On Tue, 20 Jan 2009 08:08:29 +0100
 Marc UBM Bocklet u...@u-boot-man.de wrote:
 
  On Tue, 20 Jan 2009 09:39:51 +1100
  Andrew Snow and...@modulus.org wrote:
  
   
   I think that if you use eSATA you probably need dedicated eSATA 
   controller ports.  eSATA standard specifies a higher voltage for
   the longer cable distances.
   
   Judging from the sporadic problem reports, Promise TX4 is probably
   not the best at signal purity to begin with so using it for eSATA
   pushes it over the edge.
   
   
   Hope that helps,
  
  Thanks for the fast answer! :-)
  
  Although my version of the TX4 has two dedicated e-sata ports, the
  other posts seem to indicate that it got something to do with the
  controller (maybe signal purity, like you said). I'll try upgrading
  next and will report back after that.
 
 A very late followup here:
 
 I upgraded to the latest stable, but things did not improve:
 
 Mar 29 10:57:29 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC
 error (retrying request) LBA=1087300992
 
 Mar 29 10:57:34 hamstor kernel: ad10: FAILURE - SET_MULTI
 status=51<READY,DSC,ERROR> error=4<ABORTED>
 
 Mar 29 10:57:34 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying
 (0 retries left) LBA=1087300992
 
 Mar 29 10:57:34 hamstor kernel: ad10: FAILURE - WRITE_DMA48
 status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR>
 error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH>
 LBA=1087300992
 
 Mar 29 10:57:34 hamstor root: ZFS: vdev I/O failure,
 zpool=gedaerm path=/dev/ad10 offset=556698042368 size=131072 error=5
 
 Mar 29 10:57:43 hamstor kernel: ad10: WARNING - SETFEATURES SET
 TRANSFER MODE taskqueue timeout - completing request directly 
 
 Mar 29 10:57:47 hamstor kernel: ad10: WARNING - SETFEATURES SET
 TRANSFER MODE taskqueue timeout - completing request directly 
 
 Mar 29 10:57:51 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE
 WCACHE taskqueue timeout - completing request directly 
 
 Mar 29 10:57:55 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue
 timeout - completing request directly 
 
 Mar 29 10:57:55 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying
 (1 retry left) LBA=1087301248 
 
 Mar 29 10:57:55 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC
 error (retrying request) LBA=1087301248 
 
 Mar 29 10:58:00 hamstor kernel: ad10: FAILURE - SET_MULTI
 status=51<READY,DSC,ERROR> error=4<ABORTED>
 
 Mar 29 10:58:00 hamstor kernel: ad10: FAILURE - WRITE_DMA48 timed out
 LBA=1087301248
 
 Mar 29 10:58:00 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm
 path=/dev/ad10 offset=556698173440 size=131072 error=5
 
 
 
 Any further ideas anybody? :-)

Another update: upgrading to -current dating from April 25th 2009 seems
to have fixed the problem. I've encountered no errors so far, and
I've copied about 250GB in large chunks, something that was sure to
provoke the errors with -stable.

FreeBSD xxx 8.0-CURRENT FreeBSD 8.0-CURRENT #1: Sat Apr 25 13:33:18
CEST 2009 xxx:/usr/obj/usr/src/sys/xxx  amd64

Bye
Marc

-- 
And what rough beast, its hour come round at last,
Slouches towards Bethlehem to be born?

W.B. Yeats, The Second Coming
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: problems with sata disks (taskqueue timeout)

2009-03-29 Thread UBM
On Tue, 20 Jan 2009 08:08:29 +0100
Marc UBM Bocklet u...@u-boot-man.de wrote:

 On Tue, 20 Jan 2009 09:39:51 +1100
 Andrew Snow and...@modulus.org wrote:
 
  
  I think that if you use eSATA you probably need dedicated eSATA 
  controller ports.  eSATA standard specifies a higher voltage for
  the longer cable distances.
  
  Judging from the sporadic problem reports, Promise TX4 is probably
  not the best at signal purity to begin with so using it for eSATA
  pushes it over the edge.
  
  
  Hope that helps,
 
 Thanks for the fast answer! :-)
 
 Although my version of the TX4 has two dedicated e-sata ports, the
 other posts seem to indicate that it got something to do with the
 controller (maybe signal purity, like you said). I'll try upgrading
 next and will report back after that.

A very late followup here:

I upgraded to the latest stable, but things did not improve:

Mar 29 10:57:29 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC
error (retrying request) LBA=1087300992

Mar 29 10:57:34 hamstor kernel: ad10: FAILURE - SET_MULTI
status=51<READY,DSC,ERROR> error=4<ABORTED>

Mar 29 10:57:34 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying (0
retries left) LBA=1087300992

Mar 29 10:57:34 hamstor kernel: ad10: FAILURE - WRITE_DMA48
status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR>
error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH>
LBA=1087300992

Mar 29 10:57:34 hamstor root: ZFS: vdev I/O failure,
zpool=gedaerm path=/dev/ad10 offset=556698042368 size=131072 error=5

Mar 29 10:57:43 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly 

Mar 29 10:57:47 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly 

Mar 29 10:57:51 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly 

Mar 29 10:57:55 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue
timeout - completing request directly 

Mar 29 10:57:55 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying (1
retry left) LBA=1087301248 

Mar 29 10:57:55 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC
error (retrying request) LBA=1087301248 

Mar 29 10:58:00 hamstor kernel: ad10: FAILURE - SET_MULTI
status=51<READY,DSC,ERROR> error=4<ABORTED>

Mar 29 10:58:00 hamstor kernel: ad10: FAILURE - WRITE_DMA48 timed out
LBA=1087301248

Mar 29 10:58:00 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm
path=/dev/ad10 offset=556698173440 size=131072 error=5



Any further ideas anybody? :-)

Bye
Marc




Re: problems with sata disks (taskqueue timeout)

2009-01-20 Thread Bartosz Stec

Marc UBM wrote:

Hiho! :-)

Occasionally, especially when uploading a large number of files, the
(brand-new, tested) sata disks in my fileserver spit out some of these
errors:

---

Jan 19 19:51:14 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC
error (retrying request) LBA=882778752

Jan 19 19:51:23 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:27 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:31 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly

Jan 19 19:51:35 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue
timeout - completing request directly

Jan 19 19:51:35 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying
(0 retries left) LBA=882778752

Jan 19 19:51:35 hamstor kernel: ad10: FAILURE - WRITE_DMA48
status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR>
error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH>
LBA=882778752

Jan 19 19:51:35 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm
path=/dev/ad10 offset=451982655488 size=131072 error=5

Jan 19 19:51:41 hamstor kernel: ad10: FAILURE - SET_MULTI
status=51<READY,DSC,ERROR> error=4<ABORTED>

Jan 19 19:51:41 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying
(1 retry left) LBA=882779008

Jan 19 19:51:41 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC
error (retrying request) LBA=882779008

Jan 19 19:51:50 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:54 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:58 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly

Jan 19 19:52:02 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue
timeout - completing request directly

Jan 19 19:52:02 hamstor kernel: ad10: FAILURE - WRITE_DMA48 timed out
LBA=882779008

Jan 19 19:52:02 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm
path=/dev/ad10 offset=451982786560 size=131072 error=5

---

I've fiddled with the cables, which seemed to help, but I've been
unable to completely eliminate the errors. The disks are two Western
Digital MyBooks Home Edition (1 TB per disk), connected to a Promise TX
4 SATA Controller:

atap...@pci0:1:6:0:  class=0x018000 card=0x3d17105a chip=0x3d17105a
rev=0x02 hdr=0x00 vendor = 'Promise Technology Inc'
device = 'PDC40718-GP SATA 300 TX4 Controller'
class  = mass storage

They're connected via 50cm esata cables.

I've googled on the net and found some vague hints about problems with
the Promise TX4, but nothing concrete.

What I've found is

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

basically telling me these things happen, deal with it :-)

The problem is, I cannot produce these problems reliably, only thing I
notice is that they *seem* to happen more often if a lot of large files
are copied in succession.

Can anybody tell me if upgrading to 7.2 or -current will help?

I'm currently running 


7.0-STABLE-200804 FreeBSD 7.0-STABLE-200804 #0: Wed Dec 10 15:29:03 CET
2008   *...@host:/usr/obj/usr/src/sys/GENERIC  amd64

Next step I'll try is upgrading to RELENG_7 to see if that helps.


Greetings,
Marc
  

Cheers Marc.

My personal experience makes me think that this issue is
controller/driver related.
I've been using a SATA 300 TX4 controller on my fileserver since the
days of 6.1-RELEASE (with 2 of 4 ports in use), and I saw a lot of
exactly the same errors in the logs. Sometimes they were harmless, but
sometimes, as a result, one of the disks magically disconnected from
the controller, and the only way to get it back and working was to
power the PC down and up again. That mostly happened during heavy I/O,
such as dumping filesystems.


The good thing is that since 7.0-RELEASE I have seen such errors maybe
2-3 times, and I haven't seen them at all for at least 6 months. That
is probably because I rebuild my system about once a month to keep up
with the stable branch, and something was corrected in the sources
during that time.


So I also advise upgrading to RELENG_7; you will probably get rid of
these errors.
Good luck!

--
Bartosz Stec



problems with sata disks (taskqueue timeout)

2009-01-19 Thread UBM

Hiho! :-)

Occasionally, especially when uploading a large number of files, the
(brand-new, tested) sata disks in my fileserver spit out some of these
errors:

---

Jan 19 19:51:14 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC
error (retrying request) LBA=882778752

Jan 19 19:51:23 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:27 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:31 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly

Jan 19 19:51:35 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue
timeout - completing request directly

Jan 19 19:51:35 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying
(0 retries left) LBA=882778752

Jan 19 19:51:35 hamstor kernel: ad10: FAILURE - WRITE_DMA48
status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR>
error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH>
LBA=882778752

Jan 19 19:51:35 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm
path=/dev/ad10 offset=451982655488 size=131072 error=5

Jan 19 19:51:41 hamstor kernel: ad10: FAILURE - SET_MULTI
status=51<READY,DSC,ERROR> error=4<ABORTED>

Jan 19 19:51:41 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying
(1 retry left) LBA=882779008

Jan 19 19:51:41 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC
error (retrying request) LBA=882779008

Jan 19 19:51:50 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:54 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:58 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly

Jan 19 19:52:02 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue
timeout - completing request directly

Jan 19 19:52:02 hamstor kernel: ad10: FAILURE - WRITE_DMA48 timed out
LBA=882779008

Jan 19 19:52:02 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm
path=/dev/ad10 offset=451982786560 size=131072 error=5

---
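
As a cross-check on the log above, the failing LBA can be related to the offset that ZFS reports for the same write. This is only a sketch: it assumes 512-byte sectors (the log does not state the sector size), and the function name is made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def lba_within_io(lba, io_offset, io_size, sector_size=SECTOR_SIZE):
    """Return the byte position of `lba` relative to the start of the I/O,
    or None if the LBA lies outside [io_offset, io_offset + io_size)."""
    byte_pos = lba * sector_size
    if io_offset <= byte_pos < io_offset + io_size:
        return byte_pos - io_offset
    return None


# Values copied from the first FAILURE / ZFS pair in the log above.
print(lba_within_io(882778752, 451982655488, 131072))  # prints 65536
```

Under that assumption the failing LBA sits 64 KiB into the 128 KiB ZFS write, and the second pair (LBA=882779008, offset=451982786560) gives the same 65536, so in both cases the ATA request that failed appears to start at the midpoint of the ZFS write.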

I've fiddled with the cables, which seemed to help, but I've been
unable to completely eliminate the errors. The disks are two Western
Digital MyBooks Home Edition (1 TB per disk), connected to a Promise TX
4 SATA Controller:

atap...@pci0:1:6:0:  class=0x018000 card=0x3d17105a chip=0x3d17105a
rev=0x02 hdr=0x00 vendor = 'Promise Technology Inc'
device = 'PDC40718-GP SATA 300 TX4 Controller'
class  = mass storage

They're connected via 50cm esata cables.

I've googled on the net and found some vague hints about problems with
the Promise TX4, but nothing concrete.

What I've found is

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

basically telling me these things happen, deal with it :-)

The problem is that I cannot reproduce these errors reliably; the only
thing I notice is that they *seem* to happen more often if a lot of
large files are copied in succession.

Can anybody tell me if upgrading to 7.2 or -current will help?

I'm currently running 

7.0-STABLE-200804 FreeBSD 7.0-STABLE-200804 #0: Wed Dec 10 15:29:03 CET
2008   *...@host:/usr/obj/usr/src/sys/GENERIC  amd64

Next step I'll try is upgrading to RELENG_7 to see if that helps.


Greetings,
Marc










Re: problems with sata disks (taskqueue timeout)

2009-01-19 Thread Andrew Snow


I think that if you use eSATA you probably need dedicated eSATA 
controller ports.  eSATA standard specifies a higher voltage for the 
longer cable distances.


Judging from the sporadic problem reports, Promise TX4 is probably not 
the best at signal purity to begin with so using it for eSATA pushes it 
over the edge.



Hope that helps,

- Andrew


Re: problems with sata disks (taskqueue timeout)

2009-01-19 Thread David Figuera

I've fiddled with the cables, which seemed to help, but I've been
unable to completely eliminate the errors. The disks are two Western
Digital MyBooks Home Edition (1 TB per disk), connected to a Promise TX
4 SATA Controller:

atap...@pci0:1:6:0:  class=0x018000 card=0x3d17105a chip=0x3d17105a
rev=0x02 hdr=0x00 vendor = 'Promise Technology Inc'
device = 'PDC40718-GP SATA 300 TX4 Controller'
class  = mass storage


I have a similar setup, same card, two WD disks.

On 6.3 it was affected by the problem you mention, but when I moved to 
7.0, it disappeared.



Re: problems with sata disks (taskqueue timeout)

2009-01-19 Thread Wes Morgan

On Mon, 19 Jan 2009, Marc UBM wrote:



Hiho! :-)

Occasionally, especially when uploading a large number of files, the
(brand-new, tested) sata disks in my fileserver spit out some of these
errors:


I've found that those kinds of errors are very, very controller-dependent. 
Case in point: a 4-disk raidz on an ASUS board with a VIA SATA 
controller. The drives were attached to a HighPoint RocketRAID controller, 
then the data was moved off and the drives attached to the VIA controller. 
As soon as the raidz was created and data was being copied back to the 
array: taskqueue errors. So, back to the HighPoint controller. I swapped out 
the board for another ASUS, this time with the Q35 / ICH9 controller. 
Not a single problem whatsoever.





---

Jan 19 19:51:14 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC
error (retrying request) LBA=882778752

Jan 19 19:51:23 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:27 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:31 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly

Jan 19 19:51:35 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue
timeout - completing request directly

Jan 19 19:51:35 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying
(0 retries left) LBA=882778752

Jan 19 19:51:35 hamstor kernel: ad10: FAILURE - WRITE_DMA48
status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR>
error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH>
LBA=882778752

Jan 19 19:51:35 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm
path=/dev/ad10 offset=451982655488 size=131072 error=5

Jan 19 19:51:41 hamstor kernel: ad10: FAILURE - SET_MULTI
status=51<READY,DSC,ERROR> error=4<ABORTED>

Jan 19 19:51:41 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying
(1 retry left) LBA=882779008

Jan 19 19:51:41 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC
error (retrying request) LBA=882779008

Jan 19 19:51:50 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:54 hamstor kernel: ad10: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly

Jan 19 19:51:58 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly

Jan 19 19:52:02 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue
timeout - completing request directly

Jan 19 19:52:02 hamstor kernel: ad10: FAILURE - WRITE_DMA48 timed out
LBA=882779008

Jan 19 19:52:02 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm
path=/dev/ad10 offset=451982786560 size=131072 error=5

---

I've fiddled with the cables, which seemed to help, but I've been
unable to completely eliminate the errors. The disks are two Western
Digital MyBooks Home Edition (1 TB per disk), connected to a Promise TX
4 SATA Controller:

atap...@pci0:1:6:0:  class=0x018000 card=0x3d17105a chip=0x3d17105a
rev=0x02 hdr=0x00 vendor = 'Promise Technology Inc'
   device = 'PDC40718-GP SATA 300 TX4 Controller'
   class  = mass storage

They're connected via 50cm esata cables.

I've googled on the net and found some vague hints about problems with
the Promise TX4, but nothing concrete.

What I've found is

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

basically telling me these things happen, deal with it :-)

The problem is, I cannot produce these problems reliably, only thing I
notice is that they *seem* to happen more often if a lot of large files
are copied in succession.

Can anybody tell me if upgrading to 7.2 or -current will help?

I'm currently running

7.0-STABLE-200804 FreeBSD 7.0-STABLE-200804 #0: Wed Dec 10 15:29:03 CET
2008   *...@host:/usr/obj/usr/src/sys/GENERIC  amd64

Next step I'll try is upgrading to RELENG_7 to see if that helps.


Greetings,
Marc












Re: problems with sata disks (taskqueue timeout)

2009-01-19 Thread UBM
On Tue, 20 Jan 2009 09:39:51 +1100
Andrew Snow and...@modulus.org wrote:

 
 I think that if you use eSATA you probably need dedicated eSATA 
 controller ports.  eSATA standard specifies a higher voltage for the 
 longer cable distances.
 
 Judging from the sporadic problem reports, Promise TX4 is probably
 not the best at signal purity to begin with so using it for eSATA
 pushes it over the edge.
 
 
 Hope that helps,

Thanks for the fast answer! :-)

Although my version of the TX4 has two dedicated e-sata ports, the
other posts seem to indicate that it has something to do with the
controller (maybe signal purity, like you said). I'll try upgrading
next and will report back after that.

Bye
Marc



Re: SETFEATURES SET TRANSFER MODE taskqueue timeout.. Error occuring constantly.. Please help!!

2008-10-20 Thread Kristian Rooke
I have made some changes and provided the requested details.
The issue is still occurring, so if it looks like it's going to be more
trouble than it's worth, I will probably just replace 3 of the PATA IDE
disks with a SATA disk and put the remaining PATA disk on the Nvidia ATA
controller.

Thanks for your help thus far! :)

On Sun, Oct 19, 2008 at 8:25 AM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
 On Sun, Oct 19, 2008 at 03:32:29AM +1100, Kristian Rooke wrote:
 Thanks for the quick response!

 Please see requested output below:

 Cool, thanks.  One thing I forgot to ask for was vmstat -i output.

interrupt                          total       rate
irq1: atkbd0                           6          0
irq6: fdc0                             1          0
irq14: ata0                         2060          2
irq16: atapci1                       612          0
irq17: em0                           810          0
cpu0: timer                      1812646       1998
cpu1: timer                      1812344       1998
Total                            3628479       4000

 For now, let's break it down for ease of understanding:

 FreeBSD 7.0-RELEASE i386, built February 2008.

 atapci0: nVidia nForce MCP73 ATA133 controller -- IRQ 14
 atapci1: Silicon Image 0680 ATA133 controller  -- IRQ 16

 ata0: attached to atapci0
 ata1: attached to atapci0
 ata2: attached to atapci1
 ata3: attached to atapci1

 ad0: Seagate ST380011A 3.06   at ata0-master PIO4
 ad4: Seagate ST3320620A 3.AAF at ata2-master PIO4
 ad5: Seagate ST3320620A 3.AAF at ata2-slave  PIO4
 ad6: Seagate ST3750640A 3.AAE at ata3-master PIO4
 ad7: Seagate ST3320620A 3.AAD at ata3-slave  PIO4

 ATA errors are reported for disks ad4, ad5, ad6, and ad7.  ad0 appears
 to be error-free.

 First and foremost: there are known problems with Silicon Image
 controllers on all operating systems (Windows, Linux, and FreeBSD in
 particular), known for causing data loss and other sporadic issues.
 This is at least confirmed on their SATA controllers, and I've become
 quite the "pick something else" advocate when it comes to their stuff.
 However: I've no idea about their PATA controllers.

I was originally using a Promise PATA IDE controller, but that's when the
issues first began so I bought a cheap Silicon Image IDE controller to
replace it. After reading your email I have replaced the SI card with the
Promise controller. Below is the detail from dmesg:

atapci1: Promise PDC20270 UDMA100 controller port
0xcf00-0xcf07,0xce00-0xce03,0xcd00-0xcd07,0xcc00-0xcc03,0xcb00-0xcb0f mem
0xefbf-0xefbf irq 16 at device 5.0 on pci1


 Secondly, so far there isn't any evidence that the ad0 disk, which uses
 the nVidia controller, has any problem -- all the disks having problems
 are on the Silicon Image controller.  That is a very key piece of
 information here.

 If, while you're writing data to, say, the ad4 disk, you start to see
 errors on all disks (ad4 through ad7), then what this probably means is
 the controller has locked up or is behaving badly.  This adds further
 evidence that the Silicon Image controller may be at fault here.
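
The reasoning above (errors on every disk behind one controller point at the controller rather than at a single disk) can be sketched in a few lines of Python. The device layout is copied from the summary earlier in this mail; the function name is hypothetical, and this is only an illustration of the heuristic, not a diagnostic tool.

```python
# Disk-to-channel and channel-to-controller layout, copied from the
# summary earlier in this message.
DISK_CHANNEL = {
    "ad0": "ata0",
    "ad4": "ata2",
    "ad5": "ata2",
    "ad6": "ata3",
    "ad7": "ata3",
}
CHANNEL_CONTROLLER = {
    "ata0": "atapci0",
    "ata1": "atapci0",
    "ata2": "atapci1",
    "ata3": "atapci1",
}


def suspect_controllers(failing_disks):
    """Return controllers whose attached disks are ALL reporting errors."""
    by_controller = {}
    for disk, channel in DISK_CHANNEL.items():
        by_controller.setdefault(CHANNEL_CONTROLLER[channel], set()).add(disk)
    failing = set(failing_disks)
    return sorted(c for c, disks in by_controller.items() if disks <= failing)


print(suspect_controllers(["ad4", "ad5", "ad6", "ad7"]))  # ['atapci1']
```

With errors on ad4 through ad7 and none on ad0, only atapci1 (the add-in PCI controller) is flagged, which matches the conclusion drawn above.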

 Thirdly, you said the system requires a hard reset to get things back in
 working order.  Sometimes this can be induced by a power supply that
 isn't providing decent/proper voltages, or is being overloaded,
 particularly during heavy disk I/O (drawing more power in some cases).
 It might be good to check your voltages inside of your system BIOS,
 write them down, and type them in here.  FreeBSD does not provide a
 decent set of tools for monitoring this stuff inside the OS (yet; I'm
 working on it, mainly for server boards.  I do what I can...)

When error messages (same as pasted previously) begin being displayed in
console, the system becomes unresponsive.
I can no longer SSH to the device, and when I attempt to use it via console
it simply continues to constantly scroll the disk error messages.

I am currently using an Anter 550w PSU. Below are the Voltage details from
BIOS:

Vcore - 1.19V
Vcc12V - 12.30V
Vcc3.3V - 3.28V
Vcc5.0V - 5.04V
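
For what it's worth, a quick sanity check of those readings against nominal rail voltages, as a small Python sketch. The +/-5% tolerance is the figure commonly quoted for the positive ATX rails and is an assumption here, not something the BIOS reports; Vcore is left out because its target depends on the CPU.

```python
# Nominal rail voltages. The 5% tolerance is an assumption (the usual
# ATX figure for these positive rails), not a value from the BIOS.
NOMINAL = {"Vcc12V": 12.0, "Vcc3.3V": 3.3, "Vcc5.0V": 5.0}
TOLERANCE = 0.05


def out_of_tolerance(readings, nominal=NOMINAL, tolerance=TOLERANCE):
    """Return {rail: relative deviation} for rails outside the tolerance."""
    bad = {}
    for rail, measured in readings.items():
        expected = nominal[rail]
        deviation = (measured - expected) / expected
        if abs(deviation) > tolerance:
            bad[rail] = deviation
    return bad


# Readings copied from the BIOS values above.
readings = {"Vcc12V": 12.30, "Vcc3.3V": 3.28, "Vcc5.0V": 5.04}
print(out_of_tolerance(readings))  # {}
```

All three rails are within 5% of nominal. These are idle readings taken in the BIOS, though, so they say nothing about how the supply behaves under heavy disk I/O.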

 But keep in mind that a controller locking up hard could also require a
 hard reset (pressing reset on the front of the PC) -- a soft reset
 (Ctrl-Alt-Del) would probably work, except much of the running kernel is
 spinning hard trying to deal with ATA problems.

 Fourthly, I see a "some output omitted" line in your original dmesg.
 Can you provide that output?  It's important -- sometimes people have
 seen issues where their ATA controller shows problems, but it turns out
 to be an IRQ sharing or device compatibility problem with another device
 (e.g. their board was showing ATA errors, but at the exact same time,
 also showing NIC watchdog timeouts or other anomalies).  They omitted
 the dmesg data thinking it had nothing to do with the problem, when in
 fact it helps determine if the issue is truly with one piece or the
 entire system.

The "some output omitted" was simply repeats of error messages I previously

SETFEATURES SET TRANSFER MODE taskqueue timeout.. Error occuring constantly.. Please help!!

2008-10-18 Thread Kristian Rooke
Hi,

I have a PC/box with 5 disks in it that I am using as a fileserver and
I recently upgraded some hardware and installed FreeBSD 7.0-RELEASE.
Previously I had a RAID PATA IDE controller on the motherboard (was
not using RAID functionality though), but when I upgraded I had to
use a PCI IDE controller, due to the lack of PATA ports on the new
motherboard.
Now when I am attempting to write files, or do anything more than just
browse filesystems on the drives ad4-ad7, I get multiple occurrences
of the errors below. After these errors occur the kernel panics and I
need to perform a hard reset to get the server back up again.

Sep 28 11:40:28 FileServer kernel: ad6: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad6: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad6: WARNING - SETFEATURES ENABLE
RCACHE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad6: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad6: WARNING - SET_MULTI taskqueue
timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad7: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad7: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad7: WARNING - SETFEATURES ENABLE
RCACHE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad7: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad7: WARNING - SET_MULTI taskqueue
timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad6: FAILURE - WRITE_DMA timed out
LBA=163323135
Sep 28 11:40:28 FileServer kernel:
g_vfs_done():ad6s1[WRITE(offset=83621412864, length=16384)]error = 5
some output omitted
Sep 28 11:40:28 FileServer kernel: ad4: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad4: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad4: WARNING - SETFEATURES ENABLE
RCACHE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad4: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad4: WARNING - SET_MULTI taskqueue
timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad5: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad5: WARNING - SETFEATURES SET
TRANSFER MODE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad5: WARNING - SETFEATURES ENABLE
RCACHE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad5: WARNING - SETFEATURES ENABLE
WCACHE taskqueue timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad5: WARNING - SET_MULTI taskqueue
timeout - completing request directly
Sep 28 11:40:28 FileServer kernel: ad5: FAILURE - WRITE_DMA timed out LBA=287

I have taken a read through some previous conversations, but I can't
seem to find the answers I'm looking for.

I've changed the IDE cables and the PATA controller and it is still
not making any difference.

I also added hw.ata.ata_dma=0 to /boot/loader.conf as recommended in a
wiki I came across, but if anything it made the issue even worse.

Can someone please help?

Thanks,
Kristian
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: SETFEATURES SET TRANSFER MODE taskqueue timeout.. Error occuring constantly.. Please help!!

2008-10-18 Thread Jeremy Chadwick
Tracking these problems down takes a lot of time.  I hope you have the
time.  :-) Can you please provide the following output:

# dmesg
# pciconf -lv
# atacontrol list

Also, please install ports/sysutils/smartmontools (version 5.38 or
newer), and provide output for the following commands:

# smartctl -a /dev/ad4
# smartctl -a /dev/ad5
# smartctl -a /dev/ad6
# smartctl -a /dev/ad7
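
For anyone gathering the same data on several disks, the requests above can be scripted. This is only a minimal sketch in Python, assuming the four tools named above are installed and the script is run as root; the function names are illustrative and not part of any FreeBSD utility.

```python
import subprocess

def diagnostic_commands(disks):
    """Return the argv lists for the commands requested above."""
    commands = [
        ["dmesg"],
        ["pciconf", "-lv"],
        ["atacontrol", "list"],
    ]
    # One full SMART report per disk, e.g. smartctl -a /dev/ad4
    commands += [["smartctl", "-a", f"/dev/{disk}"] for disk in disks]
    return commands

def collect(commands, run=subprocess.run):
    """Run each command and return one combined text report.

    `run` is injectable so the function can be exercised without the
    real tools; by default it is subprocess.run.
    """
    sections = []
    for argv in commands:
        result = run(argv, capture_output=True, text=True)
        sections.append("# " + " ".join(argv) + "\n" + result.stdout)
    return "\n".join(sections)
```

Calling collect(diagnostic_commands(["ad4", "ad5", "ad6", "ad7"])) on the affected machine should give one block of text that can be pasted into a reply.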

Thanks.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: SETFEATURES SET TRANSFER MODE taskqueue timeout.. Error occuring constantly.. Please help!!

2008-10-18 Thread Kristian Rooke
 revision number 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
===


Re: SETFEATURES SET TRANSFER MODE taskqueue timeout.. Error occuring constantly.. Please help!!

2008-10-18 Thread Jeremy Chadwick
On Sun, Oct 19, 2008 at 03:32:29AM +1100, Kristian Rooke wrote:
 Thanks for the quick response!
 
 Please see requested output below:

Cool, thanks.  One thing I forgot to ask for was vmstat -i output.

For now, let's break it down for ease of understanding:

FreeBSD 7.0-RELEASE i386, built February 2008.

atapci0: nVidia nForce MCP73 ATA133 controller -- IRQ 14
atapci1: Silicon Image 0680 ATA133 controller  -- IRQ 16

ata0: attached to atapci0
ata1: attached to atapci0
ata2: attached to atapci1
ata3: attached to atapci1

ad0: Seagate ST380011A 3.06   at ata0-master PIO4
ad4: Seagate ST3320620A 3.AAF at ata2-master PIO4
ad5: Seagate ST3320620A 3.AAF at ata2-slave  PIO4
ad6: Seagate ST3750640A 3.AAE at ata3-master PIO4
ad7: Seagate ST3320620A 3.AAD at ata3-slave  PIO4

ATA errors are reported for disks ad4, ad5, ad6, and ad7.  ad0 appears
to be error-free.

First and foremost: there are known problems with Silicon Image
controllers on all operating systems (Windows, Linux, and FreeBSD in
particular), known for causing data loss and other sporadic issues.
This is at least confirmed on their SATA controllers, and I've become
quite the "pick something else" advocate when it comes to their stuff.
However: I've no idea about their PATA controllers.

Secondly, so far there isn't any evidence that the ad0 disk, which uses
the nVidia controller, has any problem -- all the disks having problems
are on the Silicon Image controller.  That is a very key piece of
information here.

If, while you're writing data to, say, the ad4 disk, you start to see
errors on all disks (ad4 through ad7), then what this probably means is
the controller has locked up or is behaving badly.  This adds further
evidence that the Silicon Image controller may be at fault here.

Thirdly, you said the system requires a hard reset to get things back in
working order.  Sometimes this can be induced by a power supply that
isn't providing decent/proper voltages, or is being overloaded,
particularly during heavy disk I/O (drawing more power in some cases).
It might be good to check your voltages inside of your system BIOS,
write them down, and type them in here.  FreeBSD does not provide a
decent set of tools for monitoring this stuff inside the OS (yet; I'm
working on it, mainly for server boards.  I do what I can...)

But keep in mind that a controller locking up hard could also require a
hard reset (pressing reset on the front of the PC) -- a soft reset
(Ctrl-Alt-Del) would probably work, except much of the running kernel is
spinning hard trying to deal with ATA problems.

Fourthly, I see a "some output omitted" line in your original dmesg.
Can you provide that output?  It's important -- sometimes people have
seen issues where their ATA controller shows problems, but it turns out
to be an IRQ sharing or device compatibility problem with another device
(e.g. their board was showing ATA errors, but at the exact same time,
also showing NIC watchdog timeouts or other anomalies).  They omitted
the dmesg data thinking it had nothing to do with the problem, when in
fact it helps determine if the issue is truly with one piece or the
entire system.

Next, let's take a look at your SMART output, which tells a tale of
something very very bad:

Disk ad4 has a good temperature, and no sign of bad blocks/sectors.  The
disk had been powered on for a total of 7799 hours.

There was a CRC error detected when attempting to set specific
capabilities on the device.  The error occurred at LBA 0 on the disk,
which is completely bizarre, but the SMART error log might just say LBA
0 to indicate no LBA was being accessed (e.g. the error was purely
during the mode setting attempts).  However, the SMART error log wraps
its timestamps at 49.710 days (roughly every 1193 hours), so it's going
to be difficult to determine if the below SMART error log entry was from
long ago, or was fairly recent.  Looking at other disks might help, so
let's continue.
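
The 49.710-day wrap is what a 32-bit millisecond counter would produce, which comes to about 1193 hours. A short check in Python, assuming that is how the drive keeps the timestamp (nothing in this thread confirms it); the logged value of 100 hours is made up, and the helper only lists which power-on ages a wrapped timestamp could correspond to.

```python
WRAP_MS = 2 ** 32  # assumed: 32-bit millisecond counter

wrap_hours = WRAP_MS / 1000 / 3600
wrap_days = wrap_hours / 24

def candidate_power_on_hours(logged_hours, total_hours):
    """List every power-on age consistent with a wrapped timestamp.

    logged_hours: timestamp shown in the SMART error log (already wrapped)
    total_hours:  the disk's total power-on hours
    """
    candidates = []
    age = logged_hours
    while age <= total_hours:
        candidates.append(round(age, 1))
        age += wrap_hours
    return candidates

print(f"wrap period: {wrap_days:.3f} days = {wrap_hours:.3f} hours")
# 7799 is ad4's power-on total from above; 100.0 is a hypothetical log value.
print(candidate_power_on_hours(100.0, 7799))
```

With several candidates per entry, the log alone cannot say whether an error is old or recent, which is the difficulty described above.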

Disk ad5 has an excellent temperature, and no sign of bad blocks/sectors
either.  The disk has been powered on for a total of 11956 hours.  No
errors were found in the SMART log.

Disk ad6 has a good temperature, and no sign of bad blocks/sectors.  No
errors were found in the SMART log.

Disk ad7 has an excellent temperature, and no sign of bad blocks/sectors
either.  The disk had been powered on for a total of 12512 hours.

However, much like disk ad4, this disk also witnessed a CRC error when
attempting to either do a DMA read operation or when setting
capabilities on the device.  I'm prone to believe it's when setting
capabilities, because LBA 0 is also seen here, which isn't a likely LBA.
This error happened at the 6310 hour mark, which was about half of its
lifetime ago.

All of this is somewhat of a mystery.  Disk ad4 is on a completely
different physical cable than disk ad7, so that *could* rule out cabling
problems.  The errors seen are only when setting device capabilities
(making an educated guess, but I'm not 100% positive), not 

Re: taskqueue timeout

2008-07-17 Thread Steve Bertrand

Steve Bertrand wrote:

I'm wondering if the problems described in the following link have been 
resolved:


http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2008-02/msg00211.html 



I've got four 500GB SATA disks in a ZFS raidz pool, and all four of them 
are experiencing the behavior.


Thanks to all who have provided patches off list. Unfortunately, none of 
them helped.


The only other box I have with four SATA ports on it is my actual 
workstation. The board is ASUS P5GD1, and has an Intel 82801FR SATA 
controller.


I despise the thought that if this works, I'll have to rebuild my 
workstation, but here's to sacrificing my Windows PC in the name of 
ruling out the problem.


In the meantime, can anyone provide any feedback on the board I 
mentioned in regards to FreeBSD?


Steve


Re: taskqueue timeout [SOLVED]

2008-07-17 Thread Steve Bertrand
 Steve Bertrand wrote:

 The only other box I have with four SATA ports on it is my actual
 workstation. The board is ASUS P5GD1, and has an Intel 82801FR SATA
 controller.

I transferred the SATA disks to the above board, loaded up the zpool, and
I can not reproduce the problem :)

Currently, for the last 15 minutes, I'm writing 80MB/s to the zpool with
no problems.

Thanks all,

Steve


taskqueue timeout

2008-07-15 Thread Steve Bertrand

Hi everyone,

I'm wondering if the problems described in the following link have been 
resolved:


http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2008-02/msg00211.html

I've got four 500GB SATA disks in a ZFS raidz pool, and all four of them 
are experiencing the behavior.


The problem only happens with extreme disk activity. The box becomes 
unresponsive (can not SSH etc). Keyboard input is displayed on the 
console, but the commands are not accepted.


Is there anything I can do to either figure this out, or work around it?

Steve


Re: taskqueue timeout

2008-07-15 Thread Matthew Dillon

:Hi everyone,
:
:I'm wondering if the problems described in the following link have been 
:resolved:
:
:http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2008-02/msg00211.html
:
:I've got four 500GB SATA disks in a ZFS raidz pool, and all four of them 
:are experiencing the behavior.
:
:The problem only happens with extreme disk activity. The box becomes 
:unresponsive (can not SSH etc). Keyboard input is displayed on the 
:console, but the commands are not accepted.
:
:Is there anything I can do to either figure this out, or work around it?
:
:Steve

If you are getting DMA timeouts, go to this URL:

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

Then I would suggest going into /usr/src/sys/dev/ata (I think, on
FreeBSD), locate all instances where request->timeout is set to 5,
and change them all to 10.

cd /usr/src/sys/dev/ata
fgrep 'request->timeout' *.c
... change all assignments of 5 to 10 ...

Try that first.  If it helps then it is a known issue.  Basically
a combination of the on-disk write cache and possible ECC corrections,
remappings, or excessive remapped sectors can cause the drive to take
much longer than normal to complete a request.  The default 5-second
timeout is insufficient.

If it does help, post confirmation to prod the FBsd developers to
change the timeouts.
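
The edit described above can be previewed before touching the source tree. A rough sketch in Python, assuming the assignments have the form `request->timeout = 5;` in the C sources (the arrow is garbled in some archived copies of this message); the sample snippet is made up for illustration and is not copied from the FreeBSD sources.

```python
import re

# Assumed form of the assignment in /usr/src/sys/dev/ata/*.c
TIMEOUT_ASSIGN = re.compile(r"(request->timeout\s*=\s*)5(\s*;)")

def bump_timeouts(source, new_value=10):
    """Return (patched_source, count) with 5-second timeouts replaced."""
    return TIMEOUT_ASSIGN.subn(rf"\g<1>{new_value}\g<2>", source)

# Hypothetical snippet, for illustration only.
sample = """
    request->timeout = 5;
    request->timeout = 1;
    request->timeout = 5;   /* seconds */
"""

patched, count = bump_timeouts(sample)
print(count)
print(patched)
```

Only assignments of exactly 5 are rewritten; other values are left alone. To apply it for real, each .c file would be read, patched and written back, followed by a kernel rebuild.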

--

If you are NOT getting DMA timeouts then the ZFS lockups may be due
to buffer/memory deadlocks.  ZFS has knobs for adjusting its memory
footprint size.  Lowering the footprint ought to solve (most of) those
issues.  It's actually somewhat of a hard issue to solve.  Filesystems
like UFS aren't complex enough to require the sort of dynamic memory
allocations deep in the filesystem that ZFS and HAMMER need to do.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


Re: taskqueue timeout

2008-07-15 Thread Steve Bertrand

Matthew Dillon wrote:


If you are getting DMA timeouts, go to this URL:


Yes, I am.


http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting


I fall under the category of ATA/SATA DMA timeout issues.


Then I would suggest going into /usr/src/sys/dev/ata (I think, on
FreeBSD), locate all instances where request->timeout is set to 5,
and change them all to 10.

cd /usr/src/sys/dev/ata
fgrep 'request->timeout' *.c
... change all assignments of 5 to 10 ...

Try that first.  If it helps then it is a known issue.  Basically
a combination of the on-disk write cache and possible ECC corrections,
remappings, or excessive remapped sectors can cause the drive to take
much longer than normal to complete a request.  The default 5-second
timeout is insufficient.

If it does help, post confirmation to prod the FBsd developers to
change the timeouts.


I've just reproduced the problem, and will try hacking the code now to 
see if the problem goes away.


Since the box won't take input, I can't tell the disk usage at the time 
it dies. However, it seems to appear while running an Amanda backup, and 
my network throughput hits about ~90 Mbps @ ~5 kpps.


I'll post back with results of the increase of the timeout.

Steve


Re: taskqueue timeout

2008-07-15 Thread Steve Bertrand

Matthew Dillon wrote:


If you are getting DMA timeouts, go to this URL:

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

Then I would suggest going into /usr/src/sys/dev/ata (I think, on
FreeBSD), locate all instances where request->timeout is set to 5,
and change them all to 10.

cd /usr/src/sys/dev/ata
fgrep 'request->timeout' *.c
... change all assignments of 5 to 10 ...


Changing 5 to 10 in all cases and rebuilding the kernel does not fix the 
problem.


I'm going to install the patch that allows the values to be changed via 
sysctl and up it to 15.


This problem happens across all four disks.

Does anyone else have any suggestions on what I can check?

Steve


Re: taskqueue timeout

2008-07-15 Thread Steve Bertrand

Steve Bertrand wrote:

Matthew Dillon wrote:


If you are getting DMA timeouts, go to this URL:

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

Then I would suggest going into /usr/src/sys/dev/ata (I think, on
FreeBSD), locate all instances where request->timeout is set to 5,
and change them all to 10.

cd /usr/src/sys/dev/ata
fgrep 'request->timeout' *.c
... change all assignments of 5 to 10 ...


Changing 5 to 10 in all cases and rebuilding the kernel does not fix the 
problem.


Went from 10-15, and it took quite a bit longer into the backup before 
the problem cropped back up.


Here is what I was seeing at the time it failed. Where netstat and zpool 
iostat drop off is where I start seeing the errors occur:


# top

last pid:  1069;  load averages:  0.09,  0.17,  0.10    up 0+00:08:31  19:22:39
53 processes:  1 running, 52 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 28M Active, 3644K Inact, 301M Wired, 76K Cache, 1634M Free
Swap:

# netstat -w 1 -h

      4.8K     0        11M       3.5K     0       5.4M     0
      4.5K     0        10M       3.3K     0       5.1M     0
      4.9K     0        11M       3.6K     0       5.5M     0
      4.8K     0        11M       3.5K     0       5.4M     0
      4.3K     0       9.5M       3.1K     0       4.8M     0
      5.1K     0        11M       3.7K     0       5.7M     0
      5.0K     0        11M       3.6K     0       5.6M     0
      5.3K     0        12M       3.9K     0       6.0M     0
      4.8K     0        11M       3.5K     0       5.4M     0
      4.7K     0        10M       3.4K     0       5.2M     0
      4.8K     0        11M       3.5K     0       5.4M     0
      4.6K     0        10M       3.4K     0       5.2M     0
      4.1K     0       9.1M       3.0K     0       4.6M     0
      5.3K     0        12M       3.9K     0       6.0M     0
      5.2K     0        12M       3.8K     0       5.8M     0
      4.3K     0       9.5M       3.1K     0       4.8M     0
      4.3K     0       9.6M       3.2K     0       4.9M     0
      5.4K     0        12M       4.0K     0       6.1M     0
      4.8K     0        11M       3.5K     0       5.4M     0
      2.4K     0       5.1M       1.7K     0       2.5M     0
            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
         2     0        120          2     0        316     0
         3     0        180          4     0       1.0K     0
         3     0        180          2     0        316     0
         3     0        180          3     0        658     0
         5     0       1.6K          5     0        942     0
         3     0        254          4     0        840     0
         3     0        180          2     0        316     0


# zpool iostat 1

storage     6.40G  1.81T      0    296      0  37.0M
storage     6.43G  1.81T      0    188      0  14.5M
storage     6.43G  1.81T      0      0      0      0
storage     6.43G  1.81T      0      0      0      0
storage     6.43G  1.81T      0      0      0      0
storage     6.43G  1.81T      0     47      0  5.99M
storage     6.46G  1.81T      0    218      0  18.0M
storage     6.46G  1.81T      0      0      0      0
storage     6.46G  1.81T      0      0      0      0
storage     6.46G  1.81T      9      0   192K      0
storage     6.46G  1.81T      0     59      0  7.39M
storage     6.49G  1.81T      1    250  3.42K  14.9M
storage     6.49G  1.81T      0      0      0      0
storage     6.49G  1.81T      0      0      0      0
storage     6.49G  1.81T      0      0      0      0
storage     6.49G  1.81T      0    141      0  17.5M
storage     6.52G  1.81T      0     74      0   232K
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0    151      0  18.8M
storage     6.52G  1.81T      0    114      0  8.07M
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0      0      0      0
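
One way to make the drop-off easier to spot is to flag runs of consecutive one-second samples with no read or write operations. A minimal sketch in Python, assuming the usual seven-column zpool iostat row (pool, used, available, read ops, write ops, read bandwidth, write bandwidth); the function name is illustrative, and the threshold of four samples is only a guess from this one capture, where shorter pauses between write bursts look normal.

```python
def idle_runs(rows, min_length=4):
    """Return (start_index, length) for runs of samples with zero I/O ops."""
    runs = []
    start = None
    for i, row in enumerate(rows):
        fields = row.split()
        read_ops, write_ops = fields[3], fields[4]
        idle = read_ops == "0" and write_ops == "0"
        if idle and start is None:
            start = i
        elif not idle and start is not None:
            if i - start >= min_length:
                runs.append((start, i - start))
            start = None
    # A run that is still open when the capture ends also counts.
    if start is not None and len(rows) - start >= min_length:
        runs.append((start, len(rows) - start))
    return runs

# Last samples from the output above, one space between columns.
samples = [
    "storage 6.52G 1.81T 0 151 0 18.8M",
    "storage 6.52G 1.81T 0 114 0 8.07M",
    "storage 6.52G 1.81T 0 0 0 0",
    "storage 6.52G 1.81T 0 0 0 0",
    "storage 6.52G 1.81T 0 0 0 0",
    "storage 6.52G 1.81T 0 0 0 0",
]
print(idle_runs(samples))
```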



Don't know if this will help anyone or not.

Steve


Re: taskqueue timeout

2008-07-15 Thread Matthew Dillon
:Went from 10-15, and it took quite a bit longer into the backup before 
:the problem cropped back up.

Try 30 or longer.  See if you can make the problem go away entirely.
Then fall back to 5 and see if the problem resumes at its earlier
pace.

--

It could be temperature related.  The drives are being exercised
a lot, they could very well be overheating.  To find out add more
airflow (a big house fan would do the trick).

--

It could be that errors are accumulating on the drives, but it seems
unlikely that four drives would exhibit the same problem.

--

Also make sure the power supply can handle four drives.  Most power
supplies that come with consumer boxes can't under full load if you
also have a mid or high-end graphics card installed.  Power supplies
that come with OEM slap-together enclosures are not usually much better.

Specifically, look at the +5V and +12V amperage maximums on the power
supply, then check the disk labels to see what they draw, then
multiply by 2.  e.g. if your power supply can do [EMAIL PROTECTED] and you
have four drives each taking [EMAIL PROTECTED] (and typically ~half that at
5V), that's 4x2x2 = [EMAIL PROTECTED] and you would probably be ok.
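
The rule of thumb above is simple arithmetic. A small sketch in Python, assuming four drives at 2 A each on the +12V rail (the figures implied by the 4x2x2 product) and roughly half that per drive on +5V; the PSU rating passed in is a made-up example, since the original figure did not survive in the archive.

```python
def required_amps(drives, amps_per_drive, headroom=2.0):
    """Amperage to budget on one rail: drives x per-drive draw x headroom."""
    return drives * amps_per_drive * headroom

def rail_ok(psu_max_amps, drives, amps_per_drive, headroom=2.0):
    """True if the rail's rated maximum covers the budgeted draw."""
    return psu_max_amps >= required_amps(drives, amps_per_drive, headroom)

need_12v = required_amps(4, 2.0)   # 4 x 2 x 2
need_5v = required_amps(4, 1.0)    # about half the 12V draw per drive
print(need_12v, need_5v)
print(rail_ok(18.0, 4, 2.0))       # 18 A is a hypothetical PSU rating
```

The factor of 2 is the headroom suggested above; the per-drive figures should come from the labels on the actual disks.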

To test, remove two of the four drives, reformat the ZFS to use just 2,
and see if the problem reoccurs with just two drives.

-Matt



Re: taskqueue timeout

2008-07-15 Thread Alex Trull
Don't want to give conflicting advice, and would suggest you certainly
try the 30 sec thing first. I'm already on 10 myself but haven't pushed
further.

In my own case I've not had any issue with zfs in particular since I
applied the ZFS zil/prefetch disable loader.conf tunables 10 hours ago.
I am observing this now.

For the record ..

What ATA chipset/motherboard and model of disk have you got?
Have you seen any SMART errors (real or otherwise)?
What do your 'zpool status' counters look like?

--
Alex



Re: taskqueue timeout

2008-07-15 Thread Andrew Snow

Matthew Dillon wrote:

Try that first.  If it helps then it is a known issue.  Basically
a combination of the on-disk write cache and possible ECC corrections,
remappings, or excessive remapped sectors can cause the drive to take
much longer than normal to complete a request.  The default 5-second
timeout is insufficient.


From Western Digital's line of enterprise drives:

RAID-specific time-limited error recovery (TLER) - Pioneered by WD, 
this feature prevents drive fallout caused by the extended hard drive 
error-recovery processes common to desktop drives.



Western Digital's information sheet on TLER states that they found most 
RAID controllers will wait 8 seconds for a disk to respond before 
dropping it from the RAID set.  Consequently they changed their 
enterprise drives to try reading a bad sector for only 7 seconds 
before returning an error.


Therefore I think the FreeBSD timeout should also be set to 8 seconds 
instead of 5 seconds.  Desktop-targeted drives will not respond for 
over 10 seconds, up to minutes, so it's not worth setting the FreeBSD 
timeout any higher.



More info:
http://www.wdc.com/en/library/sata/2579-001098.pdf
http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery



- Andrew


Re: taskqueue timeout

2008-07-15 Thread Steve Bertrand

Matthew Dillon wrote:
:Went from 10-15, and it took quite a bit longer into the backup before 
:the problem cropped back up.


Jumping right into it, there is another post after this one, but I'm 
going to try to reply inline:



Try 30 or longer.  See if you can make the problem go away entirely.
Then fall back to 5 and see if the problem resumes at its earlier
pace.


I'm sure 30 will either push the issue longer, or into non-existence, 
but are there any developers here who can say what this timer does? ie. 
How does changing this timer affect the performance of the disk 
subsystem (aside from allowing it to work, of course).


Once I'm done responding to this message, I'll test with the sysctl set to 30.

It could be temperature related.  The drives are being exercised
a lot, they could very well be overheating.  To find out add more
airflow (a big house fan would do the trick).



Temperature is a good thought, but currently, my physical situation has 
this:


- 2U chassis
- multiple fans in the case
- in my lab (which is essentially beside my desk)
- the case has no lid
- it is 64 degrees with A/C and circulating fans in this area
- hard drives are separated relatively well inside the case


It could be that errors are accumulating on the drives, but it seems
unlikely that four drives would exhibit the same problem.


That's what I'm thinking. All four drives are exhibiting the same 
errors... or, for all intents and purposes, the machine is coughing the 
same errors for all the drives.



Also make sure the power supply can handle four drives.  Most power
supplies that come with consumer boxes can't under full load if you
also have a mid or high-end graphics card installed.  Power supplies
that come with OEM slap-together enclosures are not usually much better.


I currently have a 550W PSU in the 2U chassis, which again, is sitting 
open. I have more hardware, running in worse conditions with 
lower-wattage PSUs, that doesn't exhibit this behavior. I need to determine 
whether this problem is SATA, ZFS, the motherboard or code.



Specifically, look at the +5V and +12V amperage maximums on the power
supply, then check the disk labels to see what they draw, then
multiply by 2.  e.g. if your power supply can do [EMAIL PROTECTED] and you
have four drives each taking [EMAIL PROTECTED] (and typically ~half that
at 5V), that's 4x2x2 = [EMAIL PROTECTED] and you would probably be ok.
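
The rule of thumb quoted above (drives times per-drive draw times a safety factor of 2) can be sketched in Python as follows. The actual amperage figures are lost in this archive, so the 2 A per drive and the 18 A rail rating below are hypothetical values chosen only to mirror the 4x2x2 arithmetic; the function names are likewise made up for the example.

```python
def required_amps(num_drives, amps_per_drive, safety_factor=2):
    """Current to budget for on one rail: drives x per-drive draw x safety factor."""
    return num_drives * amps_per_drive * safety_factor

def rail_is_sufficient(rail_max_amps, num_drives, amps_per_drive, safety_factor=2):
    """True if the rail's rated maximum covers the budgeted current."""
    return rail_max_amps >= required_amps(num_drives, amps_per_drive, safety_factor)

# Hypothetical figures: four drives at 2 A each on +12V, PSU rated for 18 A on that rail.
print(required_amps(4, 2))           # 4 x 2 x 2 = 16
print(rail_is_sufficient(18, 4, 2))  # True
```

The same check would be repeated for the +5V rail with the per-drive draw printed on the disk label.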


I'm well within specs. Even after V/A tests with the meter. The power 
supply is providing ample wattage to each device accordingly.



To test, remove two of the four drives, reformat the ZFS to use just 2,
and see if the problem reoccurs with just two drives.


... I knew that was going to come up... my response is I worked so hard 
to get this system with ZFS all configured *exactly* how I wanted it.


To test, I'm going to flip to 30 as per Matthew's recommendation, and see 
how far that takes me. At this time, I'm only testing by backing up one 
machine on the network. If it fails, I'll clock the time, and then 
'reformat' with two drives.


Is there a technical reason this may work better with only two drives?

Is there anyone interested to the point where remote login would be helpful?

Steve


Re: taskqueue timeout

2008-07-15 Thread Jeremy Chadwick
On Tue, Jul 15, 2008 at 10:29:28PM -0400, Steve Bertrand wrote:
 Is there anyone interested to the point where remote login would be helpful?

I believe my FreeBSD Wiki page documents what to do if your problem
is easily reproducible: contact Scott Long, who has offered to help
track down the source of these problems.

I'll reply to the other part of your mail in a bit.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: taskqueue timeout

2008-07-15 Thread Steve Bertrand

Alex Trull wrote:

Don't want to give conflicting advice, and would suggest you certainly
try the 30 sec thing first. I'm already on 10 myself but haven't pushed
further.


What were you doing, and what did you notice when the problem started?

As much as it seems silly, I'm mostly interested in what your network 
was doing at the time things went sour.



In my own case I've not had any issue with zfs in particular since I
applied the ZFS zil/prefetch disable loader.conf tunables 10 hours ago.
I am observing this now.


For some reason, and with no explanation or science behind it, I don't 
think this is a ZFS problem, and I'm trying to defend this thought to my 
peers until I prove otherwise.


I have to be a bit careful on how I adjust loader properties, given that 
I'm loading from USB, and mounting root from a ZFS zpool hard disk. Like 
my GELI systems, tweaking things can be a bit touchy unless I put a 
little more planning into it.



For the record ..

What ata chipset/motherboard and model of disk have you got ?


I'm not a hardware person per se, but I'm advised to post that the 
motherboard is:


- XFX nForce 610i with GeForce 7050

If there is more hardware info I can provide, let me know specifically 
what I should be looking for.



Have you seen any smart errors (real or otherwise) ?
What do your 'zpool status' counters look like ?


zpool status is always clean. There are no errors otherwise, even if the 
box is up for multiple hours straight. The problem occurs only if I 
throw work at it.


Steve


Re: taskqueue timeout

2008-07-15 Thread Steve Bertrand

Jeremy Chadwick wrote:

On Tue, Jul 15, 2008 at 10:29:28PM -0400, Steve Bertrand wrote:

Is there anyone interested to the point where remote login would be helpful?


I believe my FreeBSD Wiki page documents what to do if your problem
is easily reproducible: contact Scott Long, who has offered to help
track down the source of these problems.


Changing to 30 second timeout made no difference whatsoever. The problem 
occurred at about the same time during the single


I'm at a standstill.

I'm willing to help provide any information necessary to fix this issue, 
or provide remote access to the box in question.


scottl@ has been Cc:'d.

Thanks all,

Steve


Re: taskqueue timeout

2008-07-15 Thread Matthew Dillon

:...
: and see if the problem reoccurs with just two drives.
:
:... I knew that was going to come up... my response is I worked so hard 
:to get this system with ZFS all configured *exactly* how I wanted it.
:
:To test, I'm going to flip to 30 as per Matthew's recommendation, and see 
:how far that takes me. At this time, I'm only testing by backing up one 
:machine on the network. If it fails, I'll clock the time, and then 
:'reformat' with two drives.
:
:Is there a technical reason this may work better with only two drives?
:
:Is there anyone interested to the point where remote login would be helpful?
:
:Steve

This issue is vexing a lot of people.

Setting the timeout to 30 will not affect performance, but it will
cause a 30 second delay in recovery when (if) the problem occurs.
i.e. when the disk stalls it will just sit there doing nothing for
30 seconds, then it will print the timeout message and try to recover.

It occurs to me that it might be beneficial to actually measure the
disk's response time to each request, and then graph it over a period
of time.  Maybe seeing the issue visually will give some clue as to the
actual cause.
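
A minimal Python sketch of the measurement idea above: read a file or device in fixed-size blocks, record how long each request takes, and pick out the slow ones for plotting or inspection. This only measures from userland, so it is an approximation rather than a view of what the driver sees; the function names, the 128 KB block size, and the 1-second threshold are illustrative assumptions.

```python
import os
import time

def time_reads(path, block_size=128 * 1024, max_requests=1000):
    """Read `path` sequentially in fixed-size blocks; return per-request latencies in seconds."""
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(max_requests):
            start = time.monotonic()
            data = os.read(fd, block_size)
            elapsed = time.monotonic() - start
            if not data:  # end of file or device reached
                break
            latencies.append(elapsed)
    finally:
        os.close(fd)
    return latencies

def slow_requests(latencies, threshold_s=1.0):
    """Return (request index, latency) pairs that took at least `threshold_s` seconds."""
    return [(i, t) for i, t in enumerate(latencies) if t >= threshold_s]
```

Run as root against a raw disk device, the returned list can be written out and graphed over time; a stall would show up as an isolated latency spike in the seconds range.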

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


Re: taskqueue timeout

2008-07-15 Thread Steve Bertrand

Andrew Snow wrote:


 From Western Digital's line of enterprise drives:

RAID-specific time-limited error recovery (TLER) - Pioneered by WD, 
this feature prevents drive fallout caused by the extended hard drive 
error-recovery processes common to desktop drives.


Therefore I think the FreeBSD timeout should also be set to 8 seconds 
instead of 5 seconds.  Desktop-targeted drives will not respond for 
over 10 seconds, up to minutes, so it's not worth setting the FreeBSD 
timeout any higher.


Interesting that you say this. To reiterate, I have /boot on a USB thumb 
drive, and the system mounts / from a raidz pool called /storage via 
loader.conf.


The four drives in question (per the packaging) are:

- Western Digital Caviar SE16 500GB
- 7200, 16MB, SATA-300, OEM

Per the packaging on the rest of the hardware:

# mobo
- XFX 610i, 7050 GeForce (I *never* use graphics on my FreeBSD boxen, I 
*only* know/have CLI with no 'windows')


# memory
- 2 GB Corsair XMS2 Twin2X 6400C4 memory

# cpu
- Intel Pentium DC E2200 2.20GHz OEM
- 2.20 GHz, 1MB Cache, 800MHz FSB, Allendale, Dual Core, OEM, Socket 
775, Processor


# swap
- I don't run any, but can/will add in an IDE/ATA 7200 200GB in the 
event this problem may be related to ZFS/RAM issues.


Steve


Re: taskqueue timeout

2008-07-15 Thread Steve Bertrand

Matthew Dillon wrote:


This issue is vexing a lot of people.


Heh... I can appreciate this. I would like someone to inform me that 
this can't be guaranteed to be a ZFS problem... if I can get 
confirmation that others have this issue aside from ZFS, I would feel 
content.



Setting the timeout to 30 will not affect performance, but it will
cause a 30 second delay in recovery when (if) the problem occurs.
i.e. when the disk stalls it will just sit there doing nothing for
30 seconds, then it will print the timeout message and try to recover.


If I have the timeout at >= 30 and the issue still occurs, the problem 
must be elsewhere.



It occurs to me that it might be beneficial to actually measure the
disk's response time to each request, and then graph it over a period
of time.  Maybe seeing the issue visually will give some clue as to the
actual cause.


I am interested in following through with this, but can't do it on my 
own. I'm willing to dedicate the box and bandwidth to anyone who can 
legitimately test this as you state. ie: I need either guidance or 
assistance.


This box is ready for the taking. Beyond this box, I can provide 
legitimate parties other network resources to produce a consistent flow 
of data to ensure the ability to easily reproduce the issue locally, on 
demand.


Steve


Re: any available patches for sata: ad4: warning - setfeatures set transfer mode taskqueue timeout - completing request directly

2008-05-12 Thread Jeremy Chadwick
On Mon, May 12, 2008 at 02:42:40PM +0900, dikshie wrote:
 I just phoned the computer store and we checked the BIOS together.
 It seems the store set the disk controller to IDE mode, NOT AHCI mode.
 After changing to AHCI, FreeBSD now detects SATA 300 (for the WDC) and
 SATA 150 (for the DVD-R).

Great, there you go.  :-)

 BUT sometimes I am still getting:
 ad4: warning - setfeatures set transfer mode taskqueue timeout - completing request directly
 ad4: warning - setfeatures enable rcache taskqueue timeout - completing request directly
 ad4: warning - setfeatures enable wcache taskqueue timeout - completing request directly
 ad4: timeout - flushcache retrying (0 retries left)
 geom_journal: flush cache of ad4s1d: error=5
 ad4: timeout - write_dma retrying (1 retry left) LBA=4139103

1) Have you checked SMART statistics of the drive, or run SMART tests?
Install ports/sysutils/smartmontools and use smartctl -a /dev/ad4, and
provide the output.

2) Is the error always on ad4?  If so, is the error always at LBA
4139103, or around there (give or take a few thousand addressing
blocks)?  If so, the ad4 disk may be going bad.  Otherwise, I would say
this is probably the issue I've documented on my Common Issues page.
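
The second check can be done mechanically. The Python sketch below pulls the device name and LBA out of kernel log lines like the ones quoted in this thread and tests whether the reported LBAs sit close together. The regular expression, the function names, and the 5000-block window (standing in for "a few thousand") are assumptions made for the example.

```python
import re
from collections import defaultdict

# Matches lines such as "ad4: timeout - write_dma retrying (1 retry left) LBA=4139103"
LBA_LINE = re.compile(r"\b(ad\d+):.*\bLBA=(\d+)")

def lbas_by_device(log_lines):
    """Map each device name to the list of LBAs reported in its error lines."""
    found = defaultdict(list)
    for line in log_lines:
        match = LBA_LINE.search(line)
        if match:
            found[match.group(1)].append(int(match.group(2)))
    return dict(found)

def clustered(lbas, window=5000):
    """True if all reported LBAs fall within `window` blocks of each other."""
    return bool(lbas) and max(lbas) - min(lbas) <= window
```

Errors that always land on one device and cluster around one LBA point toward a failing disk; errors spread across devices or across the whole LBA range point elsewhere.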

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: any available patches for sata: ad4: warning - setfeatures set transfer mode taskqueue timeout - completing request directly

2008-05-12 Thread dikshie
On Mon, May 12, 2008 at 3:00 PM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
 1) Have you checked SMART statistics of the drive, or run SMART tests?
 Install ports/sysutils/smartmontools and use smartctl -a /dev/ad4, and
 provide the output.

 2) Is the error always on ad4?  If so, is the error always at LBA
 4139103, or around there (give or take a few thousand addressing
 blocks)?  If so, the ad4 disk may be going bad.  Otherwise, I would say
 this is probably the issue I've documented on my Common Issues page.

dhcp-143-221# smartctl -a /dev/ad4
smartctl version 5.38 [amd64-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Second Generation Serial ATA family
Device Model:     WDC WD3200AAKS-00VYA0
Serial Number:    WD-WCARW2314765
Firmware Version: 12.01B02
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon May 12 15:27:24 2008 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (8760) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 104) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   155   155   021    Pre-fail  Always       -       5233
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       940
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       11
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       11
194 Temperature_Celsius     0x0022   114   108   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged
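
For anyone scanning such output by script, here is a small Python sketch that reads smartctl attribute rows in the usual one-attribute-per-line layout and reports non-zero raw values for a handful of attributes. The choice of attributes to watch and the function names are assumptions made for the example, not a smartmontools recommendation.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable", "UDMA_CRC_Error_Count")

def parse_attributes(text):
    """Parse smartctl attribute rows (one attribute per line) into {name: raw_value}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has 10 columns: the first is the numeric ID, the tenth the raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs

def suspicious(attrs, watched=WATCHED):
    """Return the watched attributes whose raw value is non-zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}
```

For the drive above every watched raw value is 0, so `suspicious()` would return an empty dict.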

SMART Self-test log structure 

any available patches for sata: ad4: warning - setfeatures set transfer mode taskqueue timeout - completing request directly

2008-05-11 Thread dikshie
Hi,
I got:

ad4: warning - setfeatures set transfer mode taskqueue timeout - completing request directly
ad4: warning - setfeatures enable rcache taskqueue timeout - completing request directly
ad4: warning - setfeatures enable wcache taskqueue timeout - completing request directly
ad4: timeout - flushcache retrying (0 retries left)
geom_journal: flush cache of ad4s1d: error=5
ad4: timeout - write_dma retrying (1 retry left) LBA=4139103
---
I read: http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues

and strange output from dmesg:
ad4: 305245MB WDC WD3200AAKS-00VYA0 12.01B02 at ata2-master UDMA33

  ^^^
any available patches ?


best regards,

-dikshie-

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-STABLE #19: Sat May 10 15:41:00 JST 2008
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/BARU
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU E6550  @ 2.33GHz (2333.34-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x6fb  Stepping = 11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3fd<SSE3,RSVD2,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
usable memory = 2000646144 (1907 MB)
avail memory  = 1926553600 (1837 MB)
ACPI APIC Table: Nvidia NVDAACPI
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ACPI Warning (tbfadt-0505): Optional field Pm2ControlBlock has zero
address or length:0   0/1 [20070320]
ioapic0: Changing APIC ID to 4
ioapic0 Version 1.1 irqs 0-23 on motherboard
kbd1 at kbdmux0
cryptosoft0: software crypto on motherboard
acpi0: Nvidia NVDAACPI on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 10, 7fdf (3) failed
acpi0: reservation of 0, a (3) failed
Timecounter ACPI-safe frequency 3579545 Hz quality 850
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
acpi_hpet0: High Precision Event Timer iomem 0xfeff-0xfeff03ff on acpi0
Timecounter HPET frequency 2500 Hz quality 900
cpu0: ACPI CPU on acpi0
est0: Enhanced SpeedStep Frequency Control on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 72a072a0600072a
device_attach: est0 attach returned 6
p4tcc0: CPU Frequency Thermal Control on cpu0
cpu1: ACPI CPU on acpi0
est1: Enhanced SpeedStep Frequency Control on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 72a072a0600072a
device_attach: est1 attach returned 6
p4tcc1: CPU Frequency Thermal Control on cpu1
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pci0: memory, RAM at device 0.1 (no driver attached)
pci0: memory, RAM at device 1.0 (no driver attached)
pci0: memory, RAM at device 1.1 (no driver attached)
pci0: memory, RAM at device 1.2 (no driver attached)
pci0: memory, RAM at device 1.3 (no driver attached)
pci0: memory, RAM at device 1.4 (no driver attached)
pci0: memory, RAM at device 1.5 (no driver attached)
pci0: memory, RAM at device 1.6 (no driver attached)
pci0: memory, RAM at device 2.0 (no driver attached)
isab0: PCI-ISA bridge at device 3.0 on pci0
isa0: ISA bus on isab0
pci0: serial bus, SMBus at device 3.1 (no driver attached)
pci0: memory, RAM at device 3.2 (no driver attached)
pci0: memory, RAM at device 3.4 (no driver attached)
ohci0: OHCI (generic) USB controller mem 0xe000-0xefff irq
21 at device 4.0 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: OHCI (generic) USB controller on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 on usb0
uhub0: 10 ports with 10 removable, self powered
ehci0: EHCI (generic) USB 2.0 controller mem 0xefffe000-0xefffe0ff
irq 22 at device 4.1 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb1: EHCI version 1.0
usb1: companion controller, 10 ports each: usb0
usb1: EHCI (generic) USB 2.0 controller on ehci0
usb1: USB revision 2.0
uhub1: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1 on usb1
uhub1: 10 ports with 10 removable, self powered
umass0: Generic USB2.0-CRW, class 0/0, rev 2.00/11.22, addr 2 on uhub1
atapci0: nVidia nForce MCP73 UDMA133 controller port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 8.0 on
pci0
ata0: ATA channel 0 on atapci0
ata0: [ITHREAD]
ata1: ATA channel 1 on atapci0
ata1: [ITHREAD]
pci0: multimedia

Re: any available patches for sata: ad4: warning - setfeatures set transfer mode taskqueue timeout - completing request directly

2008-05-11 Thread Jeremy Chadwick
On Mon, May 12, 2008 at 01:17:16PM +0900, dikshie wrote:
 and strange output from dmesg:
 ad4: 305245MB WDC WD3200AAKS-00VYA0 12.01B02 at ata2-master UDMA33
   ^^^
 any available patches ?

What's strange about this?

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: any available patches for sata: ad4: warning - setfeatures set transfer mode taskqueue timeout - completing request directly

2008-05-11 Thread Andrey V. Elsukov

dikshie wrote:

atapci1: nVidia ATA controller port
0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xf700-0xf70f mem
0xefff8000-0xefff9fff irq 20 at device 14.0 on pci0


It seems your controller was detected as a generic ATA controller.
Can you show `pciconf -l` output from your system?
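
When reading `pciconf -l` output, the `chip=` field packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits. The Python sketch below splits it apart; the function name and regular expression are made up for the example.

```python
import re

CHIP_FIELD = re.compile(r"\bchip=0x([0-9a-fA-F]{8})\b")

def vendor_and_device(pciconf_line):
    """Return (vendor_id, device_id) from a `pciconf -l` line, or None if no chip= field."""
    match = CHIP_FIELD.search(pciconf_line)
    if match is None:
        return None
    chip = int(match.group(1), 16)
    return chip & 0xFFFF, chip >> 16  # low half is the vendor, high half the device

vendor, device = vendor_and_device("class=0x01018a card=0x80a61043 chip=0x24db8086 rev=0x02 hdr=0x00")
print(f"vendor=0x{vendor:04x} device=0x{device:04x}")  # vendor=0x8086 device=0x24db
```

Vendor 0x8086 is Intel; looking up the pair tells you which controller the kernel actually saw, even when no specific driver attached to it.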

--
WBR, Andrey V. Elsukov


Re: any available patches for sata: ad4: warning - setfeatures set transfer mode taskqueue timeout - completing request directly

2008-05-11 Thread Jeremy Chadwick
On Mon, May 12, 2008 at 02:14:37PM +0900, dikshie wrote:
 On Mon, May 12, 2008 at 2:11 PM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
  On Mon, May 12, 2008 at 01:17:16PM +0900, dikshie wrote:
  and strange output from dmesg:
  ad4: 305245MB WDC WD3200AAKS-00VYA0 12.01B02 at ata2-master UDMA33
^^^
  any available patches ?
 
  What's strange about this?
 
 I mean that instead of UDMA33 it should say SATA300.
 Do I have to upgrade the BIOS?

Your carets are pointing to the drive firmware revision, which is why I
was confused.  :-) Yes, it should say either SATA150 or SATA300 (more on
that in a moment), but based on your dmesg output, it appears your SATA
controller does not have an attached driver, thus is operating
generically.  Andrey recommended showing pciconf -lv output; please do.

Your drive is *probably* operating in SATA150/300 mode, despite UDMA33
being printed, however.  Assuming you know your disk can push 33MB/sec,
you could try some simple read I/O and use gstat to watch the speed.
dd if=/dev/ad4 of=/dev/null bs=1m should suffice.
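
The same sanity check can be scripted. This Python sketch reads a fixed amount of data sequentially and reports throughput, which can then be compared with the roughly 33 MB/sec ceiling of UDMA33. The function names and the 256 MB sample size are assumptions made for the example; it is a rough equivalent of the dd test, not a benchmark.

```python
import time

def read_throughput_mb_s(path, total_bytes=256 * 1024 * 1024, block_size=1024 * 1024):
    """Sequentially read up to `total_bytes` from `path`; return throughput in MB/s (10^6 bytes)."""
    done = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as src:
        while done < total_bytes:
            chunk = src.read(min(block_size, total_bytes - done))
            if not chunk:  # end of file or device reached
                break
            done += len(chunk)
    elapsed = time.monotonic() - start
    return done / 1e6 / elapsed if elapsed > 0 else float("inf")

UDMA33_LIMIT_MB_S = 33  # ceiling of the transfer mode printed in dmesg

def exceeds_udma33(mb_per_s):
    """True if the measured rate is higher than UDMA33 could deliver."""
    return mb_per_s > UDMA33_LIMIT_MB_S
```

If a run against /dev/ad4 (as root) reports well over 33 MB/s, the UDMA33 string in dmesg is cosmetic and the link is running faster than reported.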

Regarding SATA150 vs. SATA300: your drive has a physical jumper,
labelled OPT1 in the below photo, which limits the drive to SATA150
capability.  You can remove this jumper and get SATA300.  This doesn't
explain the UDMA33 issue, though.

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1400&p_created=1134597011

Also, please don't remove the mailing list from the CC; others need to
know what information you've provided, and future users may find this
thread useful.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Promise PDC20378 - SETFEATURES SET TRANSFER MODE taskqueue timeout

2007-10-21 Thread Jeff Doolittle

Everyone,

I updated my primary server to the latest FreeBSD RELENG_6 release 
last weekend and have started receiving the following errors every 
day, requiring me to power off the computer (the console hangs and 
Ctrl-Alt-Del doesn't work):



Oct 18 23:15:02 saturn kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=1129375
Oct 18 23:15:02 saturn kernel: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=1129375



For some strange reason the above error last night didn't cause the 
typical hang, but this evening it happened again @ ~9:30pm (17:30) with a 
hard failure, and a power-cycle was needed to get the server running again.


I originally thought one of the drives was going bad so I replaced the 
existing 200GB Maxtor PATA with two 500GB WD SATA.  Therefore, that 
rules out cables and drives and returns me to the motherboard (Promise 
Controller) or FreeBSD.


The following is the output from pciconf -lv:


[EMAIL PROTECTED]:0:0: class=0x06 card=0x80f61043 chip=0x25788086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82875P/E7210 DRAM Controller / Host-Hub Interface'
    class      = bridge
    subclass   = HOST-PCI
[EMAIL PROTECTED]:1:0: class=0x060400 card=0x chip=0x25798086 rev=0x02 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82875P PCI-to-AGP Bridge'
    class      = bridge
    subclass   = PCI-PCI
[EMAIL PROTECTED]:29:0: class=0x0c0300 card=0x80a61043 chip=0x24d28086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801EB/ER (ICH5/ICH5R) USB UHCI Controller'
    class      = serial bus
    subclass   = USB
[EMAIL PROTECTED]:29:1: class=0x0c0300 card=0x80a61043 chip=0x24d48086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801EB/ER (ICH5/ICH5R) USB UHCI Controller'
    class      = serial bus
    subclass   = USB
[EMAIL PROTECTED]:29:2: class=0x0c0300 card=0x80a61043 chip=0x24d78086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801EB/ER (ICH5/ICH5R) USB UHCI Controller'
    class      = serial bus
    subclass   = USB
[EMAIL PROTECTED]:29:3: class=0x0c0300 card=0x80a61043 chip=0x24de8086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801EB/ER (ICH5/ICH5R) USB UHCI Controller'
    class      = serial bus
    subclass   = USB
[EMAIL PROTECTED]:29:7: class=0x0c0320 card=0x80a61043 chip=0x24dd8086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801EB/ER (ICH5/ICH5R) USB 2.0 EHCI Controller'
    class      = serial bus
    subclass   = USB
[EMAIL PROTECTED]:30:0: class=0x060400 card=0x chip=0x244e8086 rev=0xc2 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801BA/CA/DB/DBL/EB/ER/FB (ICH2/3/4/4/5/5/6), 6300ESB Hub Interface to PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
[EMAIL PROTECTED]:31:0: class=0x060100 card=0x chip=0x24d08086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801EB/ER (ICH5/ICH5R) LPC Interface Bridge'
    class      = bridge
    subclass   = PCI-ISA
[EMAIL PROTECTED]:31:1: class=0x01018a card=0x80a61043 chip=0x24db8086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801EB/ER (ICH5/ICH5R) EIDE Controller'
    class      = mass storage
    subclass   = ATA
[EMAIL PROTECTED]:31:3: class=0x0c0500 card=0x80a61043 chip=0x24d38086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801EB/ER (ICH5/ICH5R) SMBus Controller'
    class      = serial bus
    subclass   = SMBus
[EMAIL PROTECTED]:31:5: class=0x040100 card=0x80f31043 chip=0x24d58086 rev=0x02 hdr

Re: Promise PDC20378 - SETFEATURES SET TRANSFER MODE taskqueue timeout

2007-10-21 Thread Miroslav Lachman

Jeff Doolittle wrote:

Everyone,

I updated my primary server to the latest FreeBSD RELENG_6 release 
last weekend and have started receiving the following errors every 
day, requiring me to power off the computer (the console hangs and 
Ctrl-Alt-Del doesn't work):



Oct 18 23:15:02 saturn kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad6: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=1129375
Oct 18 23:15:02 saturn kernel: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
Oct 18 23:15:02 saturn kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=1129375



[...]

I have had the same problem many times, and only replacing the mainboard 
solved it.
The last time I saw these errors (1 week ago) was in a dying Asus RS-120 
which had been running 6.2-RELEASE for about 6 months. So the problem is 
related not to 6.2-RELEASE, but to hardware.


Miroslav Lachman
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - on FreeBSD 6-STABLE

2006-03-24 Thread Sam Stein

Have you tried it with a livecd or something?
+++ Peter van Heusden [freebsd] [24/03/06 09:51 +0200]:

Hi

After my previous email about the SETFEATURES SET TRANSFER MODE timeout 
(msgid [EMAIL PROTECTED], 17 March 14:18 GMT+2 on freebsd-stable), I 
installed FreeBSD 6.1 BETA 4 and upgraded to a 6-STABLE kernel, running 
the box in 'safe' mode to do so. I now, however, get a slightly 
different error message:


ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad4: FAILURE - WRITE_DMA timed out LBA=32804495

(The address after LBA is not always the same)

This is with ad4 as a Seagate ST320423A on a Promise PDC20262 UDMA66 
controller.


Any suggestions?

Thanks,
Peter




--

b1tt3r -- You know, like sugar?
Sam Stein
Computer TeXnician/Programmer




ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - on FreeBSD 6-STABLE

2006-03-23 Thread Peter van Heusden

Hi

After my previous email about the SETFEATURES SET TRANSFER MODE timeout 
(msgid [EMAIL PROTECTED], 17 March 14:18 GMT+2 on freebsd-stable), I 
installed FreeBSD 6.1 BETA 4 and upgraded to a 6-STABLE kernel, running 
the box in 'safe' mode to do so. I now, however, get a slightly 
different error message:


ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad4: FAILURE - WRITE_DMA timed out LBA=32804495

(The address after LBA is not always the same)
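
To see how much the failing address actually moves around, the LBAs can be pulled out of the log and listed per device. What follows is a rough Python sketch, not a tested diagnostic tool: the pattern and the `lbas_by_device` helper are my own assumptions built from the TIMEOUT/FAILURE lines quoted in this thread, and the sample holds only the one failure line given above.

```python
import re

# Assumed pattern for lines such as
# "ad4: FAILURE - WRITE_DMA timed out LBA=32804495"
LBA_RE = re.compile(r"\b(ad\d+): (?:TIMEOUT|FAILURE) - \S+ .*?LBA=(\d+)")

def lbas_by_device(lines):
    """Return {device: sorted list of distinct LBAs} for failed requests."""
    found = {}
    for line in lines:
        m = LBA_RE.search(line)
        if m:
            found.setdefault(m.group(1), set()).add(int(m.group(2)))
    return {dev: sorted(lbas) for dev, lbas in found.items()}

# The only failure line quoted in the message above.
sample = ["ad4: FAILURE - WRITE_DMA timed out LBA=32804495"]
print(lbas_by_device(sample))
```

Fed with the whole log rather than one line, this lists every distinct failing address for each disk. Addresses scattered across the disk would argue against a single bad spot on the platter, but that is an inference, not something the log states.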

This is with ad4 as a Seagate ST320423A on a Promise PDC20262 UDMA66 
controller.


Any suggestions?

Thanks,
Peter

