Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
Hi there,

I'm scratching my head. I've just migrated to a Supermicro chassis and at the same time gone from FreeBSD 9.0 to 9.1-RELEASE. The machine in question is running a ZFS mirror configuration on two ada devices (with an 8 GB gmirror carved out for swap). Since doing so I've been having strange drop outs on the drives; they just disappear from the bus like so:

(ada2:ahcich2:0:0:0): removing device entry
(aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted
(aprobe0:ahcich2:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich2:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich2:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted

At first I thought it was a failing drive: one of the drives did this, and I limped on a single drive for a week until I could get someone up to the rack to plug a third drive in. We resilvered the zpool onto the new device and ran with the failed drive still plugged in (but not responding to a reset on the ada bus with camcontrol) for a week or so. Then the new drive dropped out in exactly the same way, followed in short order by the remaining original drive!!!

After rebooting the machine, and observing all three drives probing and available, I resilvered the gmirror and zpool again onto the two devices that I thought were reliable, but before the resilvering was completed the new drive dropped out again.

I'm scratching my head now. I can't imagine that it's a wiring problem, as they are all on individual SATA buses and individually cabled. SMART isn't reporting any drive issues either… :/

So, I'm wondering: is it a driver issue with 9.1-RELEASE? If I upgrade to 9-RELENG, would I expect that to resolve the problem? (Have there been any ada bus issues reported since last December?)

The hardware in question is:

ahci0: Intel Cougar Point AHCI SATA controller port 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f mem 0xdfb02000-0xdfb027ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported
ahcich0: AHCI channel at channel 0 on ahci0
ahcich1: AHCI channel at channel 1 on ahci0
ahcich2: AHCI channel at channel 2 on ahci0
ahcich3: AHCI channel at channel 3 on ahci0
ahcich4: AHCI channel at channel 4 on ahci0
ahcich5: AHCI channel at channel 5 on ahci0
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: WDC WD1000FYPS-01ZKB0 02.01B01 ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: WDC WD1000FYPS-01ZKB0 02.01B01 ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: WDC WD1000FYPS-01ZKB0 02.01B01 ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8

Any ideas would be greatly welcomed.

Thanks,
Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
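For anyone reading the log excerpt in this message, the status byte d1 and error byte 04 can be unpacked by hand. The following is only a rough sketch: the bit table names just the flags that the quoted log itself prints (BSY, DRDY, SERV, ERR and ABRT), and the function names and regular expression are hypothetical helpers, not part of FreeBSD or camcontrol.

```python
import re

# Only the bits that appear in the quoted log are named here; any other
# set bit is reported as a raw hex mask rather than guessed at.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x10: "SERV", 0x01: "ERR"}
ERROR_BITS = {0x04: "ABRT"}

# Matches lines such as:
#   (aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
LINE_RE = re.compile(r"ATA status: ([0-9a-f]{2}) \(.*?\), error: ([0-9a-f]{2})")


def decode(value, names):
    """Return the names of the bits set in value, highest bit first."""
    flags = []
    for bit in (0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01):
        if value & bit:
            flags.append(names.get(bit, "0x%02x" % bit))
    return flags


def decode_line(line):
    """Extract and decode the status and error bytes from one CAM log line."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return decode(status, STATUS_BITS), decode(error, ERROR_BITS)


sample = "(aprobe0:ahcich2:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )"
print(decode_line(sample))
```

Since 0xd1 is 0x80 + 0x40 + 0x10 + 0x01, the decoded flags agree with what the kernel printed, so the message itself is internally consistent. It does not say why the drive stopped answering.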
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
What chassis is this?

- Original Message -
From: Dr Josef Karthauser j...@karthauser.co.uk
To: freebsd...@freebsd.org
Cc: freebsd-stable@freebsd.org
Sent: Thursday, July 18, 2013 8:29 AM
Subject: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
Hi,

On 18 Jul 2013, at 08:29, Dr Josef Karthauser wrote:
> Since doing so I've been having strange drop outs on the drives; they just disappear from the bus like so:
> [...]

Me too (over a long period, with various hardware).

There is a general problem with energy-saving drives: controllers don't understand them. Typically the drive decides to go into some power-saving mode, the controller wants to do some operation, the drive takes too long to come ready, and the controller decides the drive has gone away.

You have to persuade the controller to wait longer for the drive to come ready, and/or persuade the drive to stay awake. This isn't necessarily easy, e.g. the controller's ready wait may not be programmable.

(Or avoid such drives like the plague, life's too short.)

--
Bob Bishop
r...@gid.co.uk
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
On 07/18/13 10:25, Bob Bishop wrote:
> Me too (over a long period, with various hardware). There is a general problem with energy-saving drives that controllers don't understand them. Typically the drive decides to go into some power-saving mode, the controller wants to do some operation, the drive takes too long to come ready, the controller decides the drive has gone away. You have to persuade the controller to wait longer for the drive to come ready, and/or persuade the drive to stay awake. This isn't necessarily easy, eg the controller's ready wait may not be programmable. (Or avoid such drives like the plague, life's too short).

Perhaps they are WD Green drives?

In that case, other than quoting Bob's suggestion about avoiding them, there's something you can do:
a) turn off the drives' power-saving features (this is done through a DOS utility you can download);
b) try different controllers and/or different OS releases.

You'll find a lot on this problem if you search the web. There's also a report of mine you can search for on this ML, regarding FreeBSD specifically.

HTH.

bye
av.
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
On 18 Jul 2013, at 13:07, Andrea Venturoli m...@netfence.it wrote:
> Perhaps they are WD Green drives?

They're WD RE2-GP 1 TB drives (model WD1000FYPS), not sure if that's green or not.

> In that case, other than quoting Bob's suggestion about avoiding them, there's something you can do:
> a) turn off the drives' power-saving features (this is done through a DOS utility you can download);
> b) try different controllers and/or different OS releases.

I'm committed to FreeBSD, as the machine is already rolled out and in a data centre ;).

> You'll find a lot on this problem if you search the web. There's also a report of mine you can search on this ML, regarding FreeBSD specifically.

I'll see if I can find it.

Thanks.
Joe
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
On 18 Jul 2013, at 08:33, Steven Hartland kill...@multiplay.co.uk wrote:
> What chassis is this?

Hey Steven,

It's a Supermicro CSE-813MTQ-350CB.

Cheers,
Joe
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
Hi--

On Jul 18, 2013, at 12:13 PM, Dr Josef Karthauser j...@karthauser.co.uk wrote:
> On 18 Jul 2013, at 13:07, Andrea Venturoli m...@netfence.it wrote:
>> Perhaps they are WD Green drives?
>
> They're WD RE2-GP 1 TB drives (model WD1000FYPS), not sure if that's green or not.

Yes, those are WDC's Green drives, although they are the higher-grade version compared to standard desktop drives, and are supposed to have firmware that plays nicely with RAID (TLER, time-limited error recovery).

Updating the firmware and increasing the timeout before these spin down automagically is likely to help, but as Andrea noted, such drives do have quite a history of timeout problems due to excessive head parking and their power conservation attempts.

Regards,
--
-Chuck
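One way to see whether head parking really is excessive on a given drive is to compare the SMART Load_Cycle_Count attribute with Power_On_Hours. What follows is a minimal sketch, assuming smartmontools is installed, that the drive reports both attributes, and that the text comes from something like `smartctl -A /dev/ada0`. The helper names are hypothetical and the sample table holds invented numbers purely for illustration; it is not output from the drives in this thread.

```python
def smart_raw_value(smartctl_output, attribute_name):
    """Return the raw value of one attribute from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[1] == attribute_name:
            return int(fields[9])
    return None


def load_cycles_per_hour(smartctl_output):
    """Average head load/unload cycles per powered-on hour, or None if unknown."""
    cycles = smart_raw_value(smartctl_output, "Load_Cycle_Count")
    hours = smart_raw_value(smartctl_output, "Power_On_Hours")
    if cycles is None or not hours:
        return None
    return cycles / hours


# Invented sample in the usual smartctl attribute-table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1200
193 Load_Cycle_Count        0x0032   180   180   000    Old_age   Always       -       60000
"""

print(load_cycles_per_hour(SAMPLE))
```

A rate that keeps climbing between two readings taken a day apart would support the head-parking theory. A flat rate would point back towards cabling, the backplane or the controller.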
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
On 18 Jul 2013, at 20:31, Charles Swiger cswi...@mac.com wrote:
> Hi--
>
> On Jul 18, 2013, at 12:13 PM, Dr Josef Karthauser j...@karthauser.co.uk wrote:
>> On 18 Jul 2013, at 13:07, Andrea Venturoli m...@netfence.it wrote:
>>> Perhaps they are WD Green drives?
>>
>> They're WD RE2-GP 1 TB drives (model WD1000FYPS), not sure if that's green or not.
>
> Yes, those are WDC's Green drives, although they are also the higher grade version as compared to standard desktop drives which are supposed to have firmware which plays nice with RAID (TLER, time-limited error recovery).
>
> Updating the firmware and increasing the timeout before these spin down automagically is likely to help, but as Andrea noted, such drives do have quite a history of timeout problems due to excessive head parking and their power conservation attempts.

We also wondered whether it was the motherboard, and so we've replaced it! Hope that works! But from what's being said here, it looks like that might not be the cause. :/ Although we've been up for 5 days now with no recurrence of the previous issue.

Joe
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
On 18 Jul 2013, at 20:31, Charles Swiger cswi...@mac.com wrote:
> On Jul 18, 2013, at 12:13 PM, Dr Josef Karthauser j...@karthauser.co.uk wrote:
>> On 18 Jul 2013, at 13:07, Andrea Venturoli m...@netfence.it wrote:
>>> Perhaps they are WD Green drives?
>>
>> They're WD RE2-GP 1 TB drives (model WD1000FYPS), not sure if that's green or not.
>
> Yes, those are WDC's Green drives, although they are also the higher grade version as compared to standard desktop drives which are supposed to have firmware which plays nice with RAID (TLER, time-limited error recovery).
>
> Updating the firmware and increasing the timeout before these spin down automagically is likely to help, but as Andrea noted, such drives do have quite a history of timeout problems due to excessive head parking and their power conservation attempts.

They're currently on firmware 02.01B01, btw. Not sure if that's the latest or not.

Joe
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
- Original Message -
From: Dr Josef Karthauser j...@karthauser.co.uk

> On 18 Jul 2013, at 08:33, Steven Hartland kill...@multiplay.co.uk wrote:
>> What chassis is this?
>
> Hey Steven, It's a Supermicro CSE-813MTQ-350CB.

We've seen issues on Supermicro chassis before which cause timeouts and, in extreme cases, device drops. If you can, try wiring the disks directly to the motherboard via SATA cables, bypassing the hot-swap midplane, and see if that helps.

Regards
Steve
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
On 07/18/13 21:13, Dr Josef Karthauser wrote:
>> b) try different controllers and/or different OS releases.
>
> I'm committed to FreeBSD, as the machine is already rolled out and in a data centre ;).

I said different OS releases, not a different OS! I wouldn't say such a blasphemy :)

bye
av.
Re: Drive failures with ada on FreeBSD-9.1, driver bug or wiring issue?
On 07/18/13 21:31, Charles Swiger wrote:
> Updating the firmware and increasing the timeout before these spin down automagically is likely to help, but as Andrea noted, such drives do have quite a history of timeout problems due to excessive head parking and their power conservation attempts.

Just for the record, I've been using them for several months without a hitch; it's just a matter of finding the correct settings/firmware/OS version/controller. This is to say you should be able to get them to work, although you might require some luck (or some sort of divination).

bye
av.