Re: Drive Disconnection

2008-10-24 Thread Jeremy Chadwick
On Fri, Oct 24, 2008 at 02:02:41PM -0400, Mark Jacobs wrote:
 I have an external Lacie 1Tb drive attached to a FreeBSD 6.4-PRERELEASE
 system via an ESATA connection.
 
 atapci0: SiI SiI 3512 SATA150 controller
 
 I cleaned off the drive by writing random data to it. The write took
 overnight and didn't experience any problems. I then added a filesystem
 to the drive and mounted it on the system.
 
 However when I perform an rsync backup from a FreeBSD 7.1 PRERELEASE
 system to the drive over an NFS connection the drive disconnects and the
 server reboots.

You've not provided enough information to help track this down.  What
model/brand of disk is attached to that controller?  What does smartctl
-a have to say about the disk?  What gets printed on the console before
it reboots?  Do you have the same problem if you run
7.1-PRERELEASE/BETA2?

 Does anyone have an idea where to go from here?

The only generic advice I can give you at this point is to avoid
Silicon Image controllers, particularly their SATA controllers.  They
have a history of causing data corruption on Linux, FreeBSD, and
Windows, and some have reported other miscellaneous problems with them
as well.  There's not enough evidence in this thread so far to blame the
SiI controller, but when I see them, I become immediately suspicious.

-- 
| Jeremy Chadwick   jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


RE: Drive Disconnection

2008-10-24 Thread mark.jacobs
It is a Lacie d2 quadra drive, but FreeBSD reports this:
server kernel: ad4: 953869MB Hitachi HDS721010KLA330 GKAOA70M at ata2-master SATA150

When I perform the rsync I receive these errors:

Oct 24 12:47:13 server kernel: ad4: FAILURE - device detached
Oct 24 12:47:13 server kernel: subdisk4: detached
Oct 24 12:47:13 server kernel: ad4: detached
Oct 24 12:47:13 server kernel: g_vfs_done():ad4s1a[WRITE(offset=144332767232, length=131072)]error = 6
Oct 24 12:47:13 server kernel: g_vfs_done():ad4s1a[WRITE(offset=144332898304, length=131072)]error = 6

The write failure messages keep being issued until the server reboots. It 
isn't in the log, but I receive a dirty buffer panic.

I don't have easy access to a 7.1 system with an ESATA port.

I'm currently redoing the entire process (wipe, build filesystem, mount, rsync) 
using the USB port. If that works I'm going to junk the idea of using the ESATA 
card for the drive.

Can you recommend an ESATA card that fits in a PCI slot, since my server 
doesn't have a PCI-E slot?

Mark Jacobs


Re: Drive Disconnection

2008-10-24 Thread Jeremy Chadwick
/MB559power_bracket.html
http://www.cooldrives.com/essaii3gbexp.html
http://www.newegg.com/Product/Product.aspx?Item=N82E16812119021

Finally, and I don't know if you're doing this, but -- be aware you
can't hot-swap disks via eSATA without having a hot-swap-capable
controller that fully supports hot-swapping.  Meaning: you can't yank
that d2 Quadra enclosure off the eSATA port whenever you feel like it.
You'll need to use atacontrol detach to properly detach it first, and
that's assuming the SATA controller you're using supports hot-swapping
(things with AHCI behave fairly well in this regard).

-- 
| Jeremy Chadwick   jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



RE: Drive Disconnection

2008-10-24 Thread mark.jacobs
Thanks for all the great information. I'm going to try the USB solution for now 
since the drive was running fine for several months on this server w/USB until 
I began playing with the ESATA connection. 

If perchance USB doesn't work I will try both getting the SMART status from the 
drive and getting a better SATA controller.

Mark Jacobs
Technical Services
Time Customer Service, Tampa FL (Go Rays)

-Original Message-
From: Jeremy Chadwick [mailto:[EMAIL PROTECTED]
Sent: Fri 10/24/2008 9:15 PM
To: Jacobs, Mark - Data Center Operations [EMAIL PROTECTED]
Cc: freebsd-questions@freebsd.org
Subject: Re: Drive Disconnection
 
On Fri, Oct 24, 2008 at 07:44:41PM -0400, [EMAIL PROTECTED] wrote:
 It is a Lacie d2 quadra drive but FreeBSD reports this;
 server kernel: ad4: 953869MB Hitachi HDS721010KLA330 GKAOA70M at ata2-master SATA150
 
 When I perform the RSYNC I receive these errors
 
 Oct 24 12:47:13 server kernel: ad4: FAILURE - device detached
 Oct 24 12:47:13 server kernel: subdisk4: detached
 Oct 24 12:47:13 server kernel: ad4: detached
 Oct 24 12:47:13 server kernel: g_vfs_done():ad4s1a[WRITE(offset=144332767232, length=131072)]error = 6
 Oct 24 12:47:13 server kernel: g_vfs_done():ad4s1a[WRITE(offset=144332898304, length=131072)]error = 6
 The write failure messages keep on being issued until the server reboots. It 
 isn't in the log, but I receive a dirty buffer panic.

It appears the disk is literally falling off of the SATA bus.  The
g_vfs_done errors you see are a result of that.  I'll explain the
reboot in a moment.
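
As a rough illustration of how to read those messages: error 6 is ENXIO, which FreeBSD describes as "Device not configured", and that is consistent with the disk having detached. The sketch below parses the two g_vfs_done lines quoted above and decodes the error number. The regular expression and the function name parse_g_vfs_done are illustrative assumptions, not part of any FreeBSD tool.

```python
import errno
import re

# The two kernel messages quoted in this thread, each rejoined onto one line.
LOG = """\
Oct 24 12:47:13 server kernel: g_vfs_done():ad4s1a[WRITE(offset=144332767232, length=131072)]error = 6
Oct 24 12:47:13 server kernel: g_vfs_done():ad4s1a[WRITE(offset=144332898304, length=131072)]error = 6
"""

# Assumed pattern for the message layout shown above; other kernel
# versions may format the line differently.
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\S+?)\[(?P<op>\w+)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<err>\d+)"
)


def parse_g_vfs_done(text):
    """Return one dict per g_vfs_done failure line found in text."""
    records = []
    for match in PATTERN.finditer(text):
        err = int(match.group("err"))
        records.append({
            "dev": match.group("dev"),
            "op": match.group("op"),
            "offset": int(match.group("offset")),
            "length": int(match.group("length")),
            "errno": err,
            # Symbolic name for the error number, e.g. 6 -> ENXIO.
            "errname": errno.errorcode.get(err, "UNKNOWN"),
        })
    return records


records = parse_g_vfs_done(LOG)
for rec in records:
    print(rec["dev"], rec["op"], rec["offset"], rec["length"], rec["errname"])
```

The two offsets differ by exactly the write length (131072 bytes, or 128 KiB), so these are back-to-back writes failing against a device that is no longer there, not scattered media errors.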

There could be tons of reasons for the disk disappearing.  I'll list off
some of the possibilities that come to mind:

* Drive losing power
  - Shoddiness inside of the d2 Quadra enclosure, such as bad internal
    cabling or manufacturing defects,
  - AC adapter for d2 Quadra is faulty,
  - d2 Quadra could offer some kind of sleep mode where the unit goes
    into a low-power-save state, and the disk ends up falling off the
    bus during this time.

* SATA300 vs. SATA150 compatibility issues
  - VIA and SiS chipsets are known to experience data corruption, disks
    falling off the bus, or other insanity when SATA300 disks are
    connected to those chipsets.  The chipsets support SATA300, but are
    downright buggy.  Workaround is to force the drive to SATA150 speed
    using jumpers on the disk (only *some* manufacturers offer this),
  - The Hitachi disk in your d2 Quadra is spec'd at SATA300, while it's
    obvious your Silicon Image SATA controller is only detecting
    SATA150 (yet LaCie claims this enclosure does SATA300).  The 7K1000
    series drives *do not* have a force-SATA150 jumper (I've checked),
    which is too bad, since forcing SATA150 might fix the problem.

* d2 Quadra USB/FW/eSATA controller bug
  - I have no idea what chip is inside of that enclosure, but many of
    them are bridges, e.g. they're USB/FW controllers that have a
    horribly shoddy SATA emulation interface on top of them,
  - Could be a firmware bug with the controller used in the enclosure,
  - Controller may not be 100% compatible with Silicon Image devices.

* Silicon Image SATA controller bugs

As for why the system reboots: what you're experiencing is probably a
kernel panic.  On FreeBSD, when you have a filesystem that's mounted and
the underlying device (disk, etc.) is yanked out from underneath, the
kernel will panic; this is by design.  I've been told by lower-level
folks that CURRENT supposedly addresses this issue, but I haven't
personally confirmed it.

I would still like to see SMART stats on the drive.  Why?  Because SMART
stats will show me if the drive is actually losing power or not (the
Power_Cycle_Count attribute should increment).

You'll need to install ports/sysutils/smartmontools, then run smartctl
-a /dev/ad4.  Save that data somewhere, then run your rsync.  Your
machine will reboot (a soft reset, hopefully!), and once it's back up,
run the same smartctl command again, and save that data.  Then you can
compare the adjusted attributes and RAW_VALUEs; I can help you with
reading this data if need be (people often misread it).
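
The before/after comparison described above could be sketched roughly as follows. This assumes the usual smartctl -a attribute table layout (ID#, ATTRIBUTE_NAME, ..., RAW_VALUE as the tenth column). The function names are hypothetical, and the sample rows are made-up values for illustration only, not readings from the drive in this thread.

```python
def parse_attributes(smartctl_output):
    """Map attribute name -> RAW_VALUE string from a smartctl -a dump."""
    attrs = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit check.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def changed_attributes(before, after):
    """Return {name: (raw_before, raw_after)} for attributes whose raw value changed."""
    old = parse_attributes(before)
    new = parse_attributes(after)
    return {
        name: (old[name], new[name])
        for name in sorted(old)
        if name in new and old[name] != new[name]
    }


# Hypothetical sample data in the attribute-table layout (NOT real output).
BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1200
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       41
194 Temperature_Celsius     0x0002   150   150   000    Old_age   Always       -       40
"""

AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1201
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       42
194 Temperature_Celsius     0x0002   150   150   000    Old_age   Always       -       40
"""

diff = changed_attributes(BEFORE, AFTER)
for name, (raw_before, raw_after) in diff.items():
    print(f"{name}: {raw_before} -> {raw_after}")
```

In practice the two strings would be read from the files saved before and after the rsync run. A Power_Cycle_Count that goes up across a soft reset would point toward the drive losing power rather than the controller misbehaving.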

 I don't have easy access to a 7.1 system with an ESATA port.

That's disappointing, as it would be useful to know if 7.1-PRERELEASE
behaves the same way for you.  Based on the above I'd say it probably
does, but it's always good to check.

 I'm currently redoing the entire process (wipe, build filesystem, mount,
 rsync) using the USB port. If that works I'm going to junk the idea of
 using the ESATA card for the drive.

I would _highly_ recommend you reconsider this.  USB on FreeBSD is in an
even worse state (and I am not exaggerating) than ATA/SATA is.  If your
disk is falling off the bus with SATA, the same will likely happen with
USB, and you'll experience the same problem.

 Can you recommend an ESATA card that fits in a PCI slot, since my
 server doesn't have a PCI-E slot?

Promise makes the SATA300 TX4302 controller, which is PCI