Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-11 Thread Ross
... which sounds very similar to issues I've raised many times. ZFS should have the ability to double check what a drive is doing, and speculatively time out a device that appears to be failing in order to maintain pool performance. If a single drive in a redundant pool can be seen to be respon

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-11 Thread Sanjeev
Hi Chris, On Sun, Aug 09, 2009 at 05:53:12PM -0700, Chris Baker wrote: > OK - had a chance to do more testing over the weekend. Firstly some extra > data: > > Moving the mirror to both drives on ICH10R ports and on sudden disk power-off > the mirror faulted cleanly to the remaining drive no pro

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-09 Thread Sanjeev
Chris, Thanks for providing the details and the dump. I shall look into this and update with my findings. Thanks and regards, Sanjeev On Sun, Aug 09, 2009 at 05:53:12PM -0700, Chris Baker wrote: > Hi Sanjeev > > OK - had a chance to do more testing over the weekend. Firstly some extra > data:

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-09 Thread Chris Baker
Hi Sanjeev OK - had a chance to do more testing over the weekend. Firstly some extra data: Moving the mirror to both drives on ICH10R ports and on sudden disk power-off the mirror faulted cleanly to the remaining drive no problem. Having a one drive pool on the ICH10R under heavy write traffic

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-05 Thread Sanjeev
Chris, On Wed, Aug 05, 2009 at 05:33:24AM -0700, Chris Baker wrote: > Sanjeev > > Thanks for taking an interest. Unfortunately I did have failmode=continue, > but I have just destroyed/recreated and double confirmed and got exactly the > same results. > > zpool status shows both drives mirror,

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-05 Thread roland
Doesn't Solaris have the great built-in DTrace for issues like these? If we knew in which syscall or kernel thread the system is stuck, we might get a clue... Unfortunately, I don't have any real knowledge of Solaris kernel internals or DTrace... -- This message posted from opensolaris.org _

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-05 Thread Ross
Yeah, sounds just like the issues I've seen before. I don't think you're likely to see a fix anytime soon, but the good news is that so far I've not seen anybody reporting problems with LSI 1068 based cards (and I've been watching for a while). With the 1068 being used in the x4540 Thumper 2,

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-05 Thread Chris Baker
I've left it hanging about 2 hours. I've also just learned that whatever the issue is, it is also blocking an "init 5" shutdown. I was thinking about setting a watchdog with a forced reboot, but that will get me nowhere if I need a reset-button restart. Thanks for the advice re the LSI 1068, not

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-05 Thread Ross
Just a thought, but how long have you left it? I had problems with a failing drive a while back which did eventually get taken offline, but took about 20 minutes to do so.

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-05 Thread Chris Baker
Sanjeev Thanks for taking an interest. Unfortunately I did have failmode=continue, but I have just destroyed/recreated and double confirmed and got exactly the same results. zpool status shows both drives mirror, ONLINE, no errors dmesg shows: SATA device detached at port 0 cfgadm shows: sa

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-04 Thread Sanjeev
Chris, Can you please check the failmode property of the pool ? -- zpool get failmode If it is set to "wait", you could try setting it to "continue". Regards, Sanjeev On Tue, Aug 04, 2009 at 08:56:03PM -0700, Chris Baker wrote: > Ok - in an attempt to weasel my way past the issue I mirrored my
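
For anyone following along, the failmode check suggested above could look roughly like the sketch below. The pool name "tank" is a hypothetical placeholder (the thread never names the pool), and the RUN variable is a convenience added here so the commands are only printed by default; `zpool get` and `zpool set` are the standard property commands, and failmode accepts wait, continue, or panic.

```shell
#!/bin/sh
# Sketch only: "tank" is a hypothetical pool name, not one from the thread.
POOL="${POOL:-tank}"

# RUN defaults to "echo", so the commands are printed rather than executed.
# Set RUN="" (as root, on a system with ZFS) to actually run them.
RUN="${RUN-echo}"

# Show the pool's current failmode (one of: wait, continue, panic).
show_failmode() {
    $RUN zpool get failmode "$1"
}

# Switch the pool to failmode=continue, as suggested above.
set_failmode_continue() {
    $RUN zpool set failmode=continue "$1"
}

show_failmode "$POOL"
set_failmode_continue "$POOL"
```

Note that, as reported elsewhere in this thread, having failmode=continue set did not prevent the hang in this particular case.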

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-04 Thread Ross
Whether ZFS properly detects device removal depends to a large extent on the device drivers for the controller. I personally have stuck to using controllers with chipsets I know Sun use on their own servers, but even then I've been bitten by similar problems to yours on the AOC-SAT2-MV8 cards.

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-04 Thread Chris Baker
Ok - in an attempt to weasel my way past the issue I mirrored my problematic si3124 drive to a second drive on the ICH10R, started writing to the file system and then killed the power to the si3124 removable drive. To my (unfortunate) surprise, the IO stream that was writing to the mirrored fil

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-04 Thread Chris Baker
It's a generic Sil3132 based PCIe x1 card using the si3124 driver. Prior to this I had been using Intel ICH10R with AHCI but I have found the Sil3132 actually hot plugs a little smoother than the Intel chipset. I have not gone back to recheck this specific problem on the ICH10R (though I can), I

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-04 Thread roland
What exact type of SATA controller do you use? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-04 Thread Chris Baker
Apologies - I'm daft for not saying originally: OpenSolaris 2009.06 on x86 Cheers Chris

Re: [zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-04 Thread Ross
What version of Solaris / OpenSolaris are you running there? I remember zfs commands locking up being a big problem a while ago, but I thought they'd managed to solve the issues like this.

[zfs-discuss] Recovering from ZFS command lock up after yanking a non-redundant drive?

2009-08-04 Thread Chris Baker
Hi, I'm running an application which uses hot-plug SATA drives as giant removable USB keys, but bigger and with SATA performance. I'm using “cfgadm connect”, then “configure”, then “zpool import” to bring a drive online, and export / unconfigure / disconnect before unplugging. All works well.
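
The attach/detach sequence described above might be scripted along these lines. This is a minimal sketch under stated assumptions: the attachment point "sata1/0" and the pool name "removable" are hypothetical placeholders, and the RUN variable is added here so the script only prints the commands unless told otherwise. The `cfgadm -c` state changes and `zpool import` / `zpool export` are the standard commands for these steps.

```shell
#!/bin/sh
# Sketch only: AP_ID and POOL are hypothetical placeholders.
AP_ID="${AP_ID:-sata1/0}"
POOL="${POOL:-removable}"

# RUN defaults to "echo", so the commands are printed rather than executed.
# Set RUN="" (as root) to actually run them.
RUN="${RUN-echo}"

# Bring a freshly inserted drive online: connect, configure, then import.
attach_drive() {
    $RUN cfgadm -c connect "$1" &&
    $RUN cfgadm -c configure "$1" &&
    $RUN zpool import "$2"
}

# Cleanly release a drive before unplugging: export, unconfigure, disconnect.
detach_drive() {
    $RUN zpool export "$2" &&
    $RUN cfgadm -c unconfigure "$1" &&
    $RUN cfgadm -c disconnect "$1"
}

attach_drive "$AP_ID" "$POOL"
detach_drive "$AP_ID" "$POOL"
```

Chaining with `&&` stops the sequence at the first failing step, so a drive is never disconnected while its pool is still imported.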