Hi,

I'm cross-posting this to zfs as I'm not sure which bit is to blame here.

I've been having an issue that I cannot really fix myself:

I have an OI 148 server, which hosts a lot of disks on SATA
controllers. Now it's full and needs some data-moving work done,
so I've acquired another box which runs Linux and has several SATA
enclosures. I'm using the Solaris iSCSI initiator with static-config
to connect the device.
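For reference, the connection is set up roughly like this on the OI box
(the IQN and address below are made-up placeholders, not the real ones):

  # enable static discovery and add the target
  # (example IQN and IP only, not the real ones)
  iscsiadm modify discovery --static enable
  iscsiadm add static-config iqn.2001-04.com.example:backup,192.168.1.20:3260
  # verify that the session comes up
  iscsiadm list target -v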

Normally, when everything is fine, there are no problems. I can even
restart the IET daemon and there's just a short hiccup in the I/O
stream.

Things go bad when I turn the iSCSI target off for a longer period
(reboot, etc.). The Solaris iSCSI initiator times out and reports these
timeouts as errors to ZFS. ZFS increments its error counts (maybe
losing writes) and eventually marks all the devices as failed, and the
array halts (failmode=wait).
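To illustrate, this is what I'm looking at when it happens (the pool
name "tank" below is just a placeholder for the real one):

  # show the unhealthy pool and the per-device error counts
  # ("tank" is a placeholder pool name)
  zpool status -x tank
  # the pool is set to block on failures rather than panic or return errors
  zpool get failmode tank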

Once in this state, I have no luck returning to a running state. The
failed condition doesn't clear itself after the target comes back
online. I've tried zpool clear, but it still reports data errors and
the devices as faulted, and zpool export hangs.
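Concretely, once the target is reachable again I've been trying roughly
this (again with "tank" standing in for the real pool name):

  # try to clear the error counts and the faulted state
  # ("tank" is a placeholder pool name)
  zpool clear tank
  # still lists the devices as FAULTED and reports permanent data errors
  zpool status -v tank
  # and this just hangs
  zpool export tank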

The way I see this problem:
a) the iSCSI initiator reports timeouts as permanent errors
b) ZFS handles them as such
c) there is no "never" timeout to choose, as far as I can see

What I would like is a mode equivalent to an NFS hard mount: wait
forever for the device to become available (but with the ability to
kick the array from the command line if it is really dead).
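As far as I can tell, failmode only offers wait, continue and panic,
and wait is already the closest thing to a hard mount; the missing
piece is the initiator treating the outage as transient so that wait
can actually ride it out. The "kick it from the command line" part
could then simply be a forced export, e.g.:

  # give up on a pool whose backing store really is gone for good
  zpool export -f tank    # "tank" again being a placeholder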

Any clues?



-- 
- Tuomas
