[zfs-discuss] iSCSI pool timeouts during high latency moments

2011-05-22 Thread Jim Klimov
Hi all,
 
As I wrote before, I have a dcpool implemented as an iSCSI
device backed by a volume in my physical pool. During heavy
operations, such as attempts to destroy a dataset (which in my
config produces many small IOs), the iSCSI device stays 100%
busy for hours and latencies grow to seconds even with at most
one task in the queue. I believe this leads to the following
problem: the device is considered timed out and is reported as
faulted:
 
===
# zpool status dcpool
 pool: dcpool
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scan: resilvered 111M in 0h5m with 0 errors on Wed May 18 04:03:02 2011
config:
NAME                                   STATE   READ WRITE CKSUM
dcpool                                 ONLINE     7  129K     0
  c0t600144F09844CF004D8376AE0002d0    ONLINE     0  397K     0
cache
  c4t1d0p7                             ONLINE     0     0     0
errors: 132410 data errors, use '-v' for a list
===
 
As we can see above, there are no devices actually marked as 
FAULTED, except that the status says so. At times I had dmesg 
reporting the failure as well, but not always, like this:
 
===
May 22 12:51:32 bofh-sol scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci 
(scsi_vhci0):
May 22 12:51:32 bofh-sol
/scsi_vhci/disk@g600144f09844cf004d8376ae0002 (sd10): Command Timeout on 
path 
iscsi0/d...@iqn.1986-03.com.sun:02:380c2eb8-4a4a-6e78-e304-c8cd5e29a7a40001,0
May 22 12:51:32 bofh-sol iscsi: [ID 431120 kern.warning] WARNING: iscsi 
connection(7/3f) closing connection - target requested reason:0x7
May 22 12:51:32 bofh-sol iscsi: [ID 732673 kern.info] NOTICE: iscsi session(6) 
iqn.1986-03.com.sun:02:380c2eb8-4a4a-6e78-e304-c8cd5e29a7a4 offline
May 22 12:51:39 bofh-sol scsi: [ID 107833 kern.warning] WARNING: 
/scsi_vhci/disk@g600144f09844cf004d8376ae0002 (sd10):
May 22 12:51:39 bofh-sol  drive offline
May 22 12:51:54 bofh-sol scsi: [ID 107833 kern.warning] WARNING: 
/scsi_vhci/disk@g600144f09844cf004d8376ae0002 (sd10):
May 22 12:51:54 bofh-sol  drive offline
May 22 12:52:00 bofh-sol scsi: [ID 107833 kern.warning] WARNING: 
/scsi_vhci/disk@g600144f09844cf004d8376ae0002 (sd10):
...
May 22 13:04:35 bofh-sol scsi: [ID 107833 kern.warning] WARNING: 
/scsi_vhci/disk@g600144f09844cf004d8376ae0002 (sd10):
May 22 13:04:35 bofh-sol  drive offline
May 22 13:04:45 bofh-sol iscsi: [ID 559844 kern.info] NOTICE: iscsi session(6) 
iqn.1986-03.com.sun:02:380c2eb8-4a4a-6e78-e304-c8cd5e29a7a4 online
May 22 13:04:45 bofh-sol genunix: [ID 483743 kern.info] 
/scsi_vhci/disk@g600144f09844cf004d8376ae0002 (sd10) multipath status: 
degraded: path 1 
iscsi0/d...@iqn.1986-03.com.sun:02:380c2eb8-4a4a-6e78-e304-c8cd5e29a7a40001,0
 is online
May 22 13:04:46 bofh-sol fmd: [ID 377184 daemon.error] SUNW-MSG-ID: 
ZFS-8000-HC, TYPE: Error, VER: 1, SEVERITY: Major
May 22 13:04:46 bofh-sol EVENT-TIME: Sun May 22 13:04:46 MSD 2011
May 22 13:04:46 bofh-sol PLATFORM: System-Product-Name, CSN: 
System-Serial-Number, HOSTNAME: bofh-sol
May 22 13:04:46 bofh-sol SOURCE: zfs-diagnosis, REV: 1.0
May 22 13:04:46 bofh-sol EVENT-ID: 435a2ccd-e710-cd37-a18e-a46743002e56
May 22 13:04:46 bofh-sol DESC: The ZFS pool has experienced currently 
unrecoverable I/O
May 22 13:04:46 bofh-sol  failures.  Refer to 
http://sun.com/msg/ZFS-8000-HC for more information.
May 22 13:04:46 bofh-sol AUTO-RESPONSE: No automated response will be taken.
May 22 13:04:46 bofh-sol IMPACT: Read and write I/Os cannot be serviced.
May 22 13:04:46 bofh-sol REC-ACTION: Make sure the affected devices are 
connected, then run
May 22 13:04:46 bofh-sol  'zpool clear'.
===
 
The huge number of errors is in ZFS metadata, as shown by
zpool status -v, and the condition can be cleared by a forced
reboot (uadmin 1 8) and a subsequent import of dcpool; after a
few hours it comes back online with the deferred-free blocks
deleted.
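
For reference, the recovery sequence looks roughly like this (the
import step is the one that runs for hours while the deferred
frees are processed):

===
# uadmin 1 8               # forced reboot, since export/clear just hang
  ... after the box comes back up ...
# zpool import dcpool      # can run for hours deleting deferred-free blocks
# zpool status -v dcpool   # verify the pool is ONLINE and errors are gone
===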
 
Neither zpool export nor zpool clear helps in this situation;
either one may instead hang the whole system (no IOs succeed,
even to other pools).
 
I've tried the zpool failmode property set to both wait and
continue, with the same results.
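
For completeness, a minimal sketch of how the property was set
and checked (the third possible value, panic, exists as well):

===
# zpool get failmode dcpool
NAME    PROPERTY  VALUE     SOURCE
dcpool  failmode  wait      default
# zpool set failmode=continue dcpool
# zpool get failmode dcpool
NAME    PROPERTY  VALUE     SOURCE
dcpool  failmode  continue  local
===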
 
I am not sure what to make of the two reports (in dmesg and in
zpool status) - possibly, when the system bottleneck clears, the
iSCSI device becomes visible again, but IOs are still not served.
I did not manage to restart the iSCSI target and/or initiator SMF
services, because in my case importing/mounting of dcpool is
wrapped in a service which depends on the iSCSI services, and
unmounting/exporting hangs - but I'd keep trying somehow
(suggestions welcome; so far I am thinking of breaking the
dependency and/or disabling my dcpool-mount service, roughly as
sketched below).
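
Roughly what I have in mind, with illustrative FMRIs (dcpool-mount
is my own wrapper service, so the svc:/site/... name is made up;
the iSCSI FMRIs are the standard ones on my box but may differ
elsewhere):

===
# svcs -d svc:/site/dcpool-mount:default    # what the wrapper depends on
# svcadm disable -t svc:/site/dcpool-mount:default
# svcadm restart svc:/network/iscsi/initiator:default
# svcadm restart svc:/network/iscsi/target:default
# svcadm enable svc:/site/dcpool-mount:default
===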
 
While this particular system is at home and I can afford to reboot it, 
I do wonder what should be properly 

Re: [zfs-discuss] iSCSI pool timeouts during high latency moments

2011-05-22 Thread Richard Elling
comment below...

On May 22, 2011, at 8:31 AM, Jim Klimov wrote:

 Hi all,
  
 As I wrote before, I have a dcpool implemented as an iSCSI
 device backed by a volume in my physical pool. During heavy
 operations, such as attempts to destroy a dataset (which in my
 config produces many small IOs), the iSCSI device stays 100%
 busy for hours and latencies grow to seconds even with at most
 one task in the queue. I believe this leads to the following
 problem: the device is considered timed out and is reported as
 faulted:
  
 ===
 # zpool status dcpool
  pool: dcpool
  state: ONLINE
 status: One or more devices are faulted in response to IO failures.
 action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://www.sun.com/msg/ZFS-8000-HC
  scan: resilvered 111M in 0h5m with 0 errors on Wed May 18 04:03:02 2011
 config:
 NAME                                   STATE   READ WRITE CKSUM
 dcpool                                 ONLINE     7  129K     0
   c0t600144F09844CF004D8376AE0002d0    ONLINE     0  397K     0
 cache
   c4t1d0p7                             ONLINE     0     0     0
 errors: 132410 data errors, use '-v' for a list
 ===
  
 As we can see above, there are no devices actually marked as
 FAULTED, except that the status says so. At times I had dmesg
 reporting the failure as well, but not always, like this:
  
 ===
 May 22 12:51:32 bofh-sol scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci 
 (scsi_vhci0):
 May 22 12:51:32 bofh-sol
 /scsi_vhci/disk@g600144f09844cf004d8376ae0002 (sd10): Command Timeout on 
 path 
 iscsi0/d...@iqn.1986-03.com.sun:02:380c2eb8-4a4a-6e78-e304-c8cd5e29a7a40001,0
 May 22 12:51:32 bofh-sol iscsi: [ID 431120 kern.warning] WARNING: iscsi 
 connection(7/3f) closing connection - target requested reason:0x7

This means that the target closed the connection because there was already a 
task in progress. 
Likely this was the retry after the timeout. By default, these timeouts are 
quite long, so by now
performance is already terrible.
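
For a sense of scale of those defaults: on the initiator host the
sd driver typically allows about 60 seconds per command and then
retries several times before the command is finally failed up the
stack, so one stuck request already costs minutes. A rough sketch
of inspecting and (carefully) raising that timeout - note that it
only changes when the failure is declared, not the underlying
slowness:

===
# echo "sd_io_time/D" | mdb -k   # current per-command timeout, in seconds
# vi /etc/system                 # e.g. add: set sd:sd_io_time = 120
# init 6                         # the tunable is only read at boot
===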

I'm not sure if you can make the iscsi target any slower.  I've seen some 
well-known-manufacturer's
iscsi arrays take 2-3 minutes to respond, in a repeatable manner. The best 
advice is to fix it or
replace it. I don't think there is anything you can do on the initiator side to 
get to a state of 
happiness.
 -- richard


Re: [zfs-discuss] iSCSI pool timeouts during high latency moments

2011-05-22 Thread Jim Klimov

2011-05-22 20:39, Richard Elling wrote:



 This means that the target closed the connection because there was
 already a task in progress.
 Likely this was the retry after the timeout. By default, these
 timeouts are quite long, so by now
 performance is already terrible.

 I'm not sure if you can make the iscsi target any slower. I've seen
 some well-known-manufacturer's
 iscsi arrays take 2-3 minutes to respond, in a repeatable manner. The
 best advice is to fix it or
 replace it. I don't think there is anything you can do on the
 initiator side to get to a state of
 happiness.
 -- richard



Well, thanks for the advice. As my readers may know ;) I am now
in the process of evacuating data from this lagging dcpool into
the physical pool which contains the volume and serves iSCSI.

I am surprised that playing with failmode achieved nothing - I
had hoped that dcpool would resume from where it hung at the
moment the iSCSI device was lost, as soon as the device is found
again... Or do the errors imply that it was never found again
after all?

So since I can't seem to fix the ZFS/iSCSI server performance,
I'll continue along the path of replacing it with native filesystem
datasets in the parent pool.
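
Roughly along these lines (the dataset names are illustrative; an
rsync of the mounted filesystems would also do, just with less
convenient restartability):

===
# zfs snapshot -r dcpool/data@evac
# zfs send -R dcpool/data@evac | zfs receive -duv pool
# zfs destroy -r dcpool/data     # only after the copy is verified
===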

The experiment taught me a lot, but the practical end result is
not enjoyable: such lags and overheads for storing a pool on a
ZFS volume, and most of all the unreliable performance. Next time
I'll think twice about whether iSCSI (at all, and on top of a ZFS
volume in particular) is really a better solution than large
files over NFS (for VMs or whatever), and/or will plan better and
stress-test first. In particular, that should and would mean
nearly filling the pool with several TBs of (any) data... and
such stress-testing is about what I have now done, before
spec'ing a server for work and defining what it should and
should not be expected to do :)
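
Something as dumb as the following would have surfaced the problem
in advance (sizes and names are illustrative; the point is to fill
most of the pool with non-dedupable data and then try to destroy
the dataset):

===
# zfs create dcpool/stress
# dd if=/dev/urandom of=/dcpool/stress/bigfile bs=1024k count=2097152
                              # ~2 TB of random data; slow, but that is the point
# zfs destroy dcpool/stress   # the step that triggers the small-IO storm
===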

Maybe it is just my computer (dual-core 2.8GHz P4 with 8GB RAM) -
but if such specs repeatably fail at running a single volume, then
I'm at a loss to predict and construct machines with good, stable
ZFS/iSCSI performance ;\

Again, part of the lag is probably contributed by dedup inside
this dcpool, and by using 4KB data blocks along with 4KB metadata
blocks for its container volume pool/dcpool - which probably
leads to high processing overhead (and certainly did lead to 2x
storage consumption), and maybe to high fragmentation.
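
For anyone who wants to check the same things on their own setup,
the relevant numbers can be pulled out roughly like this (my
dataset names; note that zdb -DD can itself take a while and a
fair amount of RAM on a large dedup table):

===
# zfs get volblocksize,used,referenced pool/dcpool   # the container volume
# zpool get dedupratio dcpool                        # dedup efficiency inside
# zdb -DD dcpool                                     # DDT size and histogram
===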

--


Климов Евгений, Jim Klimov
технический директор / CTO
ЗАО ЦОС и ВТ / JSC COSHT
+7-903-7705859 (cellular)   mailto:jimkli...@cos.ru
CC: ad...@cos.ru, jimkli...@mail.ru
()  ascii ribbon campaign - against html mail
/\  - against microsoft attachments




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss