> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Ian Collins
> 
> I look after a remote server that has two iSCSI pools.  The volumes for
> each pool are sparse volumes and a while back the target's storage
> became full, causing weird and wonderful corruption issues until they
> managed to free some space.
> 
> Since then, one pool has been reasonably OK, but the other has terrible
> performance receiving snapshots.  Despite both iSCSI devices using the
> same IP connection, iostat shows one with reasonable service times while
> the other shows really high (up to 9 seconds) service times and 100%
> busy.  This kills performance for snapshots with many random file
> removals and additions.
> 
> I'm currently zero filling the bad pool to recover space on the target
> storage to see if that improves matters.
> 
> Has anyone else seen similar behaviour with previously degraded iSCSI
> pools?
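
(As an aside on the zero filling: for the zeros to actually reach the sparse
backing volumes, compression has to be off on the pool you're filling,
otherwise ZFS collapses the zero runs and nothing gets written through.  What
I'd expect that to look like, with "tank" as a placeholder pool name, is
roughly:

  dd if=/dev/zero of=/tank/zerofill bs=1M
  rm /tank/zerofill
  sync

Whether the target side then reclaims the space depends on how the backing
store handles the zeroed blocks, so treat that as a sketch.)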

This sounds exactly like the behavior I was seeing with my attempt to have two 
machines zpool-mirror each other via iSCSI.  In my case, both machines were 
targets and initiators.  I made the initiator service dependent on the target 
service, the zpool mount dependent on the initiator service, and the VirtualBox 
guest start dependent on the zpool mount.
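
For reference, the dependency wiring was done with svccfg, roughly along these
lines (from memory, so treat it as a sketch - the property group name is
arbitrary, and the pool-import and VirtualBox services are my own custom SMF
services, not stock ones):

  # make the iSCSI initiator wait for the local iSCSI target
  svccfg -s svc:/network/iscsi/initiator:default 'addpg target-dep dependency'
  svccfg -s svc:/network/iscsi/initiator:default \
      'setprop target-dep/grouping = astring: require_all'
  svccfg -s svc:/network/iscsi/initiator:default \
      'setprop target-dep/restart_on = astring: none'
  svccfg -s svc:/network/iscsi/initiator:default \
      'setprop target-dep/type = astring: service'
  svccfg -s svc:/network/iscsi/initiator:default \
      'setprop target-dep/entities = fmri: svc:/network/iscsi/target:default'
  svcadm refresh svc:/network/iscsi/initiator:default

  # the custom pool-import service then declares a dependency on the
  # initiator the same way, and the VirtualBox guest service depends on
  # the pool-import service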

Everything seemed fine for a while, including some reboots.  But then, on one 
reboot, one of my systems stayed down too long, and when it finally came back 
up, both machines started choking.

So far I haven't found a root cause, and the only fix I've found has been to 
reinstall the OS.  I tried everything I know in terms of removing, forgetting, 
and recreating the targets, initiators, and pool, but none of that was 
sufficient.
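
For what it's worth, "removing and forgetting" was roughly the following on
each box (IQNs, GUIDs, and names elided - placeholders only, and probably not
a complete list):

  # initiator side: drop the remote target and static discovery
  iscsiadm remove static-config <target-iqn>,<target-ip>
  iscsiadm modify discovery --static disable
  devfsadm -Cv                        # clean up stale device links

  # target side: tear down the COMSTAR target and LU
  stmfadm offline-target <target-iqn>
  itadm delete-target -f <target-iqn>
  stmfadm delete-lu <lu-guid>
  zfs destroy <pool>/<iscsi-zvol>

  # and finally the mirrored pool itself
  zpool destroy <poolname>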

I recently (yesterday) got budgetary approval to dig into this more, so 
hopefully I'll have some insight before too long, but don't hold your breath.  
I could fail, and even if I don't, it's likely to be weeks or months.

What I want to know from you is:

Which machines are your Solaris machines?  Just the targets?  Just the 
initiators?  All of them?

You say you're having problems just with snapshots.  Are you sure you're not 
having trouble with all sorts of IO, rather than just snapshots?  What about 
import/export?
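
(A quick way to check would be to time the basic pool operations during a
quiet period - pool name is a placeholder here:

  time zpool export yourpool
  time zpool import yourpool
  zpool status -v yourpool

If those stay fast, that at least narrows it to the snapshot receive path.)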

In my case, I found I was able to zfs send, zfs receive, and zpool status just 
fine.  But when I launched a guest VM, there would be a massive delay - you 
said up to 9 seconds - I was sometimes seeing over 30s - and it sometimes 
crashed the host system.  The guest OS was acting as if it was getting I/O 
errors, without actually displaying any error messages indicating an I/O 
error.  I would attempt, and sometimes fail, to power off the guest VM (kill 
-KILL VirtualBox).  After the failure began, zpool status still worked (and 
reported no errors), but if I tried to do things like export/import, they 
would hang indefinitely and I needed to power cycle the host.  While in the 
failure mode, I could run zpool iostat, and I sometimes saw 0 transactions 
with nonzero bandwidth, which defies my understanding.
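
If you want to compare numbers on your side, something like the following
shows the same kind of data (pool name is a placeholder):

  zpool iostat -v tank 5     # per-vdev operations and bandwidth
  iostat -xn 5               # per-device service times and %busy
  svcs -xv                   # any iSCSI services in maintenance?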

Did you ever see the iSCSI targets "offline" or "degraded" in any way?  Did you 
do anything like "online" or "clear"?
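
(I.e., anything along the lines of:

  zpool status -x
  zpool online <pool> <device>
  zpool clear <pool>

on either end.)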

My systems are OpenIndiana - the latest release; I forget whether that's 151a5 
or a6.
