> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Ian Collins
> I look after a remote server that has two iSCSI pools. The volumes for
> each pool are sparse volumes and a while back the target's storage
> became full, causing weird and wonderful corruption issues until they
> managed to free some space.
> Since then, one pool has been reasonably OK, but the other has terrible
> performance receiving snapshots. Despite both iSCSI devices using the
> same IP connection, iostat shows one with reasonable service times while
> the other shows really high (up to 9 seconds) service times and 100%
> busy. This kills performance for snapshots with many random file
> removals and additions.
> I'm currently zero filling the bad pool to recover space on the target
> storage to see if that improves matters.
> Has anyone else seen similar behaviour with previously degraded iSCSI
This sounds exactly like the behavior I was seeing with my attempt at two
machines zpool-mirroring each other via iSCSI. In my case, both machines are
targets and initiators. I made the initiator service dependent on the target
service, the zpool mount dependent on the initiator service, and the
VirtualBox guest start dependent on the zpool mount.
Everything seemed fine for a while, including some reboots. But then, on one
reboot, one of my systems stayed down too long, and when it finally came back
up, both machines started choking.
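To make the dependency chain described above concrete, here is a minimal Python sketch that derives the start order from the dependencies. The service names are hypothetical labels for illustration, not the actual SMF service names on my systems, and this only models ordering, not what happens when a peer stays down.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical labels: each key depends on the services in its set.
DEPENDS_ON = {
    "iscsi-initiator": {"iscsi-target"},
    "zpool-mount": {"iscsi-initiator"},
    "virtualbox-guest": {"zpool-mount"},
}


def start_order(depends_on):
    """Return the services in an order that starts every dependency first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(start_order(DEPENDS_ON)))
```

Note that this ordering only covers one machine. With two machines each waiting on the other's target, neither side's local ordering guarantees the remote target is up.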
So far I haven't found any root cause, and so far the only solution I've found
was to reinstall the OS. I tried everything I know in terms of removing,
forgetting, recreating the targets, initiators, and pool, but somehow none of
that was sufficient.
I recently (yesterday) got budgetary approval to dig into this more, so
hopefully I'll have some insight before too long, but don't hold your
breath. I could fail, and even if I don't, it's likely to be weeks or months.
What I want to know from you is:
Which machines are your Solaris machines? Just the targets? Just the
initiators? All of them?
You say you're having problems just with snapshots. Are you sure you're not
having trouble with all sorts of IO, and not just snapshots? What about import
In my case, I found I was able to zfs send, zfs receive, zpool status, all fine.
But when I launched a guest VM, there would be a massive delay - you said up to
9 seconds - I was sometimes seeing over 30s - sometimes crashing the host
system. The guest OS acted as if it was getting I/O errors, without actually
displaying any error message indicating an I/O error. I would attempt, and
sometimes fail, to power off the guest VM (kill -KILL VirtualBox). After the
failure begins, zpool status still works (and reports no errors), but if I try
to do things like export/import, they fail indefinitely, and I need to power
cycle the host. While in the failure mode, I can run zpool iostat, and I
sometimes see 0 transactions with nonzero bandwidth, which defies my
understanding.
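For anyone who wants to watch for the same oddity, here is a rough Python sketch of the check I mean: flag any sample that reports bandwidth but zero operations. It assumes the zpool iostat numbers have already been parsed into dictionaries. The field names and the sample values are made up for illustration, not real output.

```python
def find_zero_op_transfers(samples):
    """Return names of entries reporting bandwidth but zero operations."""
    flagged = []
    for sample in samples:
        ops = sample["read_ops"] + sample["write_ops"]
        bandwidth = sample["read_bytes"] + sample["write_bytes"]
        if ops == 0 and bandwidth > 0:
            flagged.append(sample["name"])
    return flagged


if __name__ == "__main__":
    # Made-up numbers for illustration, not real measurements.
    samples = [
        {"name": "pool-a", "read_ops": 12, "write_ops": 40,
         "read_bytes": 98304, "write_bytes": 524288},
        {"name": "pool-b", "read_ops": 0, "write_ops": 0,
         "read_bytes": 0, "write_bytes": 131072},
    ]
    print(find_zero_op_transfers(samples))
```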
Did you ever see the iSCSI targets "offline" or "degraded" in any way? Did you
do anything like "online" or "clear"?
My systems are OpenIndiana - the latest, I forget if that's 151a5 or a6.
zfs-discuss mailing list