Yikes, I'm back at it again, and sooooo frustrated.
For about 2-3 weeks now, I've had the iscsi mirror configuration in production,
as previously described. Two disks on system 1 mirror against two disks on
system 2, everything done via iscsi, so you can zpool export on machine 1 and
then zpool import on machine 2 for a manual failover.
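For reference, the manual failover itself is just the usual export/import dance
(the pool name here is made up):

  machine1# zpool export iscsipool
  machine2# zpool import iscsipool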
I created the dependency - initiator depends on target - and created a new smf
service to mount the iscsi zpool after the initiator is up (and consequently
export the zpool before the initiator shuts down). I was able to reboot, and
everything came back up cleanly.
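Roughly, the method script for that smf service was something along these lines
(pool name and error handling are just a sketch, not the exact script):

  #!/bin/sh
  . /lib/svc/share/smf_include.sh
  # import the iscsi pool once the initiator is up; export it before shutdown
  case "$1" in
  start) zpool import iscsipool || exit $SMF_EXIT_ERR_FATAL ;;
  stop)  zpool export iscsipool || exit $SMF_EXIT_ERR_FATAL ;;
  *)     exit $SMF_EXIT_ERR_CONFIG ;;
  esac
  exit $SMF_EXIT_OK

The manifest declares a dependency on svc:/network/iscsi/initiator:default, and
the initiator was in turn made to depend on svc:/network/iscsi/target:default.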
Today I rebooted one system for some maintenance, and it stayed down longer
than expected, so those disks started throwing errors on the second machine.
First system eventually came up again, second system resilvered, everything
looked good. I zpool clear'd the pool on the second machine just to make the
counters look pretty again.
But it wasn't pretty at all.
This is so bizarre -
Throughout the day, the VMs on system 2 kept choking. I had to power-cycle
system 2 about half a dozen times due to unresponsiveness. Exactly the type of
behavior you'd expect from an IO error - but nothing whatsoever appears in the
system log, and the zpool status still looks clean.
Several times, I destroyed the pool and recreated it completely from backup.
zfs send and zfs receive both work fine. But strangely - when I launch a VM,
the IO grinds to a halt, and I'm forced to power-cycle (usually) the host.
You might try to conclude it's something wrong with VirtualBox - but it's not.
I literally copied & pasted the zfs send | zfs receive commands that restored
the pool from backup, but this time restored it onto local storage. The only
difference is local disk versus the iscsi pool. And then it finally worked
without a problem.
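The restore was along these lines in both cases (snapshot and dataset names are
made up):

  # replicate the backup stream into the iscsi pool
  zfs send -R backup/vmstore@latest | zfs receive -F iscsipool/vmstore
  # identical stream onto a local pool - this one ran clean
  zfs send -R backup/vmstore@latest | zfs receive -F localpool/vmstore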
During the day, trying to get the iscsi pool up again - this is so bizarre - I
did everything I could think of to get back to a pristine state. I removed
iscsi targets, I removed LUNs (LUs), I removed the static discovery and
re-added it, got new device names, I wiped the disks (zpool destroy & zpool
create), re-created LUs, re-created static discovery, re-created targets,
re-created zpools... The behavior was the same no matter what I did.
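To give an idea, a typical teardown/rebuild cycle looked something like this
(COMSTAR on the target box, static discovery on the initiator; IQNs, GUIDs, and
device names are all placeholders):

  # initiator box: get rid of the old pool
  zpool destroy iscsipool
  # target box: tear down the old target and LU
  stmfadm offline-target iqn.2010-01.example:tgt0
  itadm delete-target -f iqn.2010-01.example:tgt0
  sbdadm delete-lu 600144f0xxxxxxxx          # GUID from 'sbdadm list-lu'
  # target box: rebuild
  sbdadm create-lu /dev/rdsk/c2t4d0s2
  stmfadm add-view 600144f0yyyyyyyy          # GUID of the new LU
  itadm create-target -n iqn.2010-01.example:tgt0
  # initiator box: re-add static discovery, rebuild device links, new pool
  iscsiadm remove static-config iqn.2010-01.example:tgt0,192.168.1.10:3260
  iscsiadm add static-config iqn.2010-01.example:tgt0,192.168.1.10:3260
  devfsadm -i iscsi
  zpool create iscsipool mirror c0t2d0 c6t600144f0yyyyd0   # local + iscsi disk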
I can create the pool, import it, and zfs receive onto it no problem, but then
when I launch the VM, the whole system grinds to a halt. VirtualBox will be in
a "sleep" state, and VirtualBox shows the green light on the hard drive
indicating it's trying to read. Meanwhile, if I try to X it out, it won't die,
and GNOME gives me the "Force Quit" dialog. I can sudo kill -KILL VirtualBox,
and VirtualBox *still* won't die. Any "zpool" or "zfs" command I type hangs
indefinitely (even the time-slider daemon and zfs auto-snapshot are hung). I
can poke around the system in other areas - on other pools and stuff - but the
only way out of it is a power cycle.
It's so weird that once the problem happens, I have not yet found any way to
recover from it except to reformat and reinstall the OS for the whole system.
I cannot, for the life of me, think of *any*thing that could be storing state
like this, preventing me from getting back into a usable iscsi configuration.
One thing I haven't tried yet -
It appears, I think, that when you make a disk, let's say c2t4d0, into an iscsi
target, let's say c6t7blahblahblahd0, then c6t7blahblahblahd0 is actually backed
by c2t4d0s2. I could create a pool using c2t4d0 directly, and/or zero the whole
disk, completely obliterating any semblance of partition tables in there, or
old redundant copies of old uberblocks, or anything like that. But seriously,
I'm grasping at straws here, just trying to find *any* place where some bad
state is stored that I haven't thought of yet.
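For what it's worth, ZFS keeps four copies of its vdev label - two at the front
of the device and two at the end - so an old pool can survive a simple
partition-table wipe; zeroing the whole backing device would rule that out.
Something like this, reusing the placeholder device name from above (obviously
destructive):

  # target box: flatten the whole backing disk, then rebuild the LU on it
  dd if=/dev/zero of=/dev/rdsk/c2t4d0s2 bs=1024k
  sbdadm create-lu /dev/rdsk/c2t4d0s2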
I shouldn't need to reformat the host.