Yikes, I'm back at it again, and so frustrated. For about 2-3 weeks now, I've had the iSCSI mirror configuration in production, as previously described: two disks on system 1 mirrored against two disks on system 2, everything done via iSCSI, so you can zpool export on machine 1 and then zpool import on machine 2 for a manual failover.
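Roughly, the manual failover described above amounts to the following sketch. The pool name iscsipool, the DRY_RUN switch and the function names are placeholders made up for illustration, not taken from the actual setup. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Manual failover sketch for a pool mirrored over iSCSI between two hosts.
# POOL is a hypothetical pool name; DRY_RUN=1 (the default) only prints commands.
POOL="${POOL:-iscsipool}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the machine giving up the pool (machine 1 in the post).
release_pool() {
    run zpool export "$POOL"
}

# On the machine taking over (machine 2 in the post).
takeover_pool() {
    run zpool import "$POOL"
}
```

Run release_pool on machine 1 first, then takeover_pool on machine 2. If machine 1 went down without exporting, zpool import needs -f, and that is only safe when machine 1 really is down, since a pool imported on two hosts at once will be corrupted.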
I created the dependency (initiator depends on target) and a new SMF service to import the iSCSI zpool after the initiator is up (and, conversely, export the zpool before the initiator shuts down). I was able to reboot, and everything worked perfectly.

Until today. Today I rebooted one system for some maintenance, and it stayed down longer than expected, so those disks started throwing errors on the second machine. The first system eventually came up again, the second system resilvered, and everything looked good. I zpool clear'd the pool on the second machine just to make the counters look pretty again.

But it wasn't pretty at all. This is so bizarre: throughout the day, the VMs on system 2 kept choking. I had to power-cycle system 2 about half a dozen times due to unresponsiveness. It's exactly the type of behavior you'd expect from IO errors, but nothing whatsoever appears in the system log, and zpool status still looks clean.

Several times, I destroyed the pool and recreated it completely from backup. zfs send and zfs receive both work fine. But strangely, when I launch a VM, the IO grinds to a halt, and I'm forced to power-cycle (usually) the host. You might try to conclude it's something wrong with VirtualBox, but it's not. I literally copied & pasted the zfs send | zfs receive commands that restored the pool from backup, but this time restored onto local storage. The only difference is local disk versus iSCSI pool. And then it finally worked without any glitches.

During the day, trying to get the iSCSI pool up again, I did everything I could think of to get back to a pristine state. I removed the iSCSI targets, I removed the LUNs (LUs), I removed the static discovery and re-added it (and got new device names), I wiped the disks (zpool destroy & zpool create), and I re-created the LUs, static discovery, targets, and zpools. The behavior was the same no matter what I did.
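For readers trying to follow the teardown and rebuild sequence above, it might look something like this sketch, assuming a COMSTAR target (stmfadm, itadm) and the stock iscsiadm initiator. The post does not give the exact commands, so the IQN, address, LU GUID, backing device and pool name are all hypothetical placeholders, and the flags are worth checking against the man pages for your release. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of tearing down and rebuilding one iSCSI-backed disk.
# Every value below is a hypothetical placeholder, not from the original post.
DRY_RUN="${DRY_RUN:-1}"
POOL="${POOL:-iscsipool}"
TARGET_IQN="${TARGET_IQN:-iqn.2010-01.com.example:disk1}"
PEER_ADDR="${PEER_ADDR:-192.0.2.10:3260}"
LU_GUID="${LU_GUID:-600144F0EXAMPLE}"
BACKING_DEV="${BACKING_DEV:-/dev/rdsk/c2t4d0s2}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Initiator side: destroy the pool (destructive!) and forget the target.
teardown_initiator() {
    run zpool destroy "$POOL"
    run iscsiadm remove static-config "$TARGET_IQN,$PEER_ADDR"
    run devfsadm -C                 # clean up dangling device links
}

# Target side: remove the logical unit, then the target itself.
teardown_target() {
    run stmfadm delete-lu "$LU_GUID"
    run itadm delete-target -f "$TARGET_IQN"
}

# Target side: re-create the LU, expose it, re-create the target.
rebuild_target() {
    run stmfadm create-lu "$BACKING_DEV"
    # create-lu reports a new GUID; set LU_GUID to it before adding the view.
    run stmfadm add-view "$LU_GUID"
    run itadm create-target -n "$TARGET_IQN"
}

# Initiator side: re-add static discovery and rescan for the device.
rebuild_initiator() {
    run iscsiadm add static-config "$TARGET_IQN,$PEER_ADDR"
    run iscsiadm modify discovery --static enable
    run devfsadm -i iscsi
}
```

The zpool create step is left out on purpose: the device names change after the rebuild, so they have to be read from the system (for example with format or iscsiadm list target -S) before the pool is re-created.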
I can create the pool, import it, and zfs receive onto it with no problem, but when I launch the VM, the whole system grinds to a halt. VirtualBox goes into a "sleep" state and shows the green light on the hard drive, indicating it's trying to read. If I try to X it out, it won't die, and GNOME gives me the "Force Quit" dialog. I can sudo kill -KILL VirtualBox, and VirtualBox *still* won't die. Any "zpool" or "zfs" command I type hangs indefinitely (even the time-slider daemon and zfs auto-snapshot are hung). I can poke around the system in other areas, on other pools and so on, but the only way out is a power cycle.

What's so weird is that once the problem has happened, I haven't found any way to recover from it except to reformat and reinstall the OS for the whole system. I cannot, for the life of me, think of *any*thing that could be storing state like this, preventing me from getting back to a usable iSCSI mirror pool.

One thing I haven't tried yet: it appears, I think, that when you make a disk, let's say c2t4d0, into an iSCSI target, let's say c6t7blahblahblahd0, then c6t7blahblahblahd0 is actually c2t4d0s2. I could create a pool directly on c2t4d0, and/or zero the whole disk, completely obliterating any semblance of partition tables in there, or old redundant copies of old uberblocks, or anything like that.

But seriously, I'm grasping at straws here, just trying to find *any* place where some bad state is stored that I haven't thought of yet. I shouldn't need to reformat the host.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss