Yikes, I'm back at it again, and sooooo frustrated.

For about 2-3 weeks now, I've had the iSCSI mirror configuration in
production, as previously described.  Two disks on system 1 mirror against
two disks on system 2, everything done via iSCSI, so you can zpool export
on machine 1 and then zpool import on machine 2 for a manual failover.
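
For reference, the failover amounts to something like this (a sketch;
"tank" is a placeholder pool name, not necessarily what I'm running):

  # on machine 1: quiesce and release the pool
  zpool export tank
  # on machine 2: take it over (-f needed only if machine 1 went down
  # without exporting cleanly)
  zpool import tank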

I created the dependency - the iSCSI initiator depends on the iSCSI target -
and created a new SMF service to import the iSCSI zpool after the initiator
is up (and, conversely, export the zpool before the initiator shuts down).
I was able to reboot, and everything worked perfectly.
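
From memory, the dependency was added with svccfg roughly like this (a
sketch; the property-group name "target-dep" is arbitrary, and the exact
FMRIs may differ on your release):

  svccfg -s network/iscsi/initiator:default addpg target-dep dependency
  svccfg -s network/iscsi/initiator:default setprop target-dep/grouping = astring: require_all
  svccfg -s network/iscsi/initiator:default setprop target-dep/restart_on = astring: none
  svccfg -s network/iscsi/initiator:default setprop target-dep/type = astring: service
  svccfg -s network/iscsi/initiator:default setprop target-dep/entities = fmri: svc:/network/iscsi/target:default
  svcadm refresh network/iscsi/initiator:default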

Until today.

Today I rebooted one system for some maintenance, and it stayed down longer
than expected, so its disks started throwing errors on the second machine.
The first system eventually came up again, the second system resilvered, and
everything looked good.  I ran zpool clear on the pool on the second machine
just to make the error counters look pretty again.
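
That is, after the resilver completed (again with "tank" as a stand-in):

  zpool status tank   # READ/WRITE/CKSUM counters nonzero after the outage
  zpool clear tank    # reset the counters to zero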

But it wasn't pretty at all.

This is so bizarre - 

Throughout the day, the VMs on system 2 kept choking.  I had to power-cycle
system 2 about half a dozen times due to unresponsiveness.  It's exactly the
type of behavior you'd expect from an IO error - but nothing whatsoever
appears in the system log, and zpool status still looks clean.

Several times, I destroyed the pool and recreated it completely from backup.
zfs send and zfs receive both work fine.  But strangely, when I launch a VM,
the IO grinds to a halt, and I'm forced to power-cycle the host (usually).

You might be tempted to conclude something is wrong with VirtualBox - but
it's not.  I literally copied & pasted the zfs send | zfs receive commands
that restored the pool from backup, but this time restored onto local
storage.  The only difference is local disk versus iSCSI pool.  And that
time it worked without any glitches.
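
The restore was literally the same pipeline both times, just pointed at a
different destination pool (dataset names here are placeholders for
illustration, not my actual layout):

  # onto the iSCSI mirror pool -- launching a VM then hangs the host
  zfs send -R backup/vm@snap | zfs receive -F tank/vm

  # identical stream onto local disk -- works flawlessly
  zfs send -R backup/vm@snap | zfs receive -F localpool/vm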

During the day, trying to get the iSCSI pool up again - this is so bizarre -
I did everything I could think of to get back to a pristine state.  I
removed iSCSI targets, I removed LUNs (LUs), I removed the static discovery
and re-added it, got new device names, I wiped the disks (zpool destroy &
zpool create), re-created LUs, re-created static discovery, re-created
targets, re-created zpools...  The behavior was the same no matter what I did.
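
For the record, the teardown/rebuild cycle looked approximately like this
(COMSTAR on the target side; the GUIDs, IQNs, and addresses below are
placeholders, and I'm reciting from memory):

  # target side: tear everything down, then rebuild
  stmfadm remove-view -l <lu-guid> -a
  sbdadm delete-lu <lu-guid>
  itadm delete-target -f <target-iqn>
  sbdadm create-lu /dev/rdsk/c2t4d0s2
  stmfadm add-view <new-lu-guid>
  itadm create-target

  # initiator side: drop and re-add static discovery
  iscsiadm remove static-config <target-iqn>,<target-ip>
  iscsiadm add static-config <target-iqn>,<target-ip>
  devfsadm -i iscsi    # re-enumerate; hence the new device names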

I can create the pool, import it, and zfs receive onto it with no problem,
but when I launch the VM, the whole system grinds to a halt.  VirtualBox
sits in a "sleep" state, showing the green light on the hard drive to
indicate it's trying to read.  If I try to X it out, it won't die, and GNOME
gives me the "Force Quit" dialog.  I can sudo kill -KILL VirtualBox, and
VirtualBox *still* won't die.  Any zpool or zfs command I type hangs
indefinitely (even the time-slider daemon and zfs auto-snapshot are hung).
I can poke around the system in other areas - on other pools and such - but
the only way out of it is a power cycle.
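
That kill -KILL has no effect smells like threads stuck in the kernel on
uninterruptible I/O - SIGKILL can't be delivered until the thread returns
from the kernel.  The sort of thing I've been poking at (a sketch, nothing
conclusive from it yet):

  ps -o pid,s,comm -p `pgrep VirtualBox`   # process state never changes
  pstack `pgrep VirtualBox`                # where the process is stuck
  echo "::threadlist -v" | mdb -k          # kernel threads, as root - look
                                           # for stacks blocked in zio/iscsi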

It's so weird that once the problem happens, I have not yet found any way to
recover from it except to reformat and reinstall the OS on the whole system.
I cannot, for the life of me, think of *any*thing that could be storing
state like this, preventing me from getting back to a usable iSCSI mirror
pool.

One thing I haven't tried yet - 

It appears, I think, that when you make a disk, let's say c2t4d0, into an
iSCSI target, let's say c6t7blahblahblahd0, then c6t7blahblahblahd0 is
actually c2t4d0s2.  I could create a pool using c2t4d0, and/or zero the
whole disk, completely obliterating any semblance of partition tables in
there, or old redundant copies of old uberblocks, or anything like that.
But seriously, I'm grasping at straws here, just trying to find *any* place
where some bad state could be stored that I haven't thought of yet.
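
If I go down that road, the brute-force version would be something like
this (destructive, obviously; c2t4d0 as in the example above):

  # quick: zero the front of the disk (partition table plus the two
  # ZFS label copies stored at the start of the device)
  dd if=/dev/zero of=/dev/rdsk/c2t4d0s2 bs=1024k count=16
  # thorough: zero the entire disk, which also destroys the two
  # ZFS label copies kept at the *end* of the device
  dd if=/dev/zero of=/dev/rdsk/c2t4d0s2 bs=1024k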

I shouldn't need to reformat the host.
