Well, it appears I have managed to nuke my six-drive 1.5 TB RAIDZ2 zpool. Yes, this isn't pleasant, especially considering most of the data isn't backed up. ...yes, I know, backups are always needed.
So here is the story of how I managed to bring down the fabled ZFS. About a month ago my zpool became very sluggish, taking 10+ minutes to read or write a 50 KB file. I looked through /var/adm/messages and ran iostat -xn 1, and managed to deduce that c6t3d0 had died, even though zpool status -x said everything was kosher. I replaced the drive and everything started working as expected. While c6t3d0 was resilvering, the estimate started out at 68 hours, which seemed fair given that the first 200 GB took 17 hours to copy. After that, however, the remaining 600 GB flew by in 6 hours, and I wondered what was going on.

Playing around in different parts of the filesystem, I discovered that some files would copy off of the NAS relatively quickly, while others took a long time, dipping and peaking without ever maintaining a steady throughput. The unsteady ones turned out to be the torrent files I had downloaded, so this looked like a fragmentation issue (especially as I wasn't preallocating the files). I decided I wanted to defrag the pool and discovered that ZFS does not support defragging, but unfortunately that didn't stop me from trying anyway. (Why the heck hasn't defrag been more of a priority for those code wizards up at Sun?) It occurred to me that I could copy the torrent downloads to a temp drive and then copy them back to the main pool, so that all of the files would be contiguous, then delete the originals. There was just one problem with this idea: the main reason I wanted to run b134 was dedup. So I turned dedup off and proceeded with my ill-fated plan.

Well, that proved to be a bit of an issue. When I tried deleting the old files from GNOME, the process would hang, and you couldn't end it even with a kill -9 from root. Eventually the box would freeze and I would have to reboot. I then tried rm -R, with the same results. After every reboot, any files that had been deleted would have returned. At this point you'd think I would take the hint, back up my files, and set up a new ZFS NAS, but I'm not one for subtlety.

I had been copying things to my Mac over SMB, and it occurred to me that I hadn't tried deleting these pesky files over SMB yet. I'm not sure why I thought that was such a breakthrough; after all, rm hung the system, so what better luck would SMB have? Aside, that is, from the fact that the CIFS service is integrated into ZFS, whereas the rm command is external to it. Well, this turned out to be the silver bullet that took down the mighty ZFS volume. Everything seemed to work OK at first: I managed to delete half of the files, and the other half said I didn't have permission, even though I knew I did, but I thought nothing of it. When I woke up the next day and decided to watch some videos, I discovered my SMB volumes wouldn't mount and I couldn't ssh into the box. Thinking nothing of it, since this was the same result as my previous attempts to remove the files, I power cycled the box. This time, however, the box started responding to ping but never accepted ssh and never allowed an SMB connection. So I hooked up a monitor and discovered that the server was hanging while probing the ZFS volumes! OK, so I booted from the b134 live disc and attempted a zpool import -f megapool. (Yes, I called my pool megapool; it's not so mega anymore.)
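For anyone retracing this, the attempt from the live CD looked roughly like the following; megapool is my pool name, and the comments are mine after the fact:

    # scan attached devices for pools that can be imported
    zpool import
    # force the import, since the pool was last touched by another system
    zpool import -f megapool
    # sanity checks while the import grinds away
    zpool list
    zpool status megapool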
Well, this process would never complete, even though all of the filesystems would mount, megapool showed up in zpool list, and a zpool status megapool said everything looked good. (I'm not liking this status command very much.) The problem, aside from the import never completing, was that no files appeared in any of the filesystems. So I tried zpool import -fF megapool. Same result. I then tried to send one of the filesystems off of megapool with zfs send megapool/users | zfs receive tpool/users_backup. The I/O looked good, and tpool started to show consumed bytes, but then it hung... I'll spare you the minutiae of every attempt I have made to recover this situation; in short, megapool is now certifiably FUBAR.

Here are the lessons to be learned:
1) I don't think b134 is ready for prime time.
2) More specifically, I don't think dedup is ready for prime time.
3) But most importantly, when a system starts acting outside of expected parameters, it is time to back up, period! If you really want to push at every weak corner, go for it, just not when the house of cards is holding on to the only copy of your data!
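For what it's worth, the backup routine I should have had in place all along is just a snapshot plus a send to a second pool. A minimal sketch, assuming a scratch pool called tpool and made-up snapshot names:

    # take a cheap, read-only, point-in-time snapshot
    zfs snapshot megapool/users@snap1
    # replicate it to another pool (pipe through ssh for a remote box)
    zfs send megapool/users@snap1 | zfs receive tpool/users_backup
    # on later runs, send only the changes since the last snapshot
    zfs snapshot megapool/users@snap2
    zfs send -i megapool/users@snap1 megapool/users@snap2 \
        | zfs receive -F tpool/users_backup

Had I been doing that weekly, this post would be a lot shorter.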