roland <[email protected]> writes:

>>Original Poster: can you please take a look at cfgadm? Seems like
>>sil believes the disk is still there, or has gone mad; maybe it can
>>be solved by switching it off or marking it unconfigured.
>
> i can take a look at the end of the week and will report.  anyway -
> how can this be solved by switching it off or marking it
> unconfigured?  i want to run a raidz with this controller with 4
> disks and unplug one disk to test it for reliability.  if this simple
> test doesn't even work, the whole thing will never go into
> production, as a raid which stalls the system on unplugging a single
> disk is completely crap and useless....

Roland,

I've had some similarly disappointing experiences.
Removed a disk from a mirrored pool and caught hell booting up.

What caught my attention is that you're using a Sil3114.  I'm using a
Sil3112a.  Very similar, but with only 2 ports, and I think about the
same vintage.

I have my system set to boot into a console, not the GUI, so it may
have looked a little different from what you experienced.

But the login prompt was very sluggish.  I'd type my user name, then
wait almost a full minute before getting the password prompt...
Once logged in it was almost unusable... it took ages to see any
reaction when typing.

I noticed the fault manager daemon (fmd) running and eating 98-99
percent of the CPU (according to `top').  It would climb quickly to
98-99 percent, hold there for a minute or two, then drop out of sight
for a minute or two.  It seemed to cycle like that for 10-15 minutes.
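
In case anyone wants to poke at the same symptom, these are the
commands I'd reach for to confirm it's fmd and see what it's chewing
on (no promises they show anything useful in this state):

    $ prstat -p `pgrep fmd`       (CPU/memory for just the fmd process)
    $ fmadm faulty                (resources the fault manager has flagged)
    $ fmdump -eV | tail           (the most recent error-report telemetry)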

I found that I could ssh into the box, and the terminal I got that way
seemed to be unaffected by the sluggishness, or at least not nearly as
bad.  But it may have been after the cycling had stopped... not sure.

I was attempting to upgrade my 2-disk mirror from 200 GB disks to
750 GB disks.  Something you'd expect to do on a ZFS server from time
to time.
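
The plan, roughly, was the usual per-disk replace; the pool and device
names below are only examples, yours will differ:

    # zpool status badpool           (check state, note the old device name)
      (power down, swap the 200 GB disk for a 750 GB one in the same slot)
    # zpool replace badpool c1t1d0   (tell ZFS the disk in that slot was swapped)
    # zpool status badpool           (watch the resilver)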

And like you noticed, as long as the disk was pulled I continued to
have a near-unusable machine.  But put the disk back and all the
troubles would disappear.

I wasn't even trying to hot-plug, like you mentioned.  I pulled the
disk with the machine powered down.
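
If you do end up trying the cfgadm route from the original mail, I
believe the check and unconfigure look roughly like this on the SATA
framework (sata0/1 is only an example attachment point; `cfgadm -al'
shows the real ones):

    # cfgadm -al                       (list attachment points and their state)
    # cfgadm -c unconfigure sata0/1    (tell the framework the disk is going away)
      (pull or swap the disk)
    # cfgadm -c configure sata0/1      (bring the port back after re-inserting)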

Finally, since I only had 11 GB of data on that pool (and had copies
on a remote host too), I destroyed the entire pool with
`zpool destroy badpool'.

Shut down, swapped the disks out from 200 GB to 750 GB, and rebooted.

Recreated the pool and restored the data from the copies on the remote
host.  Now it's seemingly smooth sailing.
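
Roughly, the whole sequence was the ZFS side of this; the device,
host, and dataset names are just examples, and the copy back could as
easily be rsync as zfs send/receive:

    # zpool destroy badpool
      (power down, swap both 200 GB disks for 750 GB ones, reboot)
    # zpool create badpool mirror c1t0d0 c1t1d0
    # ssh remotehost zfs send backup/data@snap | zfs receive badpool/data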

However:
I am having some kind of spontaneous shutdown... but I'm not sure
whether it's related; I'm just now starting to figure out what's
causing it.

Like you, I'm thinking that if I'm going to have this much trouble
when pulling a disk, it's looking like a large blemish on the
usability of ZFS for my NAS server.

I'll admit that I'm not the sharpest kid on the block and have massive
holes in my knowledge/grasp of what I'm trying to do, but it seems
like there has been just one big pain in the butt after another.

It also seems like working with ZFS requires a rather extensive
knowledge and skill set.
