I've been reading this thread with interest while simultaneously dealing with an outage on our Oracle (née Sun) 7410C cluster, the ZFS NAS appliance offering. Ours is fairly small, about 40 TB total, serving NFS only to Solaris, RHEL, and VMware hosts.

After a shaky start, the cluster had been pretty stable for the last year or so, but the last two weeks have been a trip back to bizarro world. About ten days ago, 'node one' just dropped off the network for eight minutes. No cluster failover; both nodes stayed up. Then it came back online with no intervention. Ten minutes later, a disk failed. Coincidence? Meh, opened a case. It took eight days to get a 1 TB SATA drive delivered, lolz.

In the meantime, the management interfaces for both nodes ceased to function. The cluster continued to serve data, though. It seems there's a bug where log files grow too big and crash the management processes. The support guy logged in remotely, cleaned up the egregious log files, and restarted the daemons. OK now? Well, 'node one' was OK. 'Node two' then dropped off the network for eight minutes.
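(The cleanup itself is nothing exotic. You can't get a shell on the appliance yourself, but on a regular Solaris box the same procedure would look something like the sketch below; the log path and the SMF service name are my guesses, not the appliance's real ones:

    # truncate any log over ~500 MB in place, rather than removing it,
    # so the daemon keeps a valid file handle on its open log
    find /var/log -type f -size +524288000c -exec sh -c ': > "$1"' _ {} \;

    # then bounce the management service via SMF
    # (this FMRI is hypothetical; the appliance's real one differs)
    svcadm restart svc:/appliance/management:default

Truncating with ': >' instead of rm matters here: removing an open log file just leaves the daemon writing to an unlinked inode, and the space never comes back until a restart.)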

Granted, I'm running a year-old version of the system code. Support hasn't been able to say whether newer code would have prevented any of this.

They're proposing that the first outage was probably due to the drive failure, because data would be unavailable while the pool was resilvering. Huh? This is "RAID", right? That doesn't square with how ZFS resilvering is supposed to work: the pool stays online and keeps serving data, just degraded and slower, while it rebuilds onto the replacement. The second outage was allegedly due to restarting the daemon. Apparently restarting the management daemon ifconfigs the interfaces down. So why would you do that at 3 PM on a Wednesday, then? And why did the same procedure affect only 'node two'? Still waiting for answers.
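For what it's worth, on a plain Solaris box with ZFS, mid-resilver status says as much in so many words (pool name and progress numbers below are illustrative, not from our appliance):

    # zpool status tank
      pool: tank
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool
            will continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 0h12m, 34.5% done, 0h23m to go

Note the "will continue to function" part. That's ZFS's own description of the behavior, which is why "data unavailable while resilvering" doesn't add up to me.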

I'm just venting at this point I guess; take it for what it's worth. I have a knack for selecting unreliable storage (anyone remember MTI?), so just watch what I do and do the opposite.

FWIW, we also have a NetApp FAS-3020 and it's been solid, but I don't ask much of it.

Cheers,

--

Roy McMorran
Bar Harbor, ME
