On Mon, Aug 25, 2008 at 08:17:55PM +1200, Ian Collins wrote:
> John Sonnenschein wrote:
> >
> > Look, yanking the drives like that can seriously damage the drives
> > or your motherboard. Solaris doesn't let you do it ...

Haven't seen an android/"universal soldier" shipping with Solaris ... ;-)

> > and assumes that something's gone seriously wrong if you try it. That Linux 
> > ignores the behavior and lets you do it sounds more like a bug in linux 
> > than anything else.

Not sure that everything that can't be understood is "likely a bug"
- maybe Linux is just "more forgiving" and tries its best to work around
the problem without taking you out of business (see below), even if that
requires hacks not in line with the specifications ...

> One point that's been overlooked in all the chest thumping - PCs vibrate
> and cables fall out.  I had this happen with an SCSI connector.  Luckily

Yes - and a colleague told me he once had the same problem.
He also managed a Fujitsu Siemens server whose SCSI controller card
had a tiny hairline crack: very odd behavior, usually not reproducible;
IIRC, the 4th service engineer finally replaced the card ...

> So pulling a drive is a possible, if rare, failure mode.

Definitely!

And expecting strange controller (or hardware in general) behavior is
possibly a big + for an OS that targets SMEs and "home users" as well
(everybody knows about Far East and other cheap HW producers, who
sometimes seem to say: let's ship it now, and later we'll build a special
MS Windows driver that works around the bug/problem ...).
 
"Similar" story: ~ 2000+ we had a WG server with 4 IDE channels PATA,
one HDD on each. HDD0 on CH0 mirrored to HDD2 on CH2, HDD1 on CH1 mirrored
to HDD3 on CH3, using Linux Softraid driver. We found out, that when
HDD1 on CH1 got on the blink, for some reason the controller got on the
blink as well, i.e. took CH0 and vice versa down too. After reboot, we
were able to force the md raid to re-take the bad marked drives and even
found out, that the problem starts, when a certain part of a partition
was accessed (which made the ops on that raid really slow for some
minutes - but after the driver marked the drive(s) as bad, performance
was back). Thus disabling the partition gave us the time to get a new
drive... During all these ops nobody (except sysadmins) realized, that we
had a problem - thanx to the md raid1 (with xfs btw.). And also we did not
have any data corruption (at least, nobody has complained about it ;-)).
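
For the curious: with today's mdadm the "re-take the drives marked as bad"
step would look roughly like the sketch below (just an illustration - the
array and device names are made-up examples, not our actual setup back then):

    # see which members md still considers active after the channel dropped
    cat /proc/mdstat

    # force the kicked-out member back into the mirror;
    # md then resyncs it from the surviving half
    mdadm --manage /dev/md0 --re-add /dev/hdc1

    # if --re-add is refused, drop the failed member and add it back
    # (triggers a full resync instead of an incremental one)
    mdadm --manage /dev/md0 --remove /dev/hdc1
    mdadm --manage /dev/md0 --add /dev/hdc1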

Wrt. what I've experienced and read on the zfs-discuss etc. lists, I have the
__feeling__ that we would have gotten into real trouble using Solaris
(even the most recent one) on that system ...
So if somebody asks me whether to run Solaris+ZFS on a production system, I
usually say: definitely, but only if it is a Sun server ...

My 2¢ ;-)

Regards,
jel.

PS: And yes, all the vendor-specific workarounds/hacks are a problem for
    the Linux kernel folks as well - at least on Torvalds' side they are
    discouraged, IIRC ...
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768