Re: [zfs-discuss] "ZFS, Smashing Baby" a fake???

2008-11-25 Thread Scara Maccai
> Oh, and regarding the original post -- as several readers correctly
> surmised, we weren't faking anything, we just didn't want to wait
> for all the device timeouts.  Because the disks were on USB, which
> is a hotplug-capable bus, unplugging the dead disk generated an
> interrupt that bypassed the timeout.  We could have waited it out,
> but 60 seconds is an eternity on stage.

I'm sorry, I didn't mean to sound offensive. Anyway, I think people should 
know that a failing drive can hang the system for minutes, "despite" ZFS. I 
mean: a lot has been written about how great ZFS is at recovering when a drive 
fails, but nothing about this problem. I now know it's not ZFS's fault, but I 
wonder how many people set up their drives with ZFS assuming that 
"as soon as something goes bad, ZFS will fix it". 
Is there any way to test these cases other than smashing the drive with a 
hammer? Having a failover policy where the failover can't be tested sounds 
scary...
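
One way to reason about this without a hammer is a toy model. The sketch below 
is only an illustration of the behaviour described in this thread, not of how 
ZFS or any Solaris driver is implemented: the Disk class, the function names 
and the single fixed timeout are all hypothetical, and the 60-second value is 
simply the figure quoted above. It assumes a mirrored write cannot proceed 
until every member has either answered or been declared gone, and that a 
hot-unplugged disk is reported gone immediately.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumed per-device timeout; 60 s is the figure quoted in this thread,
# not a value read from any real driver configuration.
DRIVER_TIMEOUT_S = 60.0


@dataclass
class Disk:
    """Hypothetical mirror member used only for this toy model."""
    name: str
    latency_s: Optional[float]  # None means the device hangs and never answers
    unplugged: bool = False     # hotplug removal is reported right away


def time_until_known(disk: Disk, timeout_s: float = DRIVER_TIMEOUT_S) -> float:
    """Seconds until the system knows whether this disk answered or is gone."""
    if disk.unplugged:
        return 0.0              # removal event bypasses the timeout
    if disk.latency_s is None:
        return timeout_s        # silent device: wait for the full timeout
    return min(disk.latency_s, timeout_s)


def mirror_write_stall_s(disks: Sequence[Disk],
                         timeout_s: float = DRIVER_TIMEOUT_S) -> float:
    """Seconds a mirrored write stalls: the slowest member decides."""
    return max(time_until_known(d, timeout_s) for d in disks)


if __name__ == "__main__":
    healthy = Disk("disk0", latency_s=0.01)
    smashed = Disk("disk1", latency_s=None)
    pulled = Disk("disk1", latency_s=None, unplugged=True)
    print("smashed, still plugged in:", mirror_write_stall_s([healthy, smashed]))
    print("smashed, then unplugged:  ", mirror_write_stall_s([healthy, pulled]))
```

In this model the smashed-but-connected disk stalls the write for the whole 
timeout, while pulling the cable removes the stall, which matches what the 
videos show.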
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] "ZFS, Smashing Baby" a fake???

2008-11-24 Thread Scara Maccai
> In the worst case, the device would be selectable, but not responding
> to data requests which would lead through the device retry logic and can
> take minutes.

That's what I didn't know: that a driver could take minutes (hours???) to 
decide that a device is no longer working.
Which raises another question: how can one be sure that a drive failure won't 
take an hour to be acknowledged by the driver? That is: what good is a 
failover strategy if it takes an hour to start? I'm grateful that the system 
doesn't write until it knows what is going on, but it shouldn't take that long.
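
To put rough numbers on the worry above, here is a minimal sketch. It assumes 
the driver retries a command a fixed number of times and waits a fixed timeout 
on each attempt; the function name is made up, the 60-second timeout is the 
figure mentioned elsewhere in this thread, and the retry counts are purely 
hypothetical, not values taken from any actual driver.

```python
def worst_case_delay_s(timeout_s: float, retries: int) -> float:
    """Seconds until the driver gives up on an unresponsive device.

    Assumes the first attempt and every retry each wait a full timeout.
    """
    if timeout_s < 0 or retries < 0:
        raise ValueError("timeout_s and retries must be non-negative")
    return timeout_s * (retries + 1)


if __name__ == "__main__":
    # Hypothetical retry counts, with the 60 s timeout quoted in the thread.
    for retries in (0, 2, 5):
        delay = worst_case_delay_s(60.0, retries)
        print(f"{retries} retries: {delay / 60:.0f} min before failover can start")
```

Under these assumptions the delay grows linearly with the retry count, so 
reaching an hour would take either a much longer timeout or several layers 
that each retry on their own.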


Re: [zfs-discuss] "ZFS, Smashing Baby" a fake???

2008-11-24 Thread Scara Maccai
> if a disk vanishes like a sledgehammer
> hit it, ZFS will wait on the device driver to decide it's dead.

OK, I see.

> That said, there have been several threads about wanting configurable
> device timeouts handled at the ZFS level rather than the device driver
> level.

Uh, so I can configure timeouts at the device driver level? I didn't know that.


Re: [zfs-discuss] "ZFS, Smashing Baby" a fake???

2008-11-24 Thread Scara Maccai
> Why would it be assumed to be a bug in Solaris? Seems more likely on
> balance to be a problem in the error reporting path or a controller/
> firmware weakness.

Weird: they would use a controller/firmware that doesn't work? Bad call...

> I'm pretty sure the first 2 versions of this demo I saw were executed
> perfectly - and in a packed auditorium (Moscow? and Russians are the
> toughest crowd). No smoke, no mirrors.

I still don't understand why even the one on http://www.opensolaris.com/, 
"ZFS – A Smashing Hit", doesn't show the app running at the moment the HD is 
smashed... weird...


[zfs-discuss] "ZFS, Smashing Baby" a fake???

2008-11-23 Thread Scara Maccai
I watched both the YouTube video

http://www.youtube.com/watch?v=CN6iDzesEs0

and the one on http://www.opensolaris.com/, "ZFS – A Smashing Hit".

In the first one it is obvious that the app stops working when they smash the 
drives; they have to physically detach the drive before the array 
reconstruction begins.
I'm not the only one who noticed it. From the comments on YouTube:

"It appears that ZFS didn't recover after each drive failure until he unplugged 
the failed drive? Or was it coincidence that he unplugged the drive just as ZFS 
started recovering?" 
Reply 
"Yep. its a bug in solaris. BUt if you try and tell a sun person that, they get 
really pissy."

In the second video the camera is on the drive when the guy smashes it; I 
don't see any reason why they would not let you see the app while he smashes 
the drive.
The camera comes back to the running app right after he detaches the hard drive.