I think the important point here is that this makes the case for ZFS 
handling at least one layer of redundancy.  If the disk you pulled had been 
part of a mirror or raidz, there wouldn't have been any data loss when your 
system was rebooted.  In fact, the zpool status command would likely have 
kept working, and a reboot wouldn't have been necessary at all.  I think 
it's unreasonable to expect a system running any file system to recover 
from its only drive being pulled.  Of course, losing extra work because of 
the delayed notification is bad, but nonetheless this is not a reasonable 
test.  Basically, always provide redundancy in your zpool config.

Jon

Ross Smith wrote:
> A little more information today.  I had a feeling that ZFS would 
> continue for quite some time before giving an error, and today I've 
> shown that you can carry on working with the filesystem for at least 
> half an hour with the disk removed.
>  
> I suspect on a system with little load you could carry on working for 
> several hours without any indication that there is a problem.  It 
> looks to me like ZFS is caching reads & writes, and that provided 
> requests can be fulfilled from the cache, it doesn't care whether the 
> disk is present or not.
>  
> I would guess that ZFS is attempting to write to the disk in the 
> background, and that this is silently failing.
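>  
> One way to test that theory might be to watch whether any I/O actually 
> reaches the pool while you work, and to force a flush by hand.  
> Something like this (just a guess at a useful check, I haven't tried 
> it yet):
>  
> # zpool iostat test 5
> -- shows whether ZFS thinks it is pushing any I/O to the device at 
>    all while you copy files --
> # sync
> -- if the cache really can't be flushed, I'd half expect this to hang 
>    or to take a suspiciously long time --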
>  
> Here's the log of the tests I did today.  After removing the drive, 
> over a period of 30 minutes I copied folders to the filesystem, 
> created an archive, set permissions, and checked properties.  I did 
> this both on the command line and with the graphical file manager tool 
> in Solaris.  Neither reported any errors, and all the data could be 
> read and written fine, right up until the reboot, at which point all 
> the data was lost, again without any error.
>  
> If you're not interested in the detail, please skip to the end where 
> I've got some thoughts on just how many problems there are here.
>  
>  
> # zpool status test
>   pool: test
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME        STATE     READ WRITE CKSUM
>         test        ONLINE       0     0     0
>           c2t7d0    ONLINE       0     0     0
> errors: No known data errors
> # zfs list test
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> test   243M   228G   242M  /test
> # zpool list test
> NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> test   232G   243M   232G     0%  ONLINE  -
>  
>
> -- drive removed --
>  
>
> # cfgadm |grep sata1/7
> sata1/7                        sata-port    empty        unconfigured ok
>  
>  
> -- cfgadm knows the drive is removed.  How come ZFS does not? --
>  
>
> # cp -r /rc-pool/copytest /test/copytest
> # zpool list test
> NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> test      232G  73.4M   232G     0%  ONLINE  -
> # zfs list test
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> test   142K   228G    18K  /test
>  
>  
> -- Yup, still up.  Let's start the clock --
>  
>
> # date
> Tue Jul 29 09:31:33 BST 2008
> # du -hs /test/copytest
>  667K /test/copytest
>  
>  
> -- 5 minutes later, still going strong --
>  
>
> # date
> Tue Jul 29 09:36:30 BST 2008
> # zpool list test
> NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> test      232G  73.4M   232G     0%  ONLINE  -
> # cp -r /rc-pool/copytest /test/copytest2
> # ls /test
> copytest   copytest2
> # du -h -s /test
>  1.3M /test
> # zpool list test
> NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> test   232G  73.4M   232G     0%  ONLINE  -
> # find /test | wc -l                        
>     2669
> # find //test/copytest | wc -l
>     1334
> # find /rc-pool/copytest | wc -l
>     1334
> # du -h -s /rc-pool/copytest
>  5.3M /rc-pool/copytest
>  
>  
> -- Not sure why the original pool has 5.3MB of data when I use du. --
> -- File Manager reports that they both have the same size --
>  
>  
> -- 15 minutes later it's still working.  I can read data fine --
>
> # date
> Tue Jul 29 09:43:04 BST 2008
> # chmod 777 /test/*
> # mkdir /rc-pool/test2
> # cp -r /test/copytest2 /rc-pool/test2/copytest2
> # find /rc-pool/test2/copytest2 | wc -l
>     1334
> # zpool list test
> NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> test      232G  73.4M   232G     0%  ONLINE  -
>  
>  
> -- and yup, the drive is still offline --
>  
>
> # cfgadm | grep sata1/7
> sata1/7                        sata-port    empty        unconfigured ok
>
>
> -- And finally, after 30 minutes the pool is still going strong --
>  
>
> # date
> Tue Jul 29 09:59:56 BST 2008
> # tar -cf /test/copytest.tar /test/copytest/*
> # ls -l
> total 3
> drwxrwxrwx   3 root     root           3 Jul 29 09:30 copytest
> -rwxrwxrwx   1 root     root     4626432 Jul 29 09:59 copytest.tar
> drwxrwxrwx   3 root     root           3 Jul 29 09:39 copytest2
> # zpool list test
> NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> test   232G  73.4M   232G     0%  ONLINE  -
>
>  
> After a full 30 minutes there's no indication whatsoever of any 
> problem.  Checking properties of the folder in File Browser reports 
> 2665 items, totalling 9.0MB.
>  
> At this point I tried "# zfs set sharesmb=on test".  I didn't really 
> expect it to work, and sure enough, that command hung.  zpool status 
> also hung, so I had to reboot the server.
>  
>  
> -- Rebooted server --
>  
>  
> Now I find that not only are all the files I wrote in the last 
> 30 minutes missing, but files that I had deleted several minutes 
> before removing the drive have re-appeared.
>  
>  
> -- /test mount point is still present, I'll probably have to remove 
> that manually --
>  
>  
> # cd /
> # ls
> bin         export      media       proc        system
> boot        home        mnt         rc-pool     test
> dev         kernel      net         rc-usb      tmp
> devices     lib         opt         root        usr
> etc         lost+found  platform    sbin        var
>  
>  
> -- ZFS still has the pool mounted, but at least now it realises it's 
> not working --
>  
>  
> # zpool list
> NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> rc-pool  2.27T  52.6G  2.21T     2%  DEGRADED  -
> test         -      -      -      -  FAULTED  -
> # zpool status test
>   pool: test
>  state: UNAVAIL
> status: One or more devices could not be opened.  There are insufficient
>         replicas for the pool to continue functioning.
> action: Attach the missing device and online it using 'zpool online'.
>    see: http://www.sun.com/msg/ZFS-8000-3C
>  scrub: none requested
> config:
>         NAME        STATE     READ WRITE CKSUM
>         test        UNAVAIL      0     0     0  insufficient replicas
>           c2t7d0    UNAVAIL      0     0     0  cannot open
>  
>  
> -- At least re-activating the pool is simple, but gotta love the "No 
> known data errors" line --
>  
>
> # cfgadm -c configure sata1/7
> # zpool status test
>   pool: test
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME        STATE     READ WRITE CKSUM
>         test        ONLINE       0     0     0
>           c2t7d0    ONLINE       0     0     0
> errors: No known data errors
>  
>  
> -- But of course, although ZFS thinks it's online, it didn't mount 
> properly --
>  
>
> # cd /test
> # ls
> # zpool export test
> # rm -r /test
> # zpool import test
> # cd test
> # ls
> var (copy)  var2
>  
>  
> -- Now that's unexpected.  Those folders should be long gone.  Let's 
> see how many files ZFS failed to delete --
>  
>
> # du -h -s /test
>   77M /test
> # find /test | wc -l
>    19033
>  
>  
> So in addition to working for a full half hour creating files, it's 
> also failed to remove 77MB of data contained in nearly 20,000 files.  
> And it's done all that without reporting any error or problem with the 
> pool.
>  
> In fact, if I didn't know what I was looking for, there would be no 
> indication of a problem at all.  Before the reboot I couldn't find out 
> what was going on because "zpool status" hung.  After the reboot it 
> says there's no problem.  Both ZFS and its troubleshooting tools fail 
> in a big way here.
>  
> As others have said, "zpool status" should not hang.  ZFS has to know 
> the state of all the drives and pools it's currently using, so "zpool 
> status" should simply report the current known status from ZFS's 
> internal state.  It shouldn't need to scan anything.  ZFS's internal 
> state should also be cross-checked with cfgadm so that it knows when a 
> disk isn't there.  It should also be updated if the cache can't be 
> flushed to disk, and "zfs list" / "zpool list" need to borrow state 
> information from the status command so that they don't say 'ONLINE' 
> when the pool has problems.
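>  
> In the meantime, about the only way I can see to catch this by hand is 
> to cross-check the device layer yourself.  Something along these lines 
> ought to show the removal even while zpool status claims all is well 
> (I'm assuming FMA actually logs an error report for the removal, which 
> I haven't verified on this box):
>  
> # cfgadm | grep sata1/7
> -- the controller's view of the port, which showed "empty / 
>    unconfigured" above --
> # fmdump -eV | tail -40
> -- any recent FMA error reports --
> # iostat -En
> -- per-device error counters; look for the c2t7d0 entry --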
>  
> ZFS needs to deal more intelligently with mount points when a pool has 
> problems.  Leaving the folder lying around in a way that prevents the 
> pool from mounting properly when the drives are recovered is not good.  
> When the pool appears to come back online without errors, it would be 
> very easy for somebody to assume the data had been lost from the pool 
> without realising that it simply hasn't mounted and they're actually 
> looking at an empty folder.  Firstly, ZFS should remove the mount 
> point when problems occur, and secondly, "zfs list" or "zpool status" 
> should include information to tell you that the pool could not be 
> mounted properly.
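>  
> Until something like that exists, a quick way to tell whether you're 
> looking at the real dataset or just a leftover empty directory is to 
> ask ZFS itself, for example:
>  
> # zfs get mounted,mountpoint test
> # df -h /test
>  
> If the mounted property comes back "no", or df shows /test sitting on 
> the root filesystem rather than on the pool, then you're looking at a 
> stub directory and the data may well still be there once the pool is 
> properly mounted.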
>  
> "zpool status" really should be warning of any ZFS errors that occur, 
> including things like the pool failing to mount, CIFS shares failing 
> to come up, and so on.
>  
> And finally, if ZFS does find problems writing from the cache, it 
> really needs to log somewhere the names of all the files affected and 
> the action that could not be carried out.  ZFS knows the files it was 
> meant to delete here, and it also knows the files that were written.  
> I can accept that with delayed writes files may occasionally be lost 
> when a failure happens, but I don't accept that we need to lose all 
> knowledge of the affected files when the filesystem has complete 
> knowledge of what was affected.  If there are any working filesystems 
> on the server, ZFS should attempt to store a log of the problem there; 
> failing that, it should e-mail the data out.  The admin really needs 
> to know which files have been affected so that they can notify users 
> of the data loss.  I don't know where you would store this 
> information, but wherever that is, "zpool status" should be reporting 
> the error and directing the admin to the log file.
>  
> I would probably say this could be safely stored on the system drive.  
> Would it be possible to have a number of possible places to store this 
> log?  What I'm thinking is that if the system drive is unavailable, 
> ZFS could try each pool in turn and attempt to store the log there.
>  
> In fact, e-mail alerts or external error logging would be a great 
> addition to ZFS.  Surely it makes sense for filesystem errors to be 
> stored and handled externally?
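>  
> As a stopgap you could roll something crude yourself with cron.  This 
> is just a sketch (it assumes mailx is set up on the box, and of course 
> it relies on "zpool status -x" not hanging, which is exactly the 
> problem we've seen here):
>  
> #!/bin/sh
> # Mail the admin if ZFS thinks any pool is unhealthy.
> # Run this from cron every few minutes.
> STATUS=`/usr/sbin/zpool status -x`
> if [ "$STATUS" != "all pools are healthy" ]; then
>         echo "$STATUS" | mailx -s "zpool problem on `hostname`" root
> fi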
>  
> Ross
>  
>
>
> > Date: Mon, 28 Jul 2008 12:28:34 -0700
> > From: [EMAIL PROTECTED]
> > Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive 
> removed
> > To: [EMAIL PROTECTED]
> >
> > I'm trying to reproduce and will let you know what I find.
> > -- richard
> >
>
>

-- 


-     _____/     _____/      /           - Jonathan Loran -           -
-    /          /           /                IT Manager               -
-  _____  /   _____  /     /     Space Sciences Laboratory, UC Berkeley
-        /          /     /      (510) 643-5146 [EMAIL PROTECTED]
- ______/    ______/    ______/           AST:7731^29u18e3
                                 


