A little more information today.  I had a feeling that ZFS would carry on for 
quite some time before giving an error, and today I've shown that you can keep 
working with the filesystem for at least half an hour with the disk removed.
 
I suspect on a system with little load you could carry on working for several 
hours without any indication that there is a problem.  It looks to me like ZFS 
is caching reads & writes, and that provided requests can be fulfilled from the 
cache, it doesn't care whether the disk is present or not.
 
I would guess that ZFS is attempting to write to the disk in the background, 
and that this is silently failing.
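
If that's the case, it would be worth checking whether those failed writes are 
being recorded anywhere at all.  Just a sketch of what could be checked 
(assuming a fairly standard OpenSolaris install with FMA running):

-- Any error telemetry from ZFS or the SATA driver? --
# fmdump -e
# fmdump -eV | tail -50

-- Has FMA actually diagnosed a fault? --
# fmadm faulty

-- Per-device error counters for the removed disk --
# iostat -En c2t7d0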
 
Here's the log of the tests I did today.  After removing the drive, over a 
period of 30 minutes I copied folders to the filesystem, created an archive, 
set permissions, and checked properties.  I did this both on the command line 
and with the graphical file manager tool in Solaris.  Neither reported any 
errors, and all the data could be read & written fine, right up until the 
reboot, at which point all the data was lost, again without any error.
 
If you're not interested in the detail, please skip to the end where I've got 
some thoughts on just how many problems there are here.
 
 
# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t7d0    ONLINE       0     0     0

errors: No known data errors
# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   243M   228G   242M  /test
# zpool list test
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test   232G   243M   232G     0%  ONLINE  -
 
-- drive removed --
 
# cfgadm | grep sata1/7
sata1/7                        sata-port    empty        unconfigured ok
 
 
-- cfgadm knows the drive is removed.  How come ZFS does not? --
 
# cp -r /rc-pool/copytest /test/copytest
# zpool list test
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test      232G  73.4M   232G     0%  ONLINE  -
# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   142K   228G    18K  /test
 
 
-- Yup, still up.  Let's start the clock --
 
# date
Tue Jul 29 09:31:33 BST 2008
# du -hs /test/copytest
 667K   /test/copytest
 
 
-- 5 minutes later, still going strong --
 
# date
Tue Jul 29 09:36:30 BST 2008
# zpool list test
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test      232G  73.4M   232G     0%  ONLINE  -
# cp -r /rc-pool/copytest /test/copytest2
# ls /test
copytest   copytest2
# du -h -s /test
 1.3M   /test
# zpool list test
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test   232G  73.4M   232G     0%  ONLINE  -
# find /test | wc -l
    2669
# find //test/copytest | wc -l
    1334
# find /rc-pool/copytest | wc -l
    1334
# du -h -s /rc-pool/copytest
 5.3M   /rc-pool/copytest
 
 
-- Not sure why the original pool has 5.3MB of data when I use du. --
-- File Manager reports that they both have the same size --
 
 
-- 15 minutes later it's still working.  I can read data fine --
# date
Tue Jul 29 09:43:04 BST 2008
# chmod 777 /test/*
# mkdir /rc-pool/test2
# cp -r /test/copytest2 /rc-pool/test2/copytest2
# find /rc-pool/test2/copytest2 | wc -l
    1334
# zpool list test
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test      232G  73.4M   232G     0%  ONLINE  -
 
 
-- and yup, the drive is still offline --
 
# cfgadm | grep sata1/7
sata1/7                        sata-port    empty        unconfigured ok

-- And finally, after 30 minutes the pool is still going strong --
 
# date
Tue Jul 29 09:59:56 BST 2008
# tar -cf /test/copytest.tar /test/copytest/*
# ls -l
total 3
drwxrwxrwx   3 root     root           3 Jul 29 09:30 copytest
-rwxrwxrwx   1 root     root     4626432 Jul 29 09:59 copytest.tar
drwxrwxrwx   3 root     root           3 Jul 29 09:39 copytest2
# zpool list test
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test   232G  73.4M   232G     0%  ONLINE  -
 
After a full 30 minutes there's no indication whatsoever of any problem.  
Checking properties of the folder in File Browser reports 2665 items, totalling 
9.0MB.
 
At this point I tried "# zfs set sharesmb=on test".  I didn't really expect it 
to work, and sure enough, that command hung.  zpool status also hung, so I had 
to reboot the server.
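
One thing I haven't tried is the pool's "failmode" property (on builds recent 
enough to have it), and it's only a guess that it would change anything here:

-- The default of 'wait' blocks I/O when a device goes missing, which may 
well be the hang I'm seeing --
# zpool get failmode test

-- In theory 'continue' should return errors to new writes instead of 
blocking, though I haven't tried it --
# zpool set failmode=continue test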
 
 
-- Rebooted server --
 
 
Now I find that not only are all the files I wrote in the last 30 minutes 
missing, but files that I had deleted several minutes before removing the 
drive have re-appeared.
 
 
-- /test mount point is still present, I'll probably have to remove that 
manually --
 
 
# cd /
# ls
bin         export      media       proc        system
boot        home        mnt         rc-pool     test
dev         kernel      net         rc-usb      tmp
devices     lib         opt         root        usr
etc         lost+found  platform    sbin        var
 
 
-- ZFS still has the pool mounted, but at least now it realises it's not 
working --
 
 
# zpool list
NAME      SIZE   USED  AVAIL    CAP  HEALTH    ALTROOT
rc-pool  2.27T  52.6G  2.21T     2%  DEGRADED  -
test         -      -      -      -  FAULTED   -
# zpool status test
  pool: test
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        UNAVAIL      0     0     0  insufficient replicas
          c2t7d0    UNAVAIL      0     0     0  cannot open
 
 
-- At least re-activating the pool is simple, but gotta love the "No known data 
errors" line --
 
# cfgadm -c configure sata1/7
# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t7d0    ONLINE       0     0     0

errors: No known data errors
 
 
-- But of course, although ZFS thinks it's online, it didn't mount properly --
 
# cd /test
# ls
# zpool export test
# rm -r /test
# zpool import test
# cd test
# ls
var (copy)  var2
 
 
-- Now that's unexpected.  Those folders should be long gone.  Let's see how 
many files ZFS failed to delete --
 
# du -h -s /test
  77M   /test
# find /test | wc -l
   19033
 
 
So in addition to happily accepting new files for a full half hour, ZFS has 
also failed to remove 77MB of data contained in nearly 20,000 files.  And it 
has done all of that without reporting any error or problem with the pool.
 
In fact, if I didn't know what I was looking for, there would be no indication 
of a problem at all.  Before the reboot I couldn't find out what was going on 
because "zpool status" hung.  After the reboot it says there's no problem.  
Both ZFS and its troubleshooting tools fail in a big way here.
 
As others have said, "zpool status" should not hang.  ZFS has to know the 
state of all the drives and pools it's currently using, so "zpool status" 
should simply report that internal state; it shouldn't need to scan anything.  
That internal state should also be cross-checked against cfgadm so that ZFS 
knows when a disk isn't there, and it should be updated when the cache can't 
be flushed to disk.  Finally, "zfs list" and "zpool list" need to borrow state 
information from the status commands so that they don't say ONLINE when the 
pool has problems.
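
In the meantime it's easy enough to script a crude cross-check yourself.  This 
is only a sketch, with the pool name and SATA port hard-coded to match my 
setup, and it relies on "zpool list" still responding (which it did throughout 
my test):

#!/bin/sh
# Warn if cfgadm says the disk's SATA port is empty while the pool
# that lives on it still claims to be healthy.
POOL=test
PORT=sata1/7

PORTSTATE=`cfgadm -l $PORT | awk 'NR==2 {print $3}'`
POOLSTATE=`zpool list -H -o health $POOL`

if [ "$PORTSTATE" = "empty" ] && [ "$POOLSTATE" = "ONLINE" ]; then
    echo "WARNING: $PORT is empty but pool $POOL still reports ONLINE"
fi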
 
ZFS also needs to deal more intelligently with mount points when a pool has 
problems.  Leaving the folder lying around in a way that prevents the pool 
from mounting properly when the drive comes back is not good.  When the pool 
appears to come back online without errors, it would be very easy for somebody 
to assume the data had been lost from the pool, without realising that it 
simply hasn't mounted and they're actually looking at an empty folder.  
Firstly, ZFS should remove the mount point when problems occur; secondly, 
"zfs list" or "zpool status" should include information telling you that the 
pool could not be mounted properly.
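
Until something like that exists, the only defence I can see is to check for 
yourself whether the dataset is really mounted before trusting what you see in 
the directory.  A sketch of the checks I'll be doing from now on:

-- Does ZFS think the dataset is mounted, and where? --
# zfs get mounted,mountpoint test

-- Does the mount table agree?  An empty /test backed by the root filesystem 
rather than the pool is the warning sign --
# df -h /test

-- If it isn't mounted, try mounting it (this fails if anything has been 
written into the leftover directory) --
# zfs mount test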
 
"zpool status" really should warn of any ZFS errors that occur, including 
things like the pool failing to mount, CIFS shares failing, and so on.
 
And finally, if ZFS does hit problems writing out the cache, it really needs 
to log somewhere the names of all the files affected and the action that could 
not be carried out.  ZFS knows which files it was meant to delete here, and it 
also knows which files were written.  I can accept that with delayed writes 
files may occasionally be lost when a failure happens, but I don't accept that 
we need to lose all knowledge of the affected files when the filesystem has 
complete knowledge of what was affected.  If there are any working filesystems 
on the server, ZFS should attempt to store a log of the problem there; failing 
that, it should e-mail the details out.  The admin really needs to know which 
files have been affected so that they can notify users of the data loss.  I 
don't know where you would store this information, but wherever that is, 
"zpool status" should be reporting the error and directing the admin to the 
log file.
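
The closest thing I know of today is "zpool history", which at least records 
the commands that were run against the pool, but as far as I can tell it says 
nothing about the individual files touched through the filesystem, which is 
exactly the gap I'm complaining about:

-- Command history for the pool --
# zpool history test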
 
I would probably say this could be safely stored on the system drive.  Would it 
be possible to have a number of possible places to store this log?  What I'm 
thinking is that if the system drive is unavailable, ZFS could try each pool in 
turn and attempt to store the log there.
 
In fact e-mail alerts or external error logging would be a great addition to 
ZFS.  Surely it makes sense that filesystem errors would be better off being 
stored and handled externally?
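
Even without any changes to ZFS, something as simple as the following run from 
cron would have caught the faulted pool after the reboot, although it wouldn't 
have helped during the 30 minutes beforehand when the pool was still claiming 
to be ONLINE.  Just a sketch, and the address below is obviously made up:

#!/bin/sh
# Mail the admin if any pool is reporting a problem.  'zpool status -x'
# prints "all pools are healthy" when nothing is wrong, and the details
# of any troubled pool otherwise.
STATUS=`zpool status -x`
if [ "$STATUS" != "all pools are healthy" ]; then
    echo "$STATUS" | mailx -s "ZFS pool problem on `hostname`" admin@example.com
fi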
 
Ross
 
> Date: Mon, 28 Jul 2008 12:28:34 -0700
> From: [EMAIL PROTECTED]
> Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
> To: [EMAIL PROTECTED]
>
> I'm trying to reproduce and will let you know what I find.
>  -- richard