Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
re == Richard Elling [EMAIL PROTECTED] writes:
pf == Paul Fisher [EMAIL PROTECTED] writes:

    re  I was able to reproduce this in b93, but might have a
    re  different interpretation

You weren't able to reproduce the hang of 'zpool status'? Your 'zpool status' was after the FMA fault kicked in, though. How about before FMA decided to mark the pool faulted---did 'zpool status' hang, or work? If it worked, what did it report?

The 'zpool status' hang happens for me on b71 when an iSCSI target goes away. (IIRC 'iscsiadm remove discovery-address ...' unwedges zpool status for me, but my notes could be more careful.)

    re  However, the default failmode property is set to wait which
    re  will patiently wait forever.  If you would rather have the I/O
    re  fail, then you should change the failmode to continue

For him, it sounds like it's not doing either. I think he does not have the failmode property, since it is so new?

It sounds like 'continue' should return I/O errors sooner than 9 minutes after the unredundant disks generate them (but not at all for degraded redundant pools, of course). And it sounds like 'wait' should block the writing program, forever if necessary, like an NFS hard mount. (1) Is the latter what 'wait' actually did for you? Or did the writing process get I/O errors after the 9-minutes-later FMA diagnosis? (2) Is it like NFS 'hard' or is it like 'hard,intr'? :) It's great to see these things improving.

    pf  Wow! Who knew that 17,951 was the magic number... Seriously,
    pf  this does seem like an excessive amount of certainty.

I agree it's an awfully forgiving constant, so big that it sounds like it might not be a constant manually set to 16384 or something, but rather an accident. I'm surprised to find FMA is responsible for deciding the length of this 9-minute (or more, for Ross) delay. Note that, if the false positives one is trying to filter out are things like USB/SAN cabling spasms and drive recalibrations, the right metric is time, not the number of failed CDBs.

The hugely-delayed response may be a blessing in disguise though, because arranging for the different FMA states to each last tens of minutes means it's possible to evaluate the system's behavior in each state, to see if it's correct. For example, within this 9-minute window:

 * what does 'zpool status' say before the FMA faulting?
 * what do applications experience, e.g.,
   + is it possible to get an I/O error during this window with failmode=wait? how about with failmode=continue?
   + are reads and writes that block interruptible or uninterruptible?
   + what about fsync()?
     o what about fsync() if there is a slog?
 * is the system stable, or are there ``lazy panic'' cases?
   + what if you ``ask for it'' by calling 'zpool clear' or 'zpool scrub' within the 9-minute window?
 * are other pools that don't include failed devices affected? (for reading/writing. but, also, if 'zpool status' is frozen for all pools, then other pools are affected.)
 * probably other stuff...

God willing some day some of the states can be shortened to values more like 1 second or 1 minute, or really aggressive variance-and-average-based thresholds like TCP timers, so that FMA is actually useful rather than a step backwards from SVM as it seems to me right now. The NetApp paper Richard posted earlier was saying NetApp never waits the 30 seconds for an ATAPI error; they just ignore the disk if it doesn't answer within 1000ms or so.
But my crappy Linux iSCSI targets would probably miss 1000ms timeouts all the time just because they're heavily loaded---you could get pools that go FAULTED whenever they get heavy use. So some of FMA's states maybe should be short, but they're harder to observe when they're so short.

The point of FMA, AIUI, is to make the failure state machine really complicated. We want it complicated so it can handle both NetApp's good example of aggressive timers and my crappy Linux IET setup, so that increasingly hairy rules can be written with experience. Complicated means that observing each state is important to verify the complicated system's correctness. And observing means the states can't be 1 second long even if that's the appropriate length. But I don't know if that's really the developers' intent, or just my dreaming and hoping.
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Question embedded below... Richard Elling wrote: ... If you surf to http://www.sun.com/msg/ZFS-8000-HC you'll see words to the effect that, The pool has experienced I/O failures. Since the ZFS pool property 'failmode' is set to 'wait', all I/Os (reads and writes) are blocked. See the zpool(1M) manpage for more information on the 'failmode' property. Manual intervention is required for I/Os to be serviced. I would guess that ZFS is attempting to write to the disk in the background, and that this is silently failing. It is clearly not silently failing. However, the default failmode property is set to wait which will patiently wait forever. If you would rather have the I/O fail, then you should change the failmode to continue I would not normally recommend a failmode of panic Hi Richard, Does failmode==wait cause ZFS itself to retry i/o, that is, to retry an i/o where an earlier request (of that same i/o) returned from the driver with an error? If so, that will compound timeouts even further. I'm also confused by your statement that wait means wait forever, given that the actual circumstances here are that zfs (and the rest of the i/o stack) returned after 9 minutes. thanks, Andy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Hi Andy, answer pointer below... Andrew Hisgen wrote: Question embedded below... Richard Elling wrote: ... If you surf to http://www.sun.com/msg/ZFS-8000-HC you'll see words to the effect that, The pool has experienced I/O failures. Since the ZFS pool property 'failmode' is set to 'wait', all I/Os (reads and writes) are blocked. See the zpool(1M) manpage for more information on the 'failmode' property. Manual intervention is required for I/Os to be serviced. I would guess that ZFS is attempting to write to the disk in the background, and that this is silently failing. It is clearly not silently failing. However, the default failmode property is set to wait which will patiently wait forever. If you would rather have the I/O fail, then you should change the failmode to continue I would not normally recommend a failmode of panic Hi Richard, Does failmode==wait cause ZFS itself to retry i/o, that is, to retry an i/o where an earlier request (of that same i/o) returned from the driver with an error? If so, that will compound timeouts even further. I'm also confused by your statement that wait means wait forever, given that the actual circumstances here are that zfs (and the rest of the i/o stack) returned after 9 minutes. The details are in PSARC/2007/567. Externally available at: http://www.opensolaris.org/os/community/arc/caselog/2007/567/ With failmode=wait, I/Os will wait until manual intervention which is shown as an administrator running zpool clear on the affected pool. I see the need for a document to help people work through these cases as they can be complex at many different levels. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
                        ...    0K     0K   0%  /dev/fd
    swap               4.7G   48K   4.7G   1%  /tmp
    swap               4.7G   76K   4.7G   1%  /var/run
    /dev/dsk/c1t0d0s7  425G  4.8G   416G   2%  /export/home

6. 10:35am  It's now been two hours; neither zpool status nor zfs list have ever finished. The file copy attempt has also been hung for over an hour (although that's not unexpected with 'wait' as the failmode).

Richard, you say ZFS is not silently failing; well, for me it appears that it is. I can't see any warnings from ZFS, and I can't get any status information. I see no way that I could find out what files are going to be lost on this server. Yes, I'm now aware that the pool has hung since file operations are hanging, but had that been my first indication of a problem, I believe I would now be left in a position where I cannot find out either the cause or the files affected. I don't believe I have any way to find out which operations had completed without error but are not currently committed to disk. I certainly don't get the status message you do saying permanent errors have been found in files.

I plugged the USB drive back in now; Solaris detected it ok, but ZFS is still hung. The rest of /var/adm/messages is:

    Jul 31 09:39:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
    Jul 31 09:45:22 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
    Jul 31 09:45:38 unknown last message repeated 5 times
    Jul 31 09:51:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
    Jul 31 10:03:44 unknown last message repeated 2 times
    Jul 31 10:14:27 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
    Jul 31 10:14:45 unknown last message repeated 5 times
    Jul 31 10:15:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
    Jul 31 10:27:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
    Jul 31 10:36:25 unknown usba: [ID 691482 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED],1/[EMAIL PROTECTED] (scsa2usb0): Reinserted device is accessible again.
    Jul 31 10:39:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
    Jul 31 10:45:53 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
    Jul 31 10:46:09 unknown last message repeated 5 times
    Jul 31 10:51:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet

7. 10:55am  Gave up on ZFS ever recovering. A shutdown attempt hung as expected. I hard-reset the computer.

Ross

Date: Wed, 30 Jul 2008 11:17:08 -0700
From: [EMAIL PROTECTED]
Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
To: [EMAIL PROTECTED]
CC: zfs-discuss@opensolaris.org

    I was able to reproduce this in b93, but might have a different interpretation of the conditions. More below...

    Ross Smith wrote:

        A little more information today. I had a feeling that ZFS would continue quite some time before giving an error, and today I've shown that you can carry on working with the filesystem for at least half an hour with the disk removed. I suspect on a system with little load you could carry on working for several hours without any indication that there is a problem. It looks to me like ZFS is caching reads and writes, and that provided requests can be fulfilled from the cache, it doesn't care whether the disk is present or not.
In my USB-flash-disk-sudden-removal-while-writing-big-file-test, 1. I/O to the missing device stopped (as I expected) 2. FMA kicked in, as expected. 3. /var/adm/messages recorded Command failed to complete... device gone. 4. After exactly 9 minutes, 17,951 e-reports had been processed and the diagnosis was complete. FMA logged the following to /var/adm/messages Jul 30 10:33:44 grond scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd1): Jul 30 10:33:44 grond Command failed to complete...Device is gone Jul 30 10:42:31 grond fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major Jul 30 10:42:31 grond EVENT-TIME: Wed Jul 30 10:42:30 PDT 2008 Jul 30 10:42:31 grond PLATFORM: , CSN: , HOSTNAME: grond Jul 30 10:42:31 grond SOURCE: zfs-diagnosis, REV: 1.0 Jul 30 10:42:31 grond EVENT-ID: d99769aa-28e8-cf16-d181-945592130525 Jul 30 10:42:31 grond DESC: The number of I/O errors associated with a ZFS device exceeded Jul 30 10:42:31 grond acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information. Jul 30 10:42:31 grond AUTO-RESPONSE: The device has been offlined and marked
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Well yeah, this is obviously not a valid setup for my data, but if you read my first e-mail, the whole point of this test was that I had seen Solaris hang when a drive was removed from a fully redundant array (five sets of three-way mirrors), and I wanted to see what was going on. So I started with the most basic pool I could, to see how ZFS and Solaris actually reacted to a drive being removed. I was fully expecting ZFS to simply error when the drive was removed, and to move the test on to more complex pools. I did not expect to find so many problems with such a simple setup.

And the problems I have found also lead to potential data loss in a redundant array, although it would have been much more difficult to spot: Imagine you had a raid-z array and pulled a drive as I'm doing here. Because ZFS isn't aware of the removal it keeps writing to that drive as if it's valid. That means ZFS still believes the array is online when in fact it should be degraded. If any other drive now fails, ZFS will consider the status degraded instead of faulted, and will continue writing data. The problem is, ZFS is writing some of that data to a drive which doesn't exist, meaning all that data will be lost on reboot.
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
On Wed, 30 Jul 2008, Ross wrote:

    Imagine you had a raid-z array and pulled a drive as I'm doing here. Because ZFS isn't aware of the removal it keeps writing to that drive as if it's valid. That means ZFS still believes the array is online when in fact it should be degraded. If any other drive now fails, ZFS will consider the status degraded instead of faulted, and will continue writing data. The problem is, ZFS is writing some of that data to a drive which doesn't exist, meaning all that data will be lost on reboot.

While I do believe that device drivers, or the fault system, should notify ZFS when a device fails (and ZFS should appropriately react), I don't think that ZFS should be responsible for fault monitoring. ZFS is in a rather poor position for device fault monitoring, and if it attempts to do so then it will be slow and may misbehave in other ways. The software which communicates with the device (i.e. the device driver) is in the best position to monitor the device.

The primary goal of ZFS is to be able to correctly read data which was successfully committed to disk. There are programming interfaces (e.g. fsync(), msync()) which may be used to ensure that data is committed to disk, and which should return an error if there is a problem. If you were performing your tests over an NFS mount then the results should be considerably different since NFS requests that its data be committed to disk.

Bob
==
Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
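To make the point about those interfaces concrete, here is a minimal C sketch (the path and data are invented for illustration, not from the thread) showing how both fsync() and msync() are the calls that should report a failure to commit data to stable storage:

    /* Minimal sketch: fsync() and msync() report commit failures.
     * The path and data are placeholders, not from the original thread. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const char buf[] = "some application data\n";
        int fd = open("/testpool/datafile", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* write() may succeed even if the device is gone: at this point
         * the data has only reached the in-memory cache. */
        if (write(fd, buf, sizeof (buf)) < 0)
            perror("write");

        /* fsync() should not return 0 until the data is on stable
         * storage, so a pulled disk should surface as an error here. */
        if (fsync(fd) != 0)
            perror("fsync");

        /* The same applies to memory-mapped I/O: msync(MS_SYNC) reports
         * whether the mapped pages could be flushed. */
        void *map = mmap(NULL, sizeof (buf), PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, buf, sizeof (buf));
            if (msync(map, sizeof (buf), MS_SYNC) != 0)
                perror("msync");
            munmap(map, sizeof (buf));
        }

        close(fd);
        return 0;
    }

An application that never makes one of these calls (a plain file-browser copy, for example) only sees whatever the cache tells it, which is consistent with the delayed errors reported in this thread.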
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
I agree that device drivers should perform the bulk of the fault monitoring; however, I disagree that this absolves ZFS of any responsibility for checking for errors. The primary goal of ZFS is to be a filesystem and maintain data integrity, and that entails both reading and writing data to the devices. It is no good having checksumming when reading data if you are losing huge amounts of data when a disk fails.

I'm not saying that ZFS should be monitoring disks and drivers to ensure they are working, just that if ZFS attempts to write data and doesn't get the response it's expecting, an error should be logged against the device regardless of what the driver says. If ZFS is really about end-to-end data integrity, then you do need to consider the possibility of a faulty driver.

Now I don't know what the root cause of this error is, but I suspect it will be either a bad response from the SATA driver, or something within ZFS that is not working correctly. Either way, however, I believe ZFS should have caught this. It's similar to the iSCSI problem I posted a few months back where the ZFS pool hangs for 3 minutes when a device is disconnected. There's absolutely no need for the entire pool to hang when the other half of the mirror is working fine. ZFS is often compared to hardware raid controllers, but so far its ability to handle problems is falling short.

Ross

Date: Wed, 30 Jul 2008 09:48:34 -0500
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

    On Wed, 30 Jul 2008, Ross wrote:

        Imagine you had a raid-z array and pulled a drive as I'm doing here. Because ZFS isn't aware of the removal it keeps writing to that drive as if it's valid. That means ZFS still believes the array is online when in fact it should be degraded. If any other drive now fails, ZFS will consider the status degraded instead of faulted, and will continue writing data. The problem is, ZFS is writing some of that data to a drive which doesn't exist, meaning all that data will be lost on reboot.

    While I do believe that device drivers, or the fault system, should notify ZFS when a device fails (and ZFS should appropriately react), I don't think that ZFS should be responsible for fault monitoring. ZFS is in a rather poor position for device fault monitoring, and if it attempts to do so then it will be slow and may misbehave in other ways. The software which communicates with the device (i.e. the device driver) is in the best position to monitor the device.

    The primary goal of ZFS is to be able to correctly read data which was successfully committed to disk. There are programming interfaces (e.g. fsync(), msync()) which may be used to ensure that data is committed to disk, and which should return an error if there is a problem. If you were performing your tests over an NFS mount then the results should be considerably different since NFS requests that its data be committed to disk.

    Bob
    ==
    Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
    GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
On Wed, 30 Jul 2008, Ross Smith wrote:

    I'm not saying that ZFS should be monitoring disks and drivers to ensure they are working, just that if ZFS attempts to write data and doesn't get the response it's expecting, an error should be logged against the device regardless of what the driver says. If ZFS is

A few things to consider:

 * Maybe the device driver has not yet reported (or fails to report) an error and just seems slow.

 * ZFS is at such a high level that in many cases it has no useful knowledge of actual devices. For example, MPXIO (multipath) may be layered on top, or maybe an ethernet network is involved. If ZFS experiences a temporary problem with reaching a device, does that mean the device has failed, or does it perhaps indicate that a path is temporarily slow? If one device is a local disk and the other device is accessed via iSCSI and is located on the other end of the country, should ZFS refuse to operate if the remote disk is slow or stops responding for several minutes? This would be a typical situation when using mirroring with one mirror device remote.

The parameters that a device driver for a local device uses to decide if there is a fault will be (and should be) substantially different than the parameters for a remote device. That is why most responsibility is left to the device driver. ZFS will behave according to how the device driver behaves.

Bob
==
Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Your point is well taken that ZFS should not duplicate functionality that is already or should be available at the device driver level.In this case, I think it misses the point of what ZFS should be doing that it is not. ZFS does its own periodic commits to the disk, and it knows if those commit points have reached the disk or not, or whether they are getting errors.In this particular case, those commits to disk are presumably failing, because one of the disks they depend on has been removed from the system. (If the writes are not being marked as failures, that would definitely be an error in the device driver, as you say.) In this case, however, the ZIL log has stopped being updated, but ZFS does nothing to announce that this has happened, or to indicate that a remedy is required. At the very least, it would be extremely helpful if ZFS had a status to report that indicates that the ZIL log is out of date, or that there are troubles writing to the ZIL log, or something like that. An additional feature would be to have user-selectable behavior when the ZIL log is significantly out of date.For example, if the ZIL log is more than X seconds out of date, then new writes to the system should pause, or give errors or continue to silently succeed. In an earlier phase of my career when I worked for a database company, I was responsible for a similar bug. It caused a major customer to lose a major amount of data when a system rebooted when not all good data had been successfully committed to disk.The resulting stink caused us to add a feature to detect the cases when the writing-to-disk process had fallen too far behind, and to pause new writes to the database until the situation was resolved. Peter Bob Friesenhahn wrote: While I do believe that device drivers. or the fault system, should notify ZFS when a device fails (and ZFS should appropriately react), I don't think that ZFS should be responsible for fault monitoring. ZFS is in a rather poor position for device fault monitoring, and if it attempts to do so then it will be slow and may misbehave in other ways. The software which communicates with the device (i.e. the device driver) is in the best position to monitor the device. The primary goal of ZFS is to be able to correctly read data which was successfully committed to disk. There are programming interfaces (e.g. fsync(), msync()) which may be used to ensure that data is committed to disk, and which should return an error if there is a problem. If you were performing your tests over an NFS mount then the results should be considerably different since NFS requests that its data be committed to disk. Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
I was able to reproduce this in b93, but might have a different interpretation of the conditions. More below...

Ross Smith wrote:

    A little more information today. I had a feeling that ZFS would continue quite some time before giving an error, and today I've shown that you can carry on working with the filesystem for at least half an hour with the disk removed. I suspect on a system with little load you could carry on working for several hours without any indication that there is a problem. It looks to me like ZFS is caching reads and writes, and that provided requests can be fulfilled from the cache, it doesn't care whether the disk is present or not.

In my USB-flash-disk-sudden-removal-while-writing-big-file-test,

1. I/O to the missing device stopped (as I expected)
2. FMA kicked in, as expected.
3. /var/adm/messages recorded "Command failed to complete... device gone."
4. After exactly 9 minutes, 17,951 e-reports had been processed and the diagnosis was complete. FMA logged the following to /var/adm/messages:

    Jul 30 10:33:44 grond scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd1):
    Jul 30 10:33:44 grond   Command failed to complete...Device is gone
    Jul 30 10:42:31 grond fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
    Jul 30 10:42:31 grond EVENT-TIME: Wed Jul 30 10:42:30 PDT 2008
    Jul 30 10:42:31 grond PLATFORM: , CSN: , HOSTNAME: grond
    Jul 30 10:42:31 grond SOURCE: zfs-diagnosis, REV: 1.0
    Jul 30 10:42:31 grond EVENT-ID: d99769aa-28e8-cf16-d181-945592130525
    Jul 30 10:42:31 grond DESC: The number of I/O errors associated with a ZFS device exceeded
    Jul 30 10:42:31 grond   acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
    Jul 30 10:42:31 grond AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
    Jul 30 10:42:31 grond   will be made to activate a hot spare if available.
    Jul 30 10:42:31 grond IMPACT: Fault tolerance of the pool may be compromised.
    Jul 30 10:42:31 grond REC-ACTION: Run 'zpool status -x' and replace the bad device.

The above URL shows what you expect, but more (and better) info is available from zpool status -xv:

      pool: rmtestpool
     state: UNAVAIL
    status: One or more devices are faulted in response to IO failures.
    action: Make sure the affected devices are connected, then run 'zpool clear'.
       see: http://www.sun.com/msg/ZFS-8000-HC
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            rmtestpool  UNAVAIL      0 15.7K     0  insufficient replicas
              c2t0d0p0  FAULTED      0 15.7K     0  experienced I/O failures

    errors: Permanent errors have been detected in the following files:

            /rmtestpool/random.data

If you surf to http://www.sun.com/msg/ZFS-8000-HC you'll see words to the effect that: The pool has experienced I/O failures. Since the ZFS pool property 'failmode' is set to 'wait', all I/Os (reads and writes) are blocked. See the zpool(1M) manpage for more information on the 'failmode' property. Manual intervention is required for I/Os to be serviced.

    I would guess that ZFS is attempting to write to the disk in the background, and that this is silently failing.

It is clearly not silently failing. However, the default failmode property is set to wait, which will patiently wait forever. If you would rather have the I/O fail, then you should change the failmode to continue. I would not normally recommend a failmode of panic.

Now to figure out how to recover gracefully... zpool clear isn't happy...
[sidebar] while performing this experiment, I noticed that fmd was checkpointing the diagnosis engine to disk in the /var/fm/fmd/ckpt/zfs-diagnosis directory. If this had been the boot disk, with failmode=wait, I'm not convinced that we'd get a complete diagnosis... I'll explore that later. [/sidebar]
 -- richard
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Richard Elling wrote:

    I was able to reproduce this in b93, but might have a different interpretation of the conditions. More below...

    Ross Smith wrote:

        A little more information today. I had a feeling that ZFS would continue quite some time before giving an error, and today I've shown that you can carry on working with the filesystem for at least half an hour with the disk removed. I suspect on a system with little load you could carry on working for several hours without any indication that there is a problem. It looks to me like ZFS is caching reads and writes, and that provided requests can be fulfilled from the cache, it doesn't care whether the disk is present or not.

    In my USB-flash-disk-sudden-removal-while-writing-big-file-test,
    1. I/O to the missing device stopped (as I expected)
    2. FMA kicked in, as expected.
    3. /var/adm/messages recorded "Command failed to complete... device gone."
    4. After exactly 9 minutes, 17,951 e-reports had been processed and the diagnosis was complete. FMA logged the following to /var/adm/messages

Wow! Who knew that 17,951 was the magic number... Seriously, this does seem like an excessive amount of certainty.

 -- paul
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Peter Cudhea wrote: Your point is well taken that ZFS should not duplicate functionality that is already or should be available at the device driver level.In this case, I think it misses the point of what ZFS should be doing that it is not. ZFS does its own periodic commits to the disk, and it knows if those commit points have reached the disk or not, or whether they are getting errors.In this particular case, those commits to disk are presumably failing, because one of the disks they depend on has been removed from the system. (If the writes are not being marked as failures, that would definitely be an error in the device driver, as you say.) In this case, however, the ZIL log has stopped being updated, but ZFS does nothing to announce that this has happened, or to indicate that a remedy is required. I think you have some misconceptions about how the ZIL works. It doesn't provide journalling like UFS. The following might help: http://blogs.sun.com/perrin/entry/the_lumberjack The ZIL isn't used at all unless there's fsync/O_DSYNC activity. At the very least, it would be extremely helpful if ZFS had a status to report that indicates that the ZIL log is out of date, or that there are troubles writing to the ZIL log, or something like that. If the ZIL cannot be written then we force a transaction group (txg) commit. That is the only recourse to force data to stable storage before returning to the application. An additional feature would be to have user-selectable behavior when the ZIL log is significantly out of date.For example, if the ZIL log is more than X seconds out of date, then new writes to the system should pause, or give errors or continue to silently succeed. Again this doesn't make sense given how the ZIL works. In an earlier phase of my career when I worked for a database company, I was responsible for a similar bug. It caused a major customer to lose a major amount of data when a system rebooted when not all good data had been successfully committed to disk.The resulting stink caused us to add a feature to detect the cases when the writing-to-disk process had fallen too far behind, and to pause new writes to the database until the situation was resolved. Peter Bob Friesenhahn wrote: While I do believe that device drivers. or the fault system, should notify ZFS when a device fails (and ZFS should appropriately react), I don't think that ZFS should be responsible for fault monitoring. ZFS is in a rather poor position for device fault monitoring, and if it attempts to do so then it will be slow and may misbehave in other ways. The software which communicates with the device (i.e. the device driver) is in the best position to monitor the device. The primary goal of ZFS is to be able to correctly read data which was successfully committed to disk. There are programming interfaces (e.g. fsync(), msync()) which may be used to ensure that data is committed to disk, and which should return an error if there is a problem. If you were performing your tests over an NFS mount then the results should be considerably different since NFS requests that its data be committed to disk. 
    Bob
    ==
    Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
    GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
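Neil's point above, that the ZIL only comes into play when there is fsync/O_DSYNC activity, can be illustrated with a minimal C sketch (the path is invented; this is just one way an application asks for synchronous semantics, not code from the thread):

    /* Minimal sketch: a file opened with O_DSYNC makes every write()
     * synchronous, which is the kind of activity that exercises the ZIL
     * (see the lumberjack blog entry referenced above).
     * The path is a placeholder. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char rec[] = "transaction record\n";

        /* O_DSYNC: write() does not return until the data has reached
         * stable storage, so a failed device should show up as an error
         * on the write itself rather than only at fsync() time. */
        int fd = open("/testpool/journal", O_WRONLY | O_CREAT | O_DSYNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, rec, sizeof (rec) - 1) < 0)
            perror("write (synchronous)");

        close(fd);
        return 0;
    }

Ordinary buffered writes, with neither O_DSYNC nor a later fsync(), never touch the ZIL and are only committed with the periodic transaction group, which is why a plain copy can appear to succeed long after the device is gone.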
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Thanks, this is helpful. I was definitely misunderstanding the part that the ZIL plays in ZFS. I found Richard Elling's discussion of the FMA response to the failure very informative. I see how the device driver, the fault analysis layer and the ZFS layer are all working together.Though the customer's complaint that the change in state from working to not working is taking too long seems pretty valid. Peter Neil Perrin wrote: Peter Cudhea wrote: Your point is well taken that ZFS should not duplicate functionality that is already or should be available at the device driver level. In this case, I think it misses the point of what ZFS should be doing that it is not. ZFS does its own periodic commits to the disk, and it knows if those commit points have reached the disk or not, or whether they are getting errors.In this particular case, those commits to disk are presumably failing, because one of the disks they depend on has been removed from the system. (If the writes are not being marked as failures, that would definitely be an error in the device driver, as you say.) In this case, however, the ZIL log has stopped being updated, but ZFS does nothing to announce that this has happened, or to indicate that a remedy is required. I think you have some misconceptions about how the ZIL works. It doesn't provide journalling like UFS. The following might help: http://blogs.sun.com/perrin/entry/the_lumberjack The ZIL isn't used at all unless there's fsync/O_DSYNC activity. At the very least, it would be extremely helpful if ZFS had a status to report that indicates that the ZIL log is out of date, or that there are troubles writing to the ZIL log, or something like that. If the ZIL cannot be written then we force a transaction group (txg) commit. That is the only recourse to force data to stable storage before returning to the application. An additional feature would be to have user-selectable behavior when the ZIL log is significantly out of date.For example, if the ZIL log is more than X seconds out of date, then new writes to the system should pause, or give errors or continue to silently succeed. Again this doesn't make sense given how the ZIL works. In an earlier phase of my career when I worked for a database company, I was responsible for a similar bug. It caused a major customer to lose a major amount of data when a system rebooted when not all good data had been successfully committed to disk.The resulting stink caused us to add a feature to detect the cases when the writing-to-disk process had fallen too far behind, and to pause new writes to the database until the situation was resolved. Peter Bob Friesenhahn wrote: While I do believe that device drivers. or the fault system, should notify ZFS when a device fails (and ZFS should appropriately react), I don't think that ZFS should be responsible for fault monitoring. ZFS is in a rather poor position for device fault monitoring, and if it attempts to do so then it will be slow and may misbehave in other ways. The software which communicates with the device (i.e. the device driver) is in the best position to monitor the device. The primary goal of ZFS is to be able to correctly read data which was successfully committed to disk. There are programming interfaces (e.g. fsync(), msync()) which may be used to ensure that data is committed to disk, and which should return an error if there is a problem. 
    If you were performing your tests over an NFS mount then the results should be considerably different since NFS requests that its data be committed to disk.

    Bob
    ==
    Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
    GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Peter Cudhea wrote: Thanks, this is helpful. I was definitely misunderstanding the part that the ZIL plays in ZFS. I found Richard Elling's discussion of the FMA response to the failure very informative. I see how the device driver, the fault analysis layer and the ZFS layer are all working together.Though the customer's complaint that the change in state from working to not working is taking too long seems pretty valid. I wish there was a simple answer to the can-of-worms^TM that this question opens. But there really isn't. As Paul Fisher points out, logging 17,951 e-reports in 9 minutes seems like a lot, but I'm quite sure that is CPU bound and I could log more with a faster system :-) The key here is that 9 minutes represents some combination of timeouts in the sd/scsa2usb/usb stack. The myth of layered software says that timeouts compound, so digging around for a better collection might or might not be generally satisfying. Since this is not a ZFS timeout, perhaps the conversation should be continued in a more appropriate forum? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
From a reporting perspective, yes, zpool status should not hang, and should report an error if a drive goes away, or is in any way behaving badly. No arguments there. From the data integrity perspective, the only event zfs needs to know about is when a bad drive is replaced, such that a resilver is triggered. If a drive is suddenly gone, but it is only one component of a redundant set, your data should still be fine. Now, if enough drives go away to break the redundancy, that's a different story altogether. Jon Ross Smith wrote: I agree that device drivers should perform the bulk of the fault monitoring, however I disagree that this absolves ZFS of any responsibility for checking for errors. The primary goal of ZFS is to be a filesystem and maintain data integrity, and that entails both reading and writing data to the devices. It is no good having checksumming when reading data if you are loosing huge amounts of data when a disk fails. I'm not saying that ZFS should be monitoring disks and drivers to ensure they are working, just that if ZFS attempts to write data and doesn't get the response it's expecting, an error should be logged against the device regardless of what the driver says. If ZFS is really about end-to-end data integrity, then you do need to consider the possibility of a faulty driver. Now I don't know what the root cause of this error is, but I suspect it will be either a bad response from the SATA driver, or something within ZFS that is not working correctly. Either way however I believe ZFS should have caught this. It's similar to the iSCSI problem I posted a few months back where the ZFS pool hangs for 3 minutes when a device is disconnected. There's absolutely no need for the entire pool to hang when the other half of the mirror is working fine. ZFS is often compared to hardware raid controllers, but so far it's ability to handle problems is falling short. Ross Date: Wed, 30 Jul 2008 09:48:34 -0500 From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed On Wed, 30 Jul 2008, Ross wrote: Imagine you had a raid-z array and pulled a drive as I'm doing here. Because ZFS isn't aware of the removal it keeps writing to that drive as if it's valid. That means ZFS still believes the array is online when in fact it should be degrated. If any other drive now fails, ZFS will consider the status degrated instead of faulted, and will continue writing data. The problem is, ZFS is writing some of that data to a drive which doesn't exist, meaning all that data will be lost on reboot. While I do believe that device drivers. or the fault system, should notify ZFS when a device fails (and ZFS should appropriately react), I don't think that ZFS should be responsible for fault monitoring. ZFS is in a rather poor position for device fault monitoring, and if it attempts to do so then it will be slow and may misbehave in other ways. The software which communicates with the device (i.e. the device driver) is in the best position to monitor the device. The primary goal of ZFS is to be able to correctly read data which was successfully committed to disk. There are programming interfaces (e.g. fsync(), msync()) which may be used to ensure that data is committed to disk, and which should return an error if there is a problem. If you were performing your tests over an NFS mount then the results should be considerably different since NFS requests that its data be committed to disk. 
    Bob

--
Jonathan Loran - IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146  [EMAIL PROTECTED]
AST:7731^29u18e3
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
              USED   AVAIL    CAP   HEALTH    ALTROOT
    rc-pool  2.27T   52.6G  2.21T     2%      DEGRADED  -
    test         -       -      -      -      FAULTED   -

    # zpool status test
      pool: test
     state: UNAVAIL
    status: One or more devices could not be opened. There are insufficient
            replicas for the pool to continue functioning.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://www.sun.com/msg/ZFS-8000-3C
     scrub: none requested
    config:

            NAME      STATE     READ WRITE CKSUM
            test      UNAVAIL      0     0     0  insufficient replicas
              c2t7d0  UNAVAIL      0     0     0  cannot open

-- At least re-activating the pool is simple, but gotta love the "No known data errors" line --

    # cfgadm -c configure sata1/7
    # zpool status test
      pool: test
     state: ONLINE
     scrub: none requested
    config:

            NAME      STATE     READ WRITE CKSUM
            test      ONLINE       0     0     0
              c2t7d0  ONLINE       0     0     0

    errors: No known data errors

-- But of course, although ZFS thinks it's online, it didn't mount properly --

    # cd /test
    # ls
    # zpool export test
    # rm -r /test
    # zpool import test
    # cd test
    # ls
    var (copy)  var2

-- Now that's unexpected. Those folders should be long gone. Let's see how many files ZFS failed to delete --

    # du -h -s /test
     77M   /test
    # find /test | wc -l
    19033

So in addition to working for a full half hour creating files, it's also failed to remove 77MB of data contained in nearly 20,000 files. And it's done all that without reporting any error or problem with the pool. In fact, if I didn't know what I was looking for, there would be no indication of a problem at all. Before the reboot I can't find what's going on as zpool status hangs; after the reboot it says there's no problem. Both ZFS and its troubleshooting tools fail in a big way here.

As others have said, zpool status should not hang. ZFS has to know the state of all the drives and pools it's currently using; zpool status should simply report the current known status from ZFS' internal state. It shouldn't need to scan anything. ZFS' internal state should also be checking with cfgadm so that it knows if a disk isn't there. It should also be updated if the cache can't be flushed to disk, and zfs list / zpool list needs to borrow state information from the status commands so that they don't say 'online' when the pool has problems.

ZFS needs to deal more intelligently with mount points when a pool has problems. Leaving the folder lying around in a way that prevents the pool mounting properly when the drives are recovered is not good. When the pool appears to come back online without errors, it would be very easy for somebody to assume the data was lost from the pool without realising that it simply hasn't mounted and they're actually looking at an empty folder. Firstly, ZFS should be removing the mount point when problems occur; secondly, zfs list or zpool status should include information to inform you that the pool could not be mounted properly. zpool status really should be warning of any ZFS errors that occur, including things like being unable to mount the pool, CIFS mounts failing, etc...

And finally, if ZFS does find problems writing from the cache, it really needs to log somewhere the names of all the files affected, and the action that could not be carried out. ZFS knows the files it was meant to delete here; it also knows the files that were written. I can accept that with delayed writes files may occasionally be lost when a failure happens, but I don't accept that we need to lose all knowledge of the affected files when the filesystem has complete knowledge of what is affected.
If there are any working filesystems on the server, ZFS should make an attempt to store a log of the problem; failing that, it should e-mail the data out. The admin really needs to know what files have been affected so that they can notify users of the data loss. I don't know where you would store this information, but wherever that is, zpool status should be reporting the error and directing the admin to the log file. I would probably say this could be safely stored on the system drive. Would it be possible to have a number of possible places to store this log? What I'm thinking is that if the system drive is unavailable, ZFS could try each pool in turn and attempt to store the log there.

In fact e-mail alerts or external error logging would be a great addition to ZFS. Surely it makes sense that filesystem errors would be better off being stored and handled externally?

Ross

Date: Mon, 28 Jul 2008 12:28:34 -0700
From: [EMAIL PROTECTED]
Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
To: [EMAIL PROTECTED]

    I'm trying to reproduce and will let you know what I find.
     -- richard
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
    4. While reading an offline disk causes errors, writing does not! *** CAUSES DATA LOSS *** This is a big one: ZFS can continue writing to an unavailable pool. It doesn't always generate errors (I've seen it copy over 100MB before erroring), and if not spotted, this *will* cause data loss after you reboot. I discovered this while testing how ZFS coped with the removal of a hot plug SATA drive. I knew that the ZFS admin tools were hanging, but that redundant pools remained available. I wanted to see whether it was just the ZFS admin tools that were failing, or whether ZFS was also failing to send appropriate error messages back to the OS.

This is not unique to zfs. If you need to know that your writes have reached stable store, you have to call fsync(). It is not enough to close a file. This is true even for UFS, but UFS won't delay writes for all operations so you will notice faster. But you will still lose data. I have been able to undo "rm -rf /" on a FreeBSD system by pulling the power cord before it wrote the changes...

Databases use fsync (or similar) before they close a transaction; that's one of the reasons that databases like hardware write caches. cp will not.
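To illustrate the difference, here is a minimal C sketch of a cp-style copy loop (the paths are invented for illustration). cp stops at close(); adding the fsync() is what turns a silently-cached loss into a visible error:

    /* Minimal sketch of a cp-like copy that only trusts the data once
     * fsync() succeeds.  Paths are placeholders for illustration. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int copy_file(const char *src, const char *dst)
    {
        char buf[8192];
        ssize_t n;
        int in = open(src, O_RDONLY);
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0) { perror("open"); return -1; }

        /* These writes normally land in the cache and can succeed even
         * if the destination disk has already been pulled. */
        while ((n = read(in, buf, sizeof (buf))) > 0) {
            if (write(out, buf, (size_t)n) != n) { perror("write"); return -1; }
        }

        /* cp stops before this point; close() alone does not guarantee
         * the data is on stable storage.  fsync() forces the issue and
         * is where a failure should become visible to the caller. */
        if (fsync(out) != 0) { perror("fsync"); return -1; }

        close(in);
        close(out);
        return 0;
    }

    int main(void)
    {
        return copy_file("/export/home/testdata", "/testpool/testdata") ? 1 : 0;
    }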
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
On Mon, 28 Jul 2008, Ross wrote:

    TEST1: Opened File Browser, copied the test data to the pool. Half way through the copy I pulled the drive. THE COPY COMPLETED WITHOUT ERROR. Zpool list reports the pool as online, however zpool status hung as expected.

Are you sure that this reference software you call "File Browser" actually responds to errors? Maybe it is typical Linux-derived software which does not check for or handle errors, and ZFS is reporting errors all along while the program pretends to copy the lost files. If you were using Microsoft Windows, its file browser would probably report "Unknown error: 666", but at least you would see an error dialog and you could visit the Microsoft knowledge base to learn that message ID 666 means "Unknown error".

The other possibility is that all of these files fit in the ZFS write cache so the error reporting is delayed.

The DTrace Toolkit provides a very useful DTrace script called 'errinfo' which will list every system call which reports an error. This is very useful and informative. If you run it, you will see every error reported to the application level.

Bob
==
Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
File Browser is the name of the program that Solaris opens when you open Computer on the desktop. It's the default graphical file manager. It does eventually stop copying with an error, but it takes a good long while for ZFS to throw up that error, and even when it does, the pool doesn't report any problems at all. Date: Mon, 28 Jul 2008 13:03:24 -0500 From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] CC: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed On Mon, 28 Jul 2008, Ross wrote: TEST1: Opened File Browser, copied the test data to the pool. Half way through the copy I pulled the drive. THE COPY COMPLETED WITHOUT ERROR. Zpool list reports the pool as online, however zpool status hung as expected. Are you sure that this reference software you call File Browser actually responds to errors? Maybe it is typical Linux-derived software which does not check for or handle errors and ZFS is reporting errors all along while the program pretends to copy the lost files. If you were using Microsoft Windows, its file browser would probably report Unknown error: 666 but at least you would see an error dialog and you could visit the Microsoft knowledge base to learn that message ID 666 means Unknown error. The other possibility is that all of these files fit in the ZFS write cache so the error reporting is delayed. The Dtrace Toolkit provides a very useful DTrace script called 'errinfo' which will list every system call which reports and error. This is very useful and informative. If you run it, you will see every error reported to the application level. Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ _ Invite your Facebook friends to chat on Messenger http://clk.atdmt.com/UKM/go/101719649/direct/01/___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
snv_91. I downloaded snv_94 today so I'll be testing with that tomorrow.

Date: Mon, 28 Jul 2008 09:58:43 -0700
From: [EMAIL PROTECTED]
Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
To: [EMAIL PROTECTED]

    Which OS and revision?
     -- richard

    Ross wrote:

        Ok, after doing a lot more testing of this I've found it's not the Supermicro controller causing problems. It's purely ZFS, and it causes some major problems! I've even found one scenario that appears to cause huge data loss without any warning from ZFS - up to 30,000 files and 100MB of data missing after a reboot, with zfs reporting that the pool is OK.

        *** 1. Solaris handles USB and SATA hot plug fine

        If disks are not in use by ZFS, you can unplug USB or SATA devices, and cfgadm will recognise the disconnection. USB devices are recognised automatically as you reconnect them; SATA devices need reconfiguring. Cfgadm even recognises the SATA device as an empty bay:

            # cfgadm
            Ap_Id      Type         Receptacle   Occupant      Condition
            sata1/7    sata-port    empty        unconfigured  ok
            usb1/3     unknown      empty        unconfigured  ok

            -- insert devices --

            # cfgadm
            Ap_Id      Type         Receptacle   Occupant      Condition
            sata1/7    disk         connected    unconfigured  unknown
            usb1/3     usb-storage  connected    configured    ok

        To bring the sata drive online it's just a case of running

            # cfgadm -c configure sata1/7

        *** 2. If ZFS is using a hot plug device, disconnecting it will hang all ZFS status tools.

        While pools remain accessible, any attempt to run zpool status will hang. I don't know if there is any way to recover these tools once this happens. While this is a pretty big problem in itself, it also makes me worry if other types of error could have the same effect. I see potential for this leaving a server in a state whereby you know there are errors in a pool, but have no way of finding out what those errors might be without rebooting the server.

        *** 3. Once ZFS status tools are hung the computer will not shut down.

        The only way I've found to recover from this is to physically power down the server. The solaris shutdown process simply hangs.

        *** 4. While reading an offline disk causes errors, writing does not! *** CAUSES DATA LOSS ***

        This is a big one: ZFS can continue writing to an unavailable pool. It doesn't always generate errors (I've seen it copy over 100MB before erroring), and if not spotted, this *will* cause data loss after you reboot. I discovered this while testing how ZFS coped with the removal of a hot plug SATA drive. I knew that the ZFS admin tools were hanging, but that redundant pools remained available. I wanted to see whether it was just the ZFS admin tools that were failing, or whether ZFS was also failing to send appropriate error messages back to the OS.

        These are the tests I carried out:

        Zpool: Single drive zpool, consisting of one 250GB SATA drive in a hot plug bay.
        Test data: A folder tree containing 19,160 items. 71.1MB in total.

        TEST1: Opened File Browser, copied the test data to the pool. Half way through the copy I pulled the drive. THE COPY COMPLETED WITHOUT ERROR. Zpool list reports the pool as online, however zpool status hung as expected. Not quite believing the results, I rebooted and tried again.

        TEST2: Opened File Browser, copied the data to the pool. Pulled the drive half way through. The copy again finished without error. Checking the properties shows 19,160 files in the copy. ZFS list again shows the filesystem as ONLINE.

        Now I decided to see how many files I could copy before it errored. I started the copy again.
File Browser managed a further 9,171 files before it stopped. That's nearly 30,000 files before any error was detected. Again, despite the copy having finally errored, zpool list shows the pool as online, even though zpool status hangs. I rebooted the server, and found that after the reboot my first copy contains just 10,952 items, and my second copy is completely missing. That's a loss of almost 20,000 files. Zpool status however reports NO ERRORS. For the third test I decided to see if these files are actually accessible before the reboot: TEST3: This time I pulled the drive *before* starting the copy. The copy started much slower this time and only got to 2,939 files before reporting an error. At this point I copied all the files that had been copied to another pool, and then rebooted. After the reboot, the folder in the test pool had disappeared completely, but the copy I took before rebooting was fine and contains 2,938 items, approximately 12MB of data. Again, zpool status reports no errors
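For anyone wanting to reproduce this, the tests above boil down to something like the following. This is only a rough sketch, not the exact commands used: the pool name "tank", the device c1t7d0 and /testdata are placeholders, and the GUI copy from File Browser is replaced here by cp.

  # sketch only: "tank", c1t7d0 and /testdata are placeholders
  zpool create tank c1t7d0          # single-disk pool on the hot-plug bay
  cp -r /testdata /tank/copy1 &     # start the copy
  # ... pull the drive part way through the copy ...
  wait                              # the copy may still finish "successfully"
  zpool list tank                   # still reports ONLINE
  zpool status tank &               # hangs; backgrounded so you keep your prompt
  find /tank/copy1 | wc -l          # file count before the reboot
  # after a reboot, repeat the count and compare:
  find /tank/copy1 | wc -l          # far fewer files, yet zpool status shows no errors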
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
mp == Mattias Pantzare [EMAIL PROTECTED] writes:

     This is a big one: ZFS can continue writing to an unavailable
     pool. It doesn't always generate errors (I've seen it copy over
     100MB before erroring), and if not spotted, this *will* cause
     data loss after you reboot.

mp This is not unique to zfs. If you need to know that your
mp writes have reached stable store you have to call fsync().

seconded. How about this:

* start the copy

* pull the disk, without waiting for an error reported to the application

* type 'lockfs -fa'. Does either lockfs hang, or do you get an immediate error after requesting the lockfs? (a rough sketch of this check is at the end of this message)

If so, I think it's ok and within the unix tradition to allow all these writes. It's just maybe a more extreme version of the tradition, which might not be an entirely bad compromise if ZFS can keep up this behavior, and actually retry the unreported failed writes, when confronted with FC, iSCSI, USB, FW targets that bounce. I'm not sure if it can ever do that yet or not, but architecturally I wouldn't want to demand that it return failure to the app too soon, so long as fsync() still behaves correctly w.r.t. power failures.

However the other problems you report are things I've run into, also. 'zpool status' should not be touching the disk at all. so, we have:

* 'zpool list' shows ONLINE several minutes after a drive is yanked. At the time 'zpool list' still shows ONLINE, 'zpool status' doesn't show anything at all because it hangs, so ONLINE seems too positive a report for the situation. I'd suggest:

  + 'zpool list' should not borrow the ONLINE terminology from 'zpool status' if the list command means something different by the word ONLINE. maybe SEEMS_TO_BE_AROUND_SOMEWHERE is more appropriate.

  + during this problem, 'zpool list' is available while 'zpool status' is not working. Fine, maybe, during a failure, not all status tools will be available. However it would be nice if, as a minimum, some status tool capable of reporting ``pool X is failing'' were available. In the absence of that, you may have to reboot the machine without ever knowing which pool's failure brought it down.

* maybe sometimes certain types of status and statistics aren't available, but no status-reporting tools should ever be subject to blocking inside the kernel. At worst they should refuse to give information, and return to a prompt, immediately. I'm in the habit of typing 'zpool status &' during serious problems so I don't lose control of the console.

* 'zpool status' is used when things are failing. Cabling and driver state machines are among the failures from which a volume manager should protect us---that's why we say ``buy redundant controllers if possible.'' In this scenario, a read is an intrusive act, because it could provoke a problem. so even if 'zpool status' is only reading, not writing to disk nor to data structures inside the kernel, it is still not really a status tool. It's an invasive poking/pinging/restarting/breaking tool. Such tools should be segregated, and shouldn't substitute for the requirement to have true status tools that only read data structures kept in the kernel, not update kernel structures and not touch disks. This would be like if 'ps' made an implicit call to rcapd, or activated some swapping thread, or something like that. ``My machine is sluggish. I wonder what's slowing it down. ...'ps'... oh, shit, now it's not responding at all, and I'll never know why.'' There can be other tools, too, but I think LVM2 and SVM both have carefully non-invasive status tools, don't they?
This principle should be followed everywhere. For example, 'iscsiadm list discovery-address' should simply list the discovery addresses. It should not implicitly attempt to contact each discovery address in its list, while I wait.

-8-
terabithia:/# time iscsiadm list discovery-address
Discovery Address: 10.100.100.135:3260
Discovery Address: 10.100.100.138:3260

real    0m45.935s
user    0m0.006s
sys     0m0.019s
terabithia:/# jobs
[1]+  Running                 zpool status
terabithia:/#
-8-

now, if you're really scalable, try the above again with 100 iSCSI targets and 20 pools. A single 'iscsiadm list discovery-address' command, even if it's sort-of ``working'', can take hours to complete. This does not happen on Linux, where I configure through text files and inspect status through 'cat /proc/...'

In other words, it's not just that the information 'zpool status' gives is inaccurate. It's not just that some information is hidden (like how sometimes a device listed as ONLINE will say ``no valid replicas'' when you try to offline it, and sometimes it won't, and the only way to tell the difference is to attempt to offline the device---so trying to 'zpool offline' each device in turn
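Putting the lockfs check proposed above into concrete commands -- a sketch only: the pool name, source path and mountpoint are placeholders, and whether lockfs -fa behaves sensibly against a pool whose disk has vanished is exactly the open question.

  # hypothetical check: does the kernel admit the writes never reached stable store?
  cp -r /testdata /tank/copy1 &     # start the copy onto the suspect pool
  # ... pull the disk; don't wait for the application to see an error ...
  lockfs -fa                        # flush and fsync every mounted file system
  echo $?                           # a hang here, or a non-zero exit, is the honest answer;
                                    # a silent 0 with the disk gone would be the bug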
[zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Has anybody here got any thoughts on how to resolve this problem:
http://www.opensolaris.org/jive/thread.jspa?messageID=261204&tstart=0

It sounds like two of us have been affected by this now, and it's a bit of a nuisance having your entire server hang when a drive is removed; it makes you worry about how Solaris would handle a drive failure. Has anybody tried pulling a drive on a live Thumper? Surely they don't hang like this?

Although, having said that, I do remember they have a great big warning in the manual about using cfgadm to stop the disk before removal, saying:

Caution - You must follow these steps before removing a disk from service. Failure to follow the procedure can corrupt your data or render your file system inoperable.

Ross
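For what it's worth, the procedure that warning refers to amounts to something like the following. This is only a sketch, not the exact steps from the Thumper manual, and the ap_id, pool and device names are placeholders:

  cfgadm                               # find the ap_id of the bay, e.g. sata1/7
  zpool offline tank c1t7d0            # placeholder names; only possible for a redundant pool
  cfgadm -c unconfigure sata1/7        # stop and disconnect the drive before pulling it
  # ... physically remove / replace the drive, then:
  cfgadm -c configure sata1/7
  zpool online tank c1t7d0
  zpool status tank                    # wait for any resilver to finish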
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
I've discovered this as well - b81 to b93 (latest I've tried). I switched from my on-board SATA controller to AOC-SAT2-MV8 cards because the MCP55 controller caused random disk hangs. Now the SAT2-MV8 works as long as the drives are working correctly, but the system can't handle a drive failure or disconnect. :(

I don't think there's a bug filed for it. That would probably be the first step to getting this resolved (might also post to storage-discuss).

-- Dave
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
Yeah, I thought of the storage forum today and found somebody else with the problem, and since my post a couple of people have reported similar issues on Thumpers. I guess the storage thread is the best place for this now:
http://www.opensolaris.org/jive/thread.jspa?threadID=42507&tstart=0