Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-04 Thread Miles Nordin
 re == Richard Elling [EMAIL PROTECTED] writes:
 pf == Paul Fisher [EMAIL PROTECTED] writes:

re I was able to reproduce this in b93, but might have a
re different interpretation

You weren't able to reproduce the hang of 'zpool status'?
Your 'zpool status' was after the FMA fault kicked in, though.  How
about before FMA decided to mark the pool faulted---did 'zpool status'
hang, or work?  If it worked, what did it report?

The 'zpool status' hanging happens for me on b71 when an iSCSI target
goes away.  (IIRC 'iscsiadm remove discovery-address ...' unwedges
zpool status for me, but my notes could be more careful.)

re However, the default failmode property is set to wait which
re will patiently wait forever.  If you would rather have the I/O
re fail, then you should change the failmode to continue

for him, it sounds like it's not doing either.  I think he does not
have the failmode property, since it is so new?

It sounds like 'continue' should return I/O errors sooner than 9
minutes after the unredundant disks generate them (but not at all for
degraded redundant pools of course).  And it sounds like 'wait' should
block the writing program, forever if necessary, like an NFS hard
mount.

  (1) Is the latter what 'wait' actually did for you?  Or did the
  writing process get I/O errors after the 9-minutes-later FMA
  diagnosis?

  (2) is it like NFS 'hard' or is it like 'hard,intr'? :)

It's great to see these things improving.
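For anyone who wants to check or flip this on their own pool, it's a
one-liner.  A sketch, assuming a pool named 'tank' and a build recent enough
to have the failmode property (the output below is illustrative, not from a
real run):

  # zpool get failmode tank
  NAME  PROPERTY  VALUE     SOURCE
  tank  failmode  wait      default
  # zpool set failmode=continue tank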

pf Wow! Who knew that 17,951 was the magic number...  Seriously,
pf this does seem like an excessive amount of certainty.

I agree it's an awfully forgiving constant, so big that it sounds like
it might not be a constant manually set to 16384 or something, but
rather an accident.  I'm surprised to find FMA is responsible for
deciding the length of this 9-minute (or more, for Ross) delay.

Note that, if the false positives one is trying to filter out are
things like USB/SAN cabling spasms and drive recalibrations, the right
metric is time, not the number of failed CDBs.

The hugely-delayed response may be a blessing in disguise though,
because arranging for the different FMA states to each last tens of
minutes means it's possible to evaluate the system's behavior in each
state, to see if it's correct.  For example, within this 9-minute
window (see the sketch after this list for one way to watch it):

 * what does 'zpool status' say before the FMA fault kicks in?

 * what do applications experience, e.g.,

   + is it possible to get an I/O error during this window with
     failmode=wait?  how about with failmode=continue?

   + are reads and writes that block interruptible or uninterruptible?

   + what about fsync()?

   + what about fsync() if there is a slog?

 * is the system stable, or are there ``lazy panic'' cases?

   + what if you ``ask for it'' by calling 'zpool clear' or 'zpool
     scrub' within the 9-minute window?

 * are other pools that don't include the failed device affected (for
   reading and writing?  but also note: if 'zpool status' is frozen for
   all pools, then other pools are already affected)

 * probably other stuff...
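Here is that sketch for watching the window from another terminal.  It
assumes the stock FMA tools on these builds and a pool named 'tank';
substitute your own:

  # fmdump -e | tail -20      # raw e-reports as they stream in
  # fmstat -m zfs-diagnosis   # is the diagnosis engine still chewing on them?
  # fmadm faulty              # has anything actually been faulted yet?
  # zpool status -x tank      # and what ZFS itself claims, if it answers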

God willing some day some of the states can be shortened to values
more like 1 second or 1 minute, or really aggressive
variance-and-average-based thresholds like TCP timers, so that FMA is
actually useful rather than a step backwards from SVM as it seems to
me right now.  The NetApp paper Richard posted earlier was saying
NetApp never waits the 30 seconds for an ATAPI error, they just ignore
the disk if it doesn't answer within 1000ms or so.  But my crappy
Linux iSCSI targets would probably miss 1000ms timeouts all the time
just because they're heavily loaded---you could get pools that go
FAULTED whenever they get heavy use.

so some of FMA's states maybe should be short, but they're harder to
observe when they're so short. The point of FMA, AIUI, is to make the
failure state machine really complicated.  We want it complicated to
deal with both netapp's good example of aggressive timers and also
deal with my crappy Linux IET setup, so increasingly hairy rules can
be written with experience.  Complicated means that observing each
state is important to verify the complicated system's correctness.
And observing means they can't be 1 second long even if that's the
appropriate length.  But I don't know if that's really the developer's
intent, or just my dreaming and hoping.




Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-01 Thread Andrew Hisgen
Question embedded below...

Richard Elling wrote:
...
 If you surf to http://www.sun.com/msg/ZFS-8000-HC you'll
 see words to the effect that,
 The pool has experienced I/O failures. Since the ZFS pool property
   'failmode' is set to 'wait', all I/Os (reads and writes) are
   blocked. See the zpool(1M) manpage for more information on the
   'failmode' property. Manual intervention is required for I/Os to
   be serviced.
 
  
 I would guess that ZFS is attempting to write to the disk in the 
 background, and that this is silently failing.
 
 It is clearly not silently failing.
 
 However, the default failmode property is set to wait which will patiently
 wait forever.  If you would rather have the I/O fail, then you should change
 the failmode to continue  I would not normally recommend a failmode of
 panic

Hi Richard,

Does failmode==wait cause ZFS itself to retry i/o, that is, to retry an
i/o where an earlier request (of that same i/o) returned from the driver
with an error?  If so, that will compound timeouts even further.

I'm also confused by your statement that wait means wait forever, given
that the actual circumstances here are that zfs (and the rest of the
i/o stack) returned after 9 minutes.

thanks,
Andy


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-08-01 Thread Richard Elling
Hi Andy, answer & pointer below...

Andrew Hisgen wrote:
 Question embedded below...

 Richard Elling wrote:
 ...
 If you surf to http://www.sun.com/msg/ZFS-8000-HC you'll
 see words to the effect that,
 The pool has experienced I/O failures. Since the ZFS pool property
   'failmode' is set to 'wait', all I/Os (reads and writes) are
   blocked. See the zpool(1M) manpage for more information on the
   'failmode' property. Manual intervention is required for I/Os to
   be serviced.

  
 I would guess that ZFS is attempting to write to the disk in the 
 background, and that this is silently failing.

 It is clearly not silently failing.

 However, the default failmode property is set to wait which will 
 patiently
 wait forever.  If you would rather have the I/O fail, then you should 
 change
 the failmode to continue  I would not normally recommend a failmode of
 panic

 Hi Richard,

 Does failmode==wait cause ZFS itself to retry i/o, that is, to retry an
 i/o where an earlier request (of that same i/o) returned from the driver
 with an error?  If so, that will compound timeouts even further.

 I'm also confused by your statement that wait means wait forever, given
 that the actual circumstances here are that zfs (and the rest of the
 i/o stack) returned after 9 minutes.

The details are in PSARC/2007/567.  Externally available at:
http://www.opensolaris.org/os/community/arc/caselog/2007/567/

With failmode=wait, I/Os will wait until manual intervention, which
in practice means an administrator running 'zpool clear' on the affected pool.
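For the record, the manual intervention amounts to something like the
following once the device is reachable again.  This is only a sketch; the
pool name is made up and the attachment point is borrowed from Ross's
cfgadm example:

  # cfgadm -c configure sata1/7   # bring the reattached device back online
  # zpool clear tank              # clear the errors and let ZFS retry the I/Os
  # zpool status -xv tank         # confirm recovery and list any damaged files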

I see the need for a document to help people work through these
cases as they can be complex at many different levels.
 -- richard



Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-31 Thread Ross Smith
                                  0K     0K     0%    /dev/fd
swap                  4.7G       48K   4.7G     1%    /tmp
swap                  4.7G       76K   4.7G     1%    /var/run
/dev/dsk/c1t0d0s7     425G      4.8G   416G     2%    /export/home
 
6. 10:35am  It's now been two hours, and neither zpool status nor zfs list has 
ever finished.  The file copy attempt has also been hung for over an hour 
(although that's not unexpected with 'wait' as the failmode).
 
Richard, you say ZFS is not silently failing; well, for me it appears that it 
is.  I can't see any warnings from ZFS, and I can't get any status information.  I 
see no way that I could find out what files are going to be lost on this server.
 
Yes, I'm now aware that the pool has hung, since file operations are hanging; 
however, had that been my first indication of a problem, I believe I am now left 
in a position where I cannot find out either the cause or the files affected. 
 I don't believe I have any way to find out which operations had completed 
without error but are not currently committed to disk.  I certainly don't get 
the status message you do saying permanent errors have been found in files.
 
I plugged the USB drive back in now, Solaris detected it ok, but ZFS is still 
hung.  The rest of /var/adm/messages is:
Jul 31 09:39:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 09:45:22 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
Jul 31 09:45:38 unknown last message repeated 5 times
Jul 31 09:51:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 10:03:44 unknown last message repeated 2 times
Jul 31 10:14:27 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
Jul 31 10:14:45 unknown last message repeated 5 times
Jul 31 10:15:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 10:27:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 10:36:25 unknown usba: [ID 691482 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED],1/[EMAIL PROTECTED] (scsa2usb0): Reinserted device is accessible again.
Jul 31 10:39:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 10:45:53 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
Jul 31 10:46:09 unknown last message repeated 5 times
Jul 31 10:51:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
 
7. 10:55am  Gave up on ZFS ever recovering.  A shutdown attempt hung as 
expected.  I hard-reset the computer.
 
Ross
 
 
 Date: Wed, 30 Jul 2008 11:17:08 -0700
 From: [EMAIL PROTECTED]
 Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
 To: [EMAIL PROTECTED]
 CC: zfs-discuss@opensolaris.org

 [...]

Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Ross
Well yeah, this is obviously not a valid setup for my data, but if you read my 
first e-mail, the whole point of this test was that I had seen Solaris hang 
when a drive was removed from a fully redundant array (five sets of three way 
mirrors), and wanted to see what was going on.

So I started with the most basic pool I could to see how ZFS and Solaris 
actually reacted to a drive being removed.  I was fully expecting ZFS to simply 
error when the drive was removed, and to move the test on to more complex pools.  
I did not expect to find so many problems with such a simple setup.  And the 
problems I have found also lead to potential data loss in a redundant array, 
although it would have been much more difficult to spot:

Imagine you had a raid-z array and pulled a drive as I'm doing here.  Because 
ZFS isn't aware of the removal it keeps writing to that drive as if it's valid. 
 That means ZFS still believes the array is online when in fact it should be 
degraded.  If any other drive now fails, ZFS will consider the status degraded 
instead of faulted, and will continue writing data.  The problem is, ZFS is 
writing some of that data to a drive which doesn't exist, meaning all that data 
will be lost on reboot.
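If anyone wants to poke at the degraded-vs-faulted states without pulling
real hardware, here's a rough sketch with file-backed vdevs.  Note that
'zpool offline' is an administrative offline, so it only shows what a
correctly DEGRADED raid-z pool should look like; it is not the same thing as
the surprise removal I'm describing:

  # mkfile 128m /var/tmp/vdev1 /var/tmp/vdev2 /var/tmp/vdev3
  # zpool create testpool raidz /var/tmp/vdev1 /var/tmp/vdev2 /var/tmp/vdev3
  # zpool offline testpool /var/tmp/vdev1
  # zpool status testpool         # the pool should now report DEGRADED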
 
 


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Bob Friesenhahn
On Wed, 30 Jul 2008, Ross wrote:

 Imagine you had a raid-z array and pulled a drive as I'm doing here. 
 Because ZFS isn't aware of the removal it keeps writing to that 
 drive as if it's valid.  That means ZFS still believes the array is 
 online when in fact it should be degraded.  If any other drive now 
 fails, ZFS will consider the status degraded instead of faulted, and 
 will continue writing data.  The problem is, ZFS is writing some of 
 that data to a drive which doesn't exist, meaning all that data will 
 be lost on reboot.

While I do believe that device drivers, or the fault system, should 
notify ZFS when a device fails (and ZFS should appropriately react), I 
don't think that ZFS should be responsible for fault monitoring.  ZFS 
is in a rather poor position for device fault monitoring, and if it 
attempts to do so then it will be slow and may misbehave in other 
ways.  The software which communicates with the device (i.e. the 
device driver) is in the best position to monitor the device.

The primary goal of ZFS is to be able to correctly read data which was 
successfully committed to disk.  There are programming interfaces 
(e.g. fsync(), msync()) which may be used to ensure that data is 
committed to disk, and which should return an error if there is a 
problem.  If you were performing your tests over an NFS mount then the 
results should be considerably different since NFS requests that its 
data be committed to disk.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Ross Smith

I agree that device drivers should perform the bulk of the fault monitoring, 
however I disagree that this absolves ZFS of any responsibility for checking 
for errors.  The primary goal of ZFS is to be a filesystem and maintain data 
integrity, and that entails both reading and writing data to the devices.  It 
is no good having checksumming when reading data if you are losing huge 
amounts of data when a disk fails.
 
I'm not saying that ZFS should be monitoring disks and drivers to ensure they 
are working, just that if ZFS attempts to write data and doesn't get the 
response it's expecting, an error should be logged against the device 
regardless of what the driver says.  If ZFS is really about end-to-end data 
integrity, then you do need to consider the possibility of a faulty driver.  
Now I don't know what the root cause of this error is, but I suspect it will be 
either a bad response from the SATA driver, or something within ZFS that is not 
working correctly.  Either way, however, I believe ZFS should have caught this.
 
It's similar to the iSCSI problem I posted a few months back where the ZFS pool 
hangs for 3 minutes when a device is disconnected.  There's absolutely no need 
for the entire pool to hang when the other half of the mirror is working fine.  
ZFS is often compared to hardware raid controllers, but so far its ability to 
handle problems is falling short.
 
Ross
 
 Date: Wed, 30 Jul 2008 09:48:34 -0500
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 CC: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

 [...]
_
Find the best and worst places on the planet
http://clk.atdmt.com/UKM/go/101719807/direct/01/___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Bob Friesenhahn
On Wed, 30 Jul 2008, Ross Smith wrote:

 I'm not saying that ZFS should be monitoring disks and drivers to 
 ensure they are working, just that if ZFS attempts to write data and 
 doesn't get the response it's expecting, an error should be logged 
 against the device regardless of what the driver says.  If ZFS is

A few things to consider:

  * Maybe the device driver has not yet reported (or fails to report)
an error and just seems slow.

  * ZFS is at such a high level that in many cases it has no useful
knowledge of actual devices.  For example, MPXIO (multipath) may be
layered on top, or maybe an ethernet network is involved.

If ZFS experiences a temporary problem with reaching a device, does 
that mean the device has failed, or does it perhaps indicate that a 
path is temporarily slow?

If one device is a local disk and the other device is accessed via 
iSCSI and is located on the other end of the country, should ZFS 
refuse to operate if the remote disk is slow or stops responding for 
several minutes?  This would be a typical situation when using 
mirroring, and one mirror device is remote.

The parameters that a device driver for a local device uses to decide 
if there is a fault will be (and should be) substantially different 
than the parameters for a remote device.  That is why most 
responsibility is left to the device driver.  ZFS will behave 
according to how the device driver behaves.
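As a concrete example of that driver-level tuning: the knob most often
pointed at for local sd-attached disks is sd_io_time, which can be shortened
in /etc/system.  This is only a sketch; the default is 60 seconds, the value
below is 30, and whether it applies to a particular stack (scsa2usb, iSCSI,
MPXIO) is a separate question:

  * /etc/system: shorten the per-command timeout used by the sd driver
  set sd:sd_io_time = 0x1e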

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Peter Cudhea
Your point is well taken that ZFS should not duplicate functionality 
that is already or should be available at the device driver level.  In 
this case, I think it misses the point of what ZFS should be doing that 
it is not.

ZFS does its own periodic commits to the disk, and it knows if those 
commit points have reached the disk or not, or whether they are getting 
errors.  In this particular case, those commits to disk are presumably 
failing, because one of the disks they depend on has been removed from 
the system.  (If the writes are not being marked as failures, that 
would definitely be an error in the device driver, as you say.)  In this 
case, however, the ZIL log has stopped being updated, but ZFS does 
nothing to announce that this has happened, or to indicate that a remedy 
is required.

At the very least, it would be extremely helpful if  ZFS had a status to 
report that indicates that the ZIL log is out of date, or that there are 
troubles writing to the ZIL log, or something like that.

An additional feature would be to have user-selectable behavior when the 
ZIL log is significantly out of date.  For example, if the ZIL log is 
more than X seconds out of date, then new writes to the system should 
pause, give errors, or continue to silently succeed.

In an earlier phase of my career when I worked for a database company, I 
was responsible for a similar bug.  It caused a major customer to lose 
a major amount of data when a system rebooted when not all good data had 
been successfully committed to disk.  The resulting stink caused us to 
add a feature to detect the cases when the writing-to-disk process had 
fallen too far behind, and to pause new writes to the database until the 
situation was resolved.

Peter

Bob Friesenhahn wrote:
 [...]


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Richard Elling
I was able to reproduce this in b93, but might have a different
interpretation of the conditions.  More below...

Ross Smith wrote:
 A little more information today.  I had a feeling that ZFS would 
 continue quite some time before giving an error, and today I've shown 
 that you can carry on working with the filesystem for at least half an 
 hour with the disk removed.
  
 I suspect on a system with little load you could carry on working for 
 several hours without any indication that there is a problem.  It 
 looks to me like ZFS is caching reads & writes, and that provided 
 requests can be fulfilled from the cache, it doesn't care whether the 
 disk is present or not.

In my USB-flash-disk-sudden-removal-while-writing-big-file-test,
1. I/O to the missing device stopped (as I expected)
2. FMA kicked in, as expected.
3. /var/adm/messages recorded Command failed to complete... device gone.
4. After exactly 9 minutes, 17,951 e-reports had been processed and the
diagnosis was complete.  FMA logged the following to /var/adm/messages

  Jul 30 10:33:44 grond scsi: [ID 107833 kern.warning] WARNING:   
/[EMAIL PROTECTED],0/pci1458,[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 (sd1):
  Jul 30 10:33:44 grond Command failed to complete...Device is gone
  Jul 30 10:42:31 grond fmd: [ID 441519 daemon.error] SUNW-MSG-ID: 
ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
  Jul 30 10:42:31 grond EVENT-TIME: Wed Jul 30 10:42:30 PDT 2008
  Jul 30 10:42:31 grond PLATFORM:  , CSN:  , HOSTNAME: grond
  Jul 30 10:42:31 grond SOURCE: zfs-diagnosis, REV: 1.0
  Jul 30 10:42:31 grond EVENT-ID: d99769aa-28e8-cf16-d181-945592130525
  Jul 30 10:42:31 grond DESC: The number of I/O errors associated with a 
ZFS device exceeded
  Jul 30 10:42:31 grond  acceptable levels.  Refer to 
http://sun.com/msg/ZFS-8000-FD for more information.
  Jul 30 10:42:31 grond AUTO-RESPONSE: The device has been offlined and 
marked as faulted.  An attempt
  Jul 30 10:42:31 grond  will be made to activate a hot spare if 
available.
  Jul 30 10:42:31 grond IMPACT: Fault tolerance of the pool may be 
compromised.
  Jul 30 10:42:31 grond REC-ACTION: Run 'zpool status -x' and replace 
the bad device.

The above URL shows what you expect, but more (and better) info
is available from zpool status -xv

  pool: rmtestpool
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rmtestpool    UNAVAIL      0 15.7K     0  insufficient replicas
          c2t0d0p0    FAULTED      0 15.7K     0  experienced I/O failures

errors: Permanent errors have been detected in the following files:

        /rmtestpool/random.data


If you surf to http://www.sun.com/msg/ZFS-8000-HC you'll
see words to the effect that,
The pool has experienced I/O failures. Since the ZFS pool property
  'failmode' is set to 'wait', all I/Os (reads and writes) are
  blocked. See the zpool(1M) manpage for more information on the
  'failmode' property. Manual intervention is required for I/Os to
  be serviced.

  
 I would guess that ZFS is attempting to write to the disk in the 
 background, and that this is silently failing.

It is clearly not silently failing.

However, the default failmode property is set to 'wait', which will patiently
wait forever.  If you would rather have the I/O fail, then you should change
the failmode to 'continue'.  I would not normally recommend a failmode of
'panic'.

Now to figure out how to recover gracefully... zpool clear isn't happy...

[sidebar]
while performing this experiment, I noticed that fmd was checkpointing
the diagnosis engine to disk in the /var/fm/fmd/ckpt/zfs-diagnosis 
directory.
If this had been the boot disk, with failmode=wait, I'm not convinced
that we'd get a complete diagnosis... I'll explore that later.
[/sidebar]
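For anyone who wants to look at the same breadcrumbs on their own box, a
sketch (the EVENT-ID is the one from the log above):

  # ls /var/fm/fmd/ckpt/zfs-diagnosis
  # fmdump                    # list logged faults and their EVENT-IDs
  # fmdump -v -u d99769aa-28e8-cf16-d181-945592130525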

 -- richard



Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Paul Fisher
Richard Elling wrote:
 I was able to reproduce this in b93, but might have a different
 interpretation of the conditions.  More below...

 Ross Smith wrote:
   
 A little more information today.  I had a feeling that ZFS would
 continue quite some time before giving an error, and today I've shown
 that you can carry on working with the filesystem for at least half an
 hour with the disk removed.

 I suspect on a system with little load you could carry on working for
 several hours without any indication that there is a problem.  It
 looks to me like ZFS is caching reads & writes, and that provided
 requests can be fulfilled from the cache, it doesn't care whether the
 disk is present or not.
 

 In my USB-flash-disk-sudden-removal-while-writing-big-file-test,
 1. I/O to the missing device stopped (as I expected)
 2. FMA kicked in, as expected.
 3. /var/adm/messages recorded Command failed to complete... device gone.
 4. After exactly 9 minutes, 17,951 e-reports had been processed and the
 diagnosis was complete.  FMA logged the following to /var/adm/messages
   
Wow! Who knew that 17,951 was the magic number...  Seriously, this does 
seem like an excessive amount of certainty.


--
paul


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Neil Perrin


Peter Cudhea wrote:
 Your point is well taken that ZFS should not duplicate functionality 
 that is already or should be available at the device driver level.In 
 this case, I think it misses the point of what ZFS should be doing that 
 it is not.
 
 ZFS does its own periodic commits to the disk, and it knows if those 
 commit points have reached the disk or not, or whether they are getting 
 errors.In this particular case, those commits to disk are presumably 
 failing, because one of the disks they depend on has been removed from 
 the system.   (If the writes are not being marked as failures, that 
 would definitely be an error in the device driver, as you say.)  In this 
 case, however, the ZIL log has stopped being updated, but ZFS does 
 nothing to announce that this has happened, or to indicate that a remedy 
 is required.

I think you have some misconceptions about how the ZIL works.
It doesn't provide journalling like UFS. The following might help:

http://blogs.sun.com/perrin/entry/the_lumberjack

The ZIL isn't used at all unless there's fsync/O_DSYNC activity.

 
 At the very least, it would be extremely helpful if  ZFS had a status to 
 report that indicates that the ZIL log is out of date, or that there are 
 troubles writing to the ZIL log, or something like that.

If the ZIL cannot be written then we force a transaction group (txg)
commit. That is the only recourse to force data to stable storage before
returning to the application. 
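If you want to see whether a given workload touches the ZIL at all, one
rough way is to count zil_commit() calls with DTrace's fbt provider.  A
sketch only; the function name comes from the OpenSolaris ZFS source and
probe availability may vary between builds:

  # dtrace -n 'fbt:zfs:zil_commit:entry { @[execname] = count(); }'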

 
 An additional feature would be to have user-selectable behavior when the 
 ZIL log is significantly out of date.For example, if the ZIL log is 
 more than X seconds out of date, then new writes to the system should 
 pause, or give errors or continue to silently succeed.

Again this doesn't make sense given how the ZIL works.

 
 In an earlier phase of my career when I worked for a database company, I 
 was responsible for a similar bug.   It caused a major customer to lose 
 a major amount of data when a system rebooted when not all good data had 
 been successfully committed to disk.The resulting stink caused us to 
 add a feature to detect the cases when the writing-to-disk process had 
 fallen too far behind, and to pause new writes to the database until the 
 situation was resolved.
 
 Peter
 
 Bob Friesenhahn wrote:
 [...]


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Peter Cudhea
Thanks, this is helpful. I was definitely misunderstanding the part that
the ZIL plays in ZFS.

I found Richard Elling's discussion of the FMA response to the failure
very informative.   I see how the device driver, the fault analysis
layer and the ZFS layer are all working together.Though the
customer's complaint that the change in state from working to not
working is taking too long seems pretty valid.

Peter

Neil Perrin wrote:


 [...]



Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Richard Elling
Peter Cudhea wrote:
 Thanks, this is helpful. I was definitely misunderstanding the part that
 the ZIL plays in ZFS.

 I found Richard Elling's discussion of the FMA response to the failure
 very informative.   I see how the device driver, the fault analysis
 layer and the ZFS layer are all working together.Though the
 customer's complaint that the change in state from working to not
 working is taking too long seems pretty valid.
   

I wish there was a simple answer to the can-of-worms^TM that this
question opens.  But there really isn't.  As Paul Fisher points out,
logging 17,951 e-reports in 9 minutes seems like a lot, but I'm quite
sure that is CPU bound and I could log more with a faster system :-)
The key here is that 9 minutes represents some combination of timeouts
in the sd/scsa2usb/usb stack.  The myth of layered software says that
timeouts compound, so digging around for a better collection might
or might not be generally satisfying.  Since this is not a ZFS timeout,
perhaps the conversation should be continued in a more appropriate
forum?
 -- richard




Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-30 Thread Jonathan Loran


 From a reporting perspective, yes, zpool status should not hang, and 
should report an error if a drive goes away, or is in any way behaving 
badly.  No arguments there.  From the data integrity perspective, the 
only event zfs needs to know about is when a bad drive is replaced, such 
that a resilver is triggered.  If a drive is suddenly gone, but it is 
only one component of a redundant set, your data should still be fine.  
Now, if enough drives go away to break the redundancy, that's a 
different story altogether.

Jon

Ross Smith wrote:
 [...]


-- 


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3
 




Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-29 Thread Ross Smith
           USED   AVAIL    CAP   HEALTH    ALTROOT
rc-pool   2.27T   52.6G  2.21T      2%   DEGRADED  -
test          -       -      -      -    FAULTED   -

# zpool status test
  pool: test
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        UNAVAIL      0     0     0  insufficient replicas
          c2t7d0    UNAVAIL      0     0     0  cannot open
 
 
-- At least re-activating the pool is simple, but gotta love the No known data 
errors line --
 
# cfgadm -c configure sata1/7
# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t7d0    ONLINE       0     0     0

errors: No known data errors
 
 
-- But of course, although ZFS thinks it's online, it didn't mount properly --
 
# cd /test
# ls
# zpool export test
# rm -r /test
# zpool import test
# cd test
# ls
var (copy)  var2
 
 
-- Now that's unexpected.  Those folders should be long gone.  Let's see how 
many files ZFS failed to delete --
 
# du -h -s /test
 77M    /test
# find /test | wc -l
   19033
 
 
So in addition to working for a full half hour creating files, it's also failed 
to remove 77MB of data contained in nearly 20,000 files.  And it's done all 
that without reporting any error or problem with the pool.
 
In fact, if I didn't know what I was looking for, there would be no indication 
of a problem at all.  Before the reboot I can't find out what's going on because 
zpool status hangs.  After the reboot it says there's no problem.  Both ZFS and its 
troubleshooting tools fail in a big way here.
 
As others have said, zpool status should not hang.  ZFS has to know the state 
of all the drives and pools it's currently using, so zpool status should simply 
report the current known status from ZFS' internal state.  It shouldn't need to 
scan anything.  ZFS' internal state should also be cross-checked with cfgadm so that 
it knows if a disk isn't there.  It should also be updated if the cache can't 
be flushed to disk, and zfs list / zpool list need to borrow state 
information from the status commands so that they don't say 'online' when the 
pool has problems.
 
ZFS needs to deal more intelligently with mount points when a pool has 
problems.  Leaving the folder lying around in a way that prevents the pool 
mounting properly when the drives are recovered is not good.  When the pool 
appears to come back online without errors, it would be very easy for somebody 
to assume the data was lost from the pool without realising that it simply 
hasn't mounted and they're actually looking at an empty folder.  Firstly ZFS 
should be removing the mount point when problems occur, and secondly, ZFS list 
or ZFS status should include information to inform you that the pool could not 
be mounted properly.
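In the meantime, one quick way to tell this apart from real data loss is to
check the dataset's mount state by hand.  A sketch, using the 'test' pool
from above:

  # zfs get mounted,mountpoint test
  # ls -A /test                 # a non-empty directory here will block the mount
  # zfs mount test              # or 'zfs mount -a' after clearing the directory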
 
ZFS status really should be warning of any ZFS errors that occur.  Including 
things like being unable to mount the pool, CIFS mounts failing, etc...
 
And finally, if ZFS does find problems writing from the cache, it really needs 
to log somewhere the names of all the files affected, and the action that could 
not be carried out.  ZFS knows the files it was meant to delete here, it also 
knows the files that were written.  I can accept that with delayed writes files 
may occasionally be lost when a failure happens, but I don't accept that we 
need to loose all knowledge of the affected files when the filesystem has 
complete knowledge of what is affected.  If there are any working filesystems 
on the server, ZFS should make an attempt to store a log of the problem, 
failing that it should e-mail the data out.  The admin really needs to know 
what files have been affected so that they can notify users of the data loss.  
I don't know where you would store this information, but wherever that is, 
zpool status should be reporting the error and directing the admin to the log 
file.
 
I would probably say this could be safely stored on the system drive.  Would it 
be possible to have a number of possible places to store this log?  What I'm 
thinking is that if the system drive is unavailable, ZFS could try each pool in 
turn and attempt to store the log there.
 
In fact e-mail alerts or external error logging would be a great addition to 
ZFS.  Surely it makes sense that filesystem errors would be better off being 
stored and handled externally?
 
Ross
 
 Date: Mon, 28 Jul 2008 12:28:34 -0700 From: [EMAIL PROTECTED] Subject: Re: 
 [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed To: [EMAIL 
 PROTECTED]  I'm trying to reproduce and will let you know what I find. -- 
 richard 
_
The John Lewis Clearance - save up to 50% with FREE delivery
http://clk.atdmt.com/UKM/go/101719806

Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-29 Thread Jonathan Loran
 [...]

-- 


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3
 




Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-29 Thread David Collier-Brown
 [...]
 
 

-- 
David Collier-Brown| Always do right. This will gratify
Sun Microsystems, Toronto  | some people and astonish the rest
[EMAIL PROTECTED] |  -- Mark Twain
cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-28 Thread Mattias Pantzare
 4. While reading an offline disk causes errors, writing does not!
*** CAUSES DATA LOSS ***

 This is a big one:  ZFS can continue writing to an unavailable pool.  It 
 doesn't always generate errors (I've seen it copy over 100MB
 before erroring), and if not spotted, this *will* cause data loss after you 
 reboot.

 I discovered this while testing how ZFS coped with the removal of a hot plug 
 SATA drive.  I knew that the ZFS admin tools were
 hanging, but that redundant pools remained available.  I wanted to see 
 whether it was just the ZFS admin tools that were failing,
 or whether ZFS was also failing to send appropriate error messages back to 
 the OS.


This is not unique to zfs. If you need to know that your writes have
reached stable storage you have to call fsync(). It is not enough to
close a file. This is true even for UFS, but UFS won't delay writes
for all operations, so you will notice faster. But you will still lose
data.

I have been able to undo rm -rf / on a FreeBSD system by pulling the
power cord before it wrote the changes...

Databases use fsync (or similar) before they close a transaction; that's
one of the reasons that databases like hardware write caches.
cp will not call fsync.


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-28 Thread Bob Friesenhahn
On Mon, 28 Jul 2008, Ross wrote:

 TEST1:  Opened File Browser, copied the test data to the pool. 
 Half way through the copy I pulled the drive.  THE COPY COMPLETED 
 WITHOUT ERROR.  Zpool list reports the pool as online, however zpool 
 status hung as expected.

Are you sure that this reference software you call File Browser 
actually responds to errors?  Maybe it is typical Linux-derived 
software which does not check for or handle errors and ZFS is 
reporting errors all along while the program pretends to copy the lost 
files.  If you were using Microsoft Windows, its file browser would 
probably report Unknown error: 666 but at least you would see an 
error dialog and you could visit the Microsoft knowledge base to learn 
that message ID 666 means Unknown error.  The other possibility is 
that all of these files fit in the ZFS write cache so the error 
reporting is delayed.

The Dtrace Toolkit provides a very useful DTrace script called 
'errinfo' which will list every system call which reports an error. 
This is very useful and informative.  If you run it, you will see 
every error reported to the application level.
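For reference, a rough one-liner equivalent of errinfo, using the syscall
provider (a sketch; the real script in the toolkit prints errors as they
happen rather than aggregating them):

  # dtrace -n 'syscall:::return /errno != 0/ { @[execname, probefunc, errno] = count(); }'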

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-28 Thread Ross Smith

File Browser is the name of the program that Solaris opens when you open 
Computer on the desktop.  It's the default graphical file manager.
 
It does eventually stop copying with an error, but it takes a good long while 
for ZFS to throw up that error, and even when it does, the pool doesn't report 
any problems at all.
 Date: Mon, 28 Jul 2008 13:03:24 -0500
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 CC: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

 [...]
_
Invite your Facebook friends to chat on Messenger
http://clk.atdmt.com/UKM/go/101719649/direct/01/___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-28 Thread Ross Smith

snv_91.  I downloaded snv_94 today so I'll be testing with that tomorrow.

 Date: Mon, 28 Jul 2008 09:58:43 -0700
 From: [EMAIL PROTECTED]
 Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
 To: [EMAIL PROTECTED]

 Which OS and revision?
 -- richard

 Ross wrote:
  Ok, after doing a lot more testing of this I've found it's not the
  Supermicro controller causing problems.  It's purely ZFS, and it causes
  some major problems!  I've even found one scenario that appears to cause
  huge data loss without any warning from ZFS - up to 30,000 files and
  100MB of data missing after a reboot, with zfs reporting that the pool
  is OK.

  **********************************************************
  1. Solaris handles USB and SATA hot plug fine

  If disks are not in use by ZFS, you can unplug USB or SATA devices,
  cfgadm will recognise the disconnection.  USB devices are recognised
  automatically as you reconnect them, SATA devices need reconfiguring.
  Cfgadm even recognises the SATA device as an empty bay:

  # cfgadm
  Ap_Id      Type         Receptacle  Occupant      Condition
  sata1/7    sata-port    empty       unconfigured  ok
  usb1/3     unknown      empty       unconfigured  ok

  -- insert devices --

  # cfgadm
  Ap_Id      Type         Receptacle  Occupant      Condition
  sata1/7    disk         connected   unconfigured  unknown
  usb1/3     usb-storage  connected   configured    ok

  To bring the sata drive online it's just a case of running
  # cfgadm -c configure sata1/7

  **********************************************************
  2. If ZFS is using a hot plug device, disconnecting it will hang all
  ZFS status tools.

  While pools remain accessible, any attempt to run zpool status will
  hang.  I don't know if there is any way to recover these tools once
  this happens.  While this is a pretty big problem in itself, it also
  makes me worry if other types of error could have the same effect.  I
  see potential for this leaving a server in a state whereby you know
  there are errors in a pool, but have no way of finding out what those
  errors might be without rebooting the server.

  **********************************************************
  3. Once ZFS status tools are hung the computer will not shut down.

  The only way I've found to recover from this is to physically power
  down the server.  The solaris shutdown process simply hangs.

  **********************************************************
  4. While reading an offline disk causes errors, writing does not!

  *** CAUSES DATA LOSS ***

  This is a big one:  ZFS can continue writing to an unavailable pool.
  It doesn't always generate errors (I've seen it copy over 100MB before
  erroring), and if not spotted, this *will* cause data loss after you
  reboot.

  I discovered this while testing how ZFS coped with the removal of a
  hot plug SATA drive.  I knew that the ZFS admin tools were hanging,
  but that redundant pools remained available.  I wanted to see whether
  it was just the ZFS admin tools that were failing, or whether ZFS was
  also failing to send appropriate error messages back to the OS.

  These are the tests I carried out:

  Zpool:  Single drive zpool, consisting of one 250GB SATA drive in a
  hot plug bay.
  Test data:  A folder tree containing 19,160 items.  71.1MB in total.

  TEST1:  Opened File Browser, copied the test data to the pool.  Half
  way through the copy I pulled the drive.  THE COPY COMPLETED WITHOUT
  ERROR.  Zpool list reports the pool as online, however zpool status
  hung as expected.

  Not quite believing the results, I rebooted and tried again.

  TEST2:  Opened File Browser, copied the data to the pool.  Pulled the
  drive half way through.  The copy again finished without error.
  Checking the properties shows 19,160 files in the copy.  ZFS list
  again shows the filesystem as ONLINE.

  Now I decided to see how many files I could copy before it errored.
  I started the copy again.  File Browser managed a further 9,171 files
  before it stopped.  That's nearly 30,000 files before any error was
  detected.  Again, despite the copy having finally errored, zpool list
  shows the pool as online, even though zpool status hangs.

  I rebooted the server, and found that after the reboot my first copy
  contains just 10,952 items, and my second copy is completely missing.
  That's a loss of almost 20,000 files.  Zpool status however reports
  NO ERRORS.

  For the third test I decided to see if these files are actually
  accessible before the reboot:

  TEST3:  This time I pulled the drive *before* starting the copy.  The
  copy started much slower this time and only got to 2,939 files before
  reporting an error.  At this point I copied all the files that had
  been copied to another pool, and then rebooted.

  After the reboot, the folder in the test pool had disappeared
  completely, but the copy I took before rebooting was fine and contains
  2,938 items, approximately 12MB of data.  Again, zpool status reports
  no errors

Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-28 Thread Miles Nordin
 mp == Mattias Pantzare [EMAIL PROTECTED] writes:

 This is a big one: ZFS can continue writing to an unavailable
 pool.  It doesn't always generate errors (I've seen it copy
 over 100MB before erroring), and if not spotted, this *will*
 cause data loss after you reboot.

mp This is not unique for zfs. If you need to know that your
mp writes has reached stable store you have to call fsync().

seconded.

How about this:

 * start the copy

 * pull the disk, without waiting for an error reported to the application

 * type 'lockfs -fa'.  Does lockfs hang, or do you get an immediate
   error back after requesting it?

If so, I think it's ok and within the unix tradition to allow all
these writes.  It's just a more extreme version of that tradition,
which might not be an entirely bad compromise if ZFS can keep up this
behavior and actually retry the unreported failed writes when
confronted with FC, iSCSI, USB, or FireWire targets that bounce.  I'm
not sure whether it can do that yet, but architecturally I wouldn't
want to demand that it return failure to the app too soon, so long as
fsync() still behaves correctly w.r.t. power failures.
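
To make the fsync() point testable, here is a minimal sketch (not from
the thread; the path is invented) of the kind of probe you can run
against a file on the suspect pool after pulling the disk.  Whether you
get EIO, some other errno, or an indefinite hang will depend on the
build and the pool's settings, which is exactly what is worth measuring:

-8-
/* Hypothetical probe: write a block, then ask for it to reach stable
 * storage.  A write() that succeeds only proves the data is cached;
 * fsync() is the call that must either block or return an error when
 * the device underneath is gone.  Path is made up for illustration. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char block[8192];
    memset(block, 0xAB, sizeof block);

    int fd = open("/testpool/fsync-probe", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (write(fd, block, sizeof block) != (ssize_t)sizeof block)
        perror("write");                /* may still succeed into the cache */

    if (fsync(fd) != 0)                 /* a dead device should surface here */
        fprintf(stderr, "fsync: %s\n", strerror(errno));
    else
        printf("fsync claims the data is on stable storage\n");

    close(fd);
    return 0;
}
-8-

An fsync() that blocks forever is the hard-mount style of behaviour; an
fsync() that returns success while the disk is sitting on the desk is
the data-loss case.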


However the other problems you report are things I've run into, also.
'zpool status' should not be touching the disk at all.  so, we have:

 * 'zpool list' shows ONLINE several minutes after a drive is yanked.
   At the time 'zpool list' still shows ONLINE, 'zpool status' doesn't
   show anything at all because it hangs, so ONLINE seems too
   positive a report for the situation.  I'd suggest:

   + 'zpool list' should not borrow the ONLINE terminology from 'zpool
 status' if the list command means something different by the word
 ONLINE.  maybe SEEMS_TO_BE_AROUND_SOMEWHERE is more appropriate.

   + during this problem, 'zpool list' is available while 'zpool
 status' is not working.  Fine, maybe, during a failure, not all
 status tools will be available.  However it would be nice if, as
 a minimum, some status tool capable of reporting ``pool X is
 failing'' were available.  In the absence of that, you may have
 to reboot the machine without ever knowing even which pool failed
 to bring it down.

 * maybe sometimes certain types of status and statistics aren't
   available, but no status-reporting tools should ever be subject to
   blocking inside the kernel.  At worst they should refuse to give
   information, and return to a prompt, immediately.  I'm in the habit
   of typing 'zpool status &' during serious problems so I don't lose
   control of the console (a wrapper that takes the same precaution
   automatically is sketched after this list).

 * 'zpool status' is used when things are failing.  Cabling and driver
   state machines are among the failures from which a volume manager
   should protect us---that's why we say ``buy redundant controllers
   if possible.''

   In this scenario, a read is an intrusive act, because it could
   provoke a problem.  so even if 'zpool status' is only reading, not
   writing to disk nor to data structures inside the kernel, it is
   still not really a status tool.  It's an invasive
   poking/pinging/restarting/breaking tool.  Such tools should be
   segregated, and shouldn't substitute for the requirement to have
   true status tools that only read data structures kept in the
   kernel, not update kernel structures and not touch disks.  This
   would be like if 'ps' made an implicit call to rcapd, or activated
   some swapping thread, or something like that.  ``My machine is
   sluggish.  I wonder what's slowing it down.  ...'ps'...  oh, shit,
   now it's not responding at all, and I'll never know why.''

   There can be other tools, too, but I think LVM2 and SVM both have
   carefully non-invasive status tools, don't they?

   This principle should be followed everywhere.  For example,
   'iscsiadm list discovery-address' should simply list the discovery
   addresses.  It should not implicitly attempt to contact each
   discovery address in its list, while I wait.

-8-
terabithia:/# time iscsiadm list discovery-address
Discovery Address: 10.100.100.135:3260
Discovery Address: 10.100.100.138:3260

real    0m45.935s
user    0m0.006s
sys     0m0.019s
terabithia:/# jobs
[1]+  Running                 zpool status &
terabithia:/# 
-8-

   now, if you're really scalable, try the above again with 100 iSCSI
   targets and 20 pools.  A single 'iscsiadm list discovery-address'
   command, even if it's sort-of ``working'', can take hours to
   complete.

   This does not happen on Linux where I configure through text files
   and inspect status through 'cat /proc/...'
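
Since a wedged status tool can otherwise take the console with it, here
is a rough sketch of a wrapper (not an existing Solaris tool; the
command and the 30-second deadline are arbitrary choices) that runs a
status command as a child and gives up on it rather than blocking
forever:

-8-
/* Hypothetical wrapper: run a status command, but refuse to hang with it.
 * If the child has not exited within TIMEOUT_SECS, report that the tool
 * is wedged and return, so the console stays usable.  A process stuck
 * uninterruptibly in the kernel may ignore even SIGKILL; the point is
 * that *we* come back to a prompt regardless. */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TIMEOUT_SECS 30                 /* arbitrary deadline for this sketch */

int main(void)
{
    char *cmd[] = { "zpool", "status", NULL };   /* example command */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        execvp(cmd[0], cmd);
        perror("execvp");
        _exit(127);
    }

    int status;
    for (int waited = 0; waited < TIMEOUT_SECS; waited++) {
        if (waitpid(pid, &status, WNOHANG) == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
        sleep(1);                       /* poll once a second */
    }

    fprintf(stderr, "status command still running after %d seconds; "
            "treating it as hung\n", TIMEOUT_SECS);
    kill(pid, SIGKILL);                 /* best effort; may not help if it is
                                         * blocked inside the kernel */
    waitpid(pid, &status, WNOHANG);     /* reap it if it did die */
    return 2;
}
-8-

It's a workaround for a problem the tools shouldn't have in the first
place, but it beats losing the console.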

In other words, it's not just that the information 'zpool status'
gives is inaccurate.  It's not just that some information is hidden
(like how sometimes a device listed as ONLINE will say ``no valid
replicas'' when you try to offline it, and sometimes it won't, and the
only way to tell the difference is to attempt to offline the
device---so trying to 'zpool offline' each device in turn 

[zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-24 Thread Ross
Has anybody here got any thoughts on how to resolve this problem:
http://www.opensolaris.org/jive/thread.jspa?messageID=261204&tstart=0

It sounds like two of us have been affected by this now, and it's a bit of a 
nuisance having your entire server hang when a drive is removed; it makes you 
worry about how Solaris would handle a drive failure.

Has anybody tried pulling a drive on a live Thumper?  Surely they don't hang 
like this?  Although, having said that, I do remember they have a great big 
warning in the manual about using cfgadm to stop the disk before removal, saying:

"Caution - You must follow these steps before removing a disk from service.  
Failure to follow the procedure can corrupt your data or render your file 
system inoperable."

Ross
 
 


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-24 Thread Dave
I've discovered this as well - b81 to b93 (latest I've tried). I 
switched from my on-board SATA controller to AOC-SAT2-MV8 cards because 
the MCP55 controller caused random disk hangs. Now the SAT2-MV8 works as 
long as the drives are working correctly, but the system can't handle a 
drive failure or disconnect. :(

I don't think there's a bug filed for it. That would probably be the 
first step to getting this resolved (might also post to storage-discuss).

--
Dave



Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-24 Thread Ross
Yeah, I thought of the storage forum today and found somebody else with the 
problem, and since my post a couple of people have reported similar issues on 
Thumpers.

I guess the storage thread is the best place for this now:
http://www.opensolaris.org/jive/thread.jspa?threadID=42507&tstart=0
 
 