Re: FFS: wrong superblock check ~ crash

2014-10-27 Thread Maxime Villard
I'll commit this, unless anyone disagrees.

Index: ffs_vfsops.c
===
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.299
diff -u -r1.299 ffs_vfsops.c
--- ffs_vfsops.c24 May 2014 16:34:04 -  1.299
+++ ffs_vfsops.c27 Oct 2014 18:29:50 -
@@ -684,7 +684,7 @@
fs-fs_flags = ~FS_SWAPPED;
if ((newfs-fs_magic != FS_UFS1_MAGIC 
 newfs-fs_magic != FS_UFS2_MAGIC)||
-newfs-fs_bsize  MAXBSIZE ||
+newfs-fs_bsize  SBLOCKSIZE ||
 newfs-fs_bsize  sizeof(struct fs)) {
brelse(bp, 0);
kmem_free(newfs, fs-fs_sbsize);
@@ -974,7 +974,7 @@
continue;
 
/* Validate size of superblock */
-   if (sbsize  MAXBSIZE || sbsize  sizeof(struct fs))
+   if (sbsize  SBLOCKSIZE || sbsize  sizeof(struct fs))
continue;
 
/* Check that we can handle the file system blocksize */



Re: FFS: wrong superblock check ~ crash

2014-10-21 Thread Maxime Villard
Le 20/10/2014 18:48, Maxime Villard a écrit :
 
 Le 20/10/2014 18:23, David Holland a écrit :

 On Mon, Oct 20, 2014 at 03:38:11PM +0200, Maxime Villard wrote:
   I think the sanity check should be:
   
   Index: ffs_vfsops.c
   ===
   RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
   retrieving revision 1.299
   diff -u -r1.299 ffs_vfsops.c
   --- ffs_vfsops.c  24 May 2014 16:34:04 -  1.299
   +++ ffs_vfsops.c  20 Oct 2014 13:01:46 -
   @@ -974,7 +974,7 @@
 continue;

 /* Validate size of superblock */
   - if (sbsize  MAXBSIZE || sbsize  sizeof(struct fs))
   + if (sbsize  SBLOCKSIZE || sbsize  sizeof(struct fs))
 continue;

 /* Check that we can handle the file system blocksize */
   
   Tested on NetBSD-current: no longer crashes.
   
   Ok/Comments?

 I think the check should be left alone, but afterwards the value
 should be clamped to the amount of data that can actually be
 transferred. Otherwise I think it may break, e.g. on volumes with odd
 block sizes.
 
 Yes that's what I thought first, but I saw a comment in ffs/fs.h on this:
 
   In all cases the size of the superblock will be SBLOCKSIZE.
 
 

I think changing

-   if (sbsize  MAXBSIZE || sbsize  sizeof(struct fs))
+   if (sbsize  SBLOCKSIZE || sbsize  sizeof(struct fs))

is ok. The kernel will behave randomly if sbsize8192, and this bug has been
here since the initial revision (20 years ago); if there were really strange
fs'es in the wild they would have triggered a panic, and we would have ended
up knowing it.

This patch simply makes the kernel return an error instead of behaving
randomly, with no implementation change. It is the less intrusive fix.

And from a more or less official specification:

http://www.phptr.com/content/images/0131482092/samplechapter/mcdougall_ch15.pdf

Page 11:
The primary super-block starts at an offset of 8192 bytes into the partition
 slice and occupies one file system block (usually 8192 bytes, but can be 4096
 bytes on x86 architectures).

So 8192 should be fine.



Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Manuel Bouyer
On Mon, Oct 20, 2014 at 03:38:11PM +0200, Maxime Villard wrote:
 [...]
 With a broken superblock the kernel will read far beyond the allocated
 area, which mostly means it will crash.

Sure. There's lot of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, than return an error and try to continue running with
a bogus filesystem.

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Taylor R Campbell
   Date: Mon, 20 Oct 2014 17:46:06 +0200
   From: Manuel Bouyer bou...@antioche.eu.org

   Sure. There's lot of other ways to crash the kernel with a broken ffs.
   In this specific case it's OK to return an error, but in the general
   case I prefer to have the kernel panic when an inconsistency is
   detected in ffs, than return an error and try to continue running with
   a bogus filesystem.

Continuing to run with a bogus file system is no good, but panicking
the kernel is worse.  If the kernel takes any drastic action beyond
merely returning an error, it should remount the file system
read-only.


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread David Holland
On Mon, Oct 20, 2014 at 03:38:11PM +0200, Maxime Villard wrote:
  I think the sanity check should be:
  
  Index: ffs_vfsops.c
  ===
  RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
  retrieving revision 1.299
  diff -u -r1.299 ffs_vfsops.c
  --- ffs_vfsops.c 24 May 2014 16:34:04 -  1.299
  +++ ffs_vfsops.c 20 Oct 2014 13:01:46 -
  @@ -974,7 +974,7 @@
   continue;
   
   /* Validate size of superblock */
  -if (sbsize  MAXBSIZE || sbsize  sizeof(struct fs))
  +if (sbsize  SBLOCKSIZE || sbsize  sizeof(struct fs))
   continue;
   
   /* Check that we can handle the file system blocksize */
  
  Tested on NetBSD-current: no longer crashes.
  
  Ok/Comments?

I think the check should be left alone, but afterwards the value
should be clamped to the amount of data that can actually be
transferred. Otherwise I think it may break, e.g. on volumes with odd
block sizes.

I'm not sure what gets put in sbsize in such cases; if it's always
SBLOCKSIZE, then we should probably reject anything that isn't exactly
SBLOCKSIZE and forget about slop or hypothetical compatibility with
larger superblocks. If it's the block size of the block the superblock
is in, though, then there should be a test like this.

Another semi-related thing to check, btw: does it invalidate the
buffers for candidate superblocks that reads and then rejects? Failure
to do this when the volume blocksize isn't SBLOCKSIZE (since it seems
to always read buffers of size SBLOCKSIZE here) might cause
interesting overlapping buffer lossage later.

-- 
David A. Holland
dholl...@netbsd.org


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Christos Zoulas
In article 20141020154606.ga10...@asim.lip6.fr,
Manuel Bouyer  bou...@antioche.eu.org wrote:
On Mon, Oct 20, 2014 at 03:38:11PM +0200, Maxime Villard wrote:
 [...]
 With a broken superblock the kernel will read far beyond the allocated
 area, which mostly means it will crash.

Sure. There's lot of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, than return an error and try to continue running with
a bogus filesystem.

Well, this was the mentality 30 years ago (let's panic), and this is why
we are here today. Sure it is fine and safe to panic(), but if I can
prevent the whole system from crashing and I can keep running in degraded
mode (isolating the broken filesystem), I'd rather have the choice to do
so. I.e. The best thing would be to choose the panic or isolate behavior
via a sysctl or a compilation time kernel define.

christos



Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Manuel Bouyer
On Mon, Oct 20, 2014 at 03:58:45PM +, Taylor R Campbell wrote:
Date: Mon, 20 Oct 2014 17:46:06 +0200
From: Manuel Bouyer bou...@antioche.eu.org
 
Sure. There's lot of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, than return an error and try to continue running with
a bogus filesystem.
 
 Continuing to run with a bogus file system is no good, but panicking
 the kernel is worse.  If the kernel takes any drastic action beyond
 merely returning an error, it should remount the file system
 read-only.

definitively not. I want a panic. If the filesystsem is corrupted something
has gone really wrong and you can't trust the running system any more.
And there are cases where returning EROFS is worse than panicing (e.g.
a NFS server).

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Maxime Villard
Le 20/10/2014 18:23, David Holland a écrit :
 
 On Mon, Oct 20, 2014 at 03:38:11PM +0200, Maxime Villard wrote:
   I think the sanity check should be:
   
   Index: ffs_vfsops.c
   ===
   RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
   retrieving revision 1.299
   diff -u -r1.299 ffs_vfsops.c
   --- ffs_vfsops.c   24 May 2014 16:34:04 -  1.299
   +++ ffs_vfsops.c   20 Oct 2014 13:01:46 -
   @@ -974,7 +974,7 @@
  continue;

  /* Validate size of superblock */
   -  if (sbsize  MAXBSIZE || sbsize  sizeof(struct fs))
   +  if (sbsize  SBLOCKSIZE || sbsize  sizeof(struct fs))
  continue;

  /* Check that we can handle the file system blocksize */
   
   Tested on NetBSD-current: no longer crashes.
   
   Ok/Comments?
 
 I think the check should be left alone, but afterwards the value
 should be clamped to the amount of data that can actually be
 transferred. Otherwise I think it may break, e.g. on volumes with odd
 block sizes.

Yes that's what I thought first, but I saw a comment in ffs/fs.h on this:

In all cases the size of the superblock will be SBLOCKSIZE.



Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Christos Zoulas
In article 20141020155832.ea8ae60...@jupiter.mumble.net,
Taylor R Campbell  campbell+netbsd-tech-k...@mumble.net wrote:
   Date: Mon, 20 Oct 2014 17:46:06 +0200
   From: Manuel Bouyer bou...@antioche.eu.org

   Sure. There's lot of other ways to crash the kernel with a broken ffs.
   In this specific case it's OK to return an error, but in the general
   case I prefer to have the kernel panic when an inconsistency is
   detected in ffs, than return an error and try to continue running with
   a bogus filesystem.

Continuing to run with a bogus file system is no good, but panicking
the kernel is worse.  If the kernel takes any drastic action beyond
merely returning an error, it should remount the file system
read-only.

This is wishful thinking (unless we fix the current set of bugs
that prevent us from doing so even in a healthy filesystem for example
PR/30525). I would be happy if we could isolate the broken filesystem
from all I/O operations instead of crashing.

There are many different recipes that keep filedescriptors for R/W that
corrupt the filesystem during R/O downgrades.

christos




Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Greg Troxel

Manuel Bouyer bou...@antioche.eu.org writes:

 On Mon, Oct 20, 2014 at 03:58:45PM +, Taylor R Campbell wrote:
Date: Mon, 20 Oct 2014 17:46:06 +0200
From: Manuel Bouyer bou...@antioche.eu.org
 
Sure. There's lot of other ways to crash the kernel with a broken ffs.
In this specific case it's OK to return an error, but in the general
case I prefer to have the kernel panic when an inconsistency is
detected in ffs, than return an error and try to continue running with
a bogus filesystem.
 
 Continuing to run with a bogus file system is no good, but panicking
 the kernel is worse.  If the kernel takes any drastic action beyond
 merely returning an error, it should remount the file system
 read-only.

 definitively not. I want a panic. If the filesystsem is corrupted something
 has gone really wrong and you can't trust the running system any more.
 And there are cases where returning EROFS is worse than panicing (e.g.
 a NFS server).

The basic issue here is that there are two use cases:

  The FFS file system in question is /, or /usr, or something else that
  the operator deems Wicked Important.

  The FFS file syste is on some random removable media that someone just
  wants to access.

The desires, while each is reasonabl, are totally different.

I would suggest a new filesystem option (perhaps poe for panic on
error, with a nod to Dr. Strangelove) that causes detection of
consistency errors to panic rather than error.  Then people can turn it
on for filesystems that are so critical that they prefer a panic.



pgpfoA09uU01o.pgp
Description: PGP signature


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Manuel Bouyer
On Mon, Oct 20, 2014 at 04:57:19PM +, paul_kon...@dell.com wrote:
 I disagree.
 
 There?s a principle of networking product design, which is that you are never 
 allowed to crash due to external input.  If you receive an invalid packet, 
 you can complain about it and say the sender is broken, but if you crash from 
 it it?s always your bug, no excuses, no exceptions.

I agree.

 
 I would treat all external inputs that way; storage is an external input.

no, for most server usage it's internal input. You don't plug random
USB keys to your servers, don't you ? And even if I have ffs on
USB keys I don't consider them as external as I use them only on
my systems.

 Panics are for INTERNAL consistency failures, for example a state machine 
 that gets into a non-existent state.  But any objection to outside data must 
 be a rejection of that data, nothing more.

I agree. And if the kernel detects an inconsistent ffs state, it's either
because it did write bogus data to the disk (in which case you can consider
that the kernel itself is in an inconsistent state) or because the
hardware is bogus (in which case a panic is also not that silly).

Remounting a filesystem read-only on a running server is certainly the worse
way of dealing with the problem.

-- 
Manuel Bouyer, LIP6, Universite Paris VI.   manuel.bou...@lip6.fr
 Tel: 01 44 27 70 14  Fax: 01 44 27 72 80
--


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Manuel Bouyer
On Mon, Oct 20, 2014 at 01:10:58PM -0400, Greg Troxel wrote:
 [...]
 The basic issue here is that there are two use cases:
 
   The FFS file system in question is /, or /usr, or something else that
   the operator deems Wicked Important.
 
   The FFS file syste is on some random removable media that someone just
   wants to access.

For random removable media, the only reasonable way to access it is
from a non-root userland process :) Or in a single-use virtual machine.

Maybe we should run the USB stack in a non-root userland too, BTW.

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Mindaugas Rasiukevicius
Manuel Bouyer bou...@antioche.eu.org wrote:
 On Mon, Oct 20, 2014 at 03:58:45PM +, Taylor R Campbell wrote:
 Date: Mon, 20 Oct 2014 17:46:06 +0200
 From: Manuel Bouyer bou...@antioche.eu.org
  
 Sure. There's lot of other ways to crash the kernel with a broken
  ffs. In this specific case it's OK to return an error, but in the
  general case I prefer to have the kernel panic when an inconsistency is
 detected in ffs, than return an error and try to continue running
  with a bogus filesystem.
  
  Continuing to run with a bogus file system is no good, but panicking
  the kernel is worse.  If the kernel takes any drastic action beyond
  merely returning an error, it should remount the file system
  read-only.
 
 definitively not. I want a panic. If the filesystsem is corrupted
 something has gone really wrong and you can't trust the running system
 any more. And there are cases where returning EROFS is worse than
 panicing (e.g. a NFS server).

Disagree.  The kernel should remount the file system in read-only mode.

Perhaps we can debate what to do with corrupted / when the system is
booting, but for other cases (especially hot-plug or external disks)
I certainly do not expect a crash.  The system should clearly indicate
the errors to the user and be defensive (hence remount in read-only),
but if I insert a USB stick with a garbage and my system crashes then
it is a plain bug with potential security implications.

-- 
Mindaugas


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Greg Troxel

Martin Husemann mar...@duskware.de writes:

 On Mon, Oct 20, 2014 at 07:48:31PM +0100, Mindaugas Rasiukevicius wrote:
 Perhaps we can debate what to do with corrupted / when the system is
 booting, but for other cases (especially hot-plug or external disks)
 I certainly do not expect a crash.

 Please raise your hand if you did NOT use mount -o rump
 when mounting such a disk the last time on NetBSD - anyone?

That certainly is the best practice, but it's a bug if not using rump
leads to trouble.



pgpUUzLM5PgUE.pgp
Description: PGP signature


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Mindaugas Rasiukevicius
Manuel Bouyer bou...@antioche.eu.org wrote:
 On Mon, Oct 20, 2014 at 07:48:31PM +0100, Mindaugas Rasiukevicius wrote:
   definitively not. I want a panic. If the filesystsem is corrupted
   something has gone really wrong and you can't trust the running system
   any more. And there are cases where returning EROFS is worse than
   panicing (e.g. a NFS server).
  
  Disagree.  The kernel should remount the file system in read-only mode.
  
  Perhaps we can debate what to do with corrupted / when the system is
  booting, but for other cases (especially hot-plug or external disks)
  I certainly do not expect a crash.
 
 I do, it's the only sane thing to do if the systen did write bogus
 data to the disk and notice it later. Remounting in read-only mode
 on a server with active services running doens't do anything good
 (I know because linux servers do this. A panic and reboot is a much
 better behavior).

Huh?  If your use case is a single / partition for everything then sure.
I can actually extend my statement: the system should not crash neither
in a case of corrupt file system nor a disk failure (which may or may not
lead to corrupt file system).

If I have a system with an array of disks and one of them fails, a crash
would take down the whole node.  When many terabytes of data suddenly
disappear people get unhappy; it usually costs quite a bit of money too.
So, how about making only the *failed* segment of data unavailable?  In
many cases this is suitable and desirable; consider distributed systems
(you have redundancy) or caches (where I can just discard the corrupted
data segment and refetch it from the origin).  Meanwhile, I can replace
the failed disk and/or rebuild the corrupt file system while experiencing
only a limited disruption.

 If the corrupted filesystem is from a corrupted USB key then not panicing
 is probably better; but 1) USB keys usually don't have ffs on them 2)
 In such case it would be better to run the filesystem code in userland
 anyway.

So now you make an assumption about file systems used on USB sticks?
That does not matter.  Somebody can create a USB stick with manually
handcrafted superblock to crash your machine or maybe even exploit it.
To me, that constitutes a security vulnerability.

-- 
Mindaugas


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Manuel Bouyer
On Mon, Oct 20, 2014 at 08:44:21PM +0100, Mindaugas Rasiukevicius wrote:
 Huh?  If your use case is a single / partition for everything then sure.

I don't but other are equally important.

 I can actually extend my statement: the system should not crash neither
 in a case of corrupt file system nor a disk failure (which may or may not
 lead to corrupt file system).

detectable disk failures are handled by RAID in my context.
But I've also got the case of disks that would silently corrupt data.
In this case you want to stop operations ASAP because if you keep things
running you only make things worse (corrupting good data on other members
of the RAID).

 
 If I have a system with an array of disks and one of them fails, a crash
 would take down the whole node.  When many terabytes of data suddenly
 disappear people get unhappy;

As they would if the partition they're working on suddently becomes read-only.
It may be less damaging to stop the server, even if only one of the
many partitions is affected (and if you have that many partitions,
maybe you should also have more servers)


 it usually costs quite a bit of money too.
 So, how about making only the *failed* segment of data unavailable?  In
 many cases this is suitable and desirable; consider distributed systems
 (you have redundancy)

then you can take the system down, no problem. It's even probably better
to take it down completely, than letting it run in degraded mode.

 or caches (where I can just discard the corrupted
 data segment and refetch it from the origin).

How can the cache know where the corruption comes from ?

Anywau I can't see existing software that could deal gracefully with the
filesystem they're working on becoming unavailable. remounting read-only
is the default linux behavior and I've been hit by this many times.
Not only the server was unavailable to users, but it created
other problems (like mails bouncing, nfs operations failing instead
of just waiting for the server to come back, etc) while the server ran
in degraded mode before someone could stop it.


 Meanwhile, I can replace
 the failed disk and/or rebuild the corrupt file system while experiencing
 only a limited disruption.
 
  If the corrupted filesystem is from a corrupted USB key then not panicing
  is probably better; but 1) USB keys usually don't have ffs on them 2)
  In such case it would be better to run the filesystem code in userland
  anyway.
 
 So now you make an assumption about file systems used on USB sticks?
 That does not matter.  Somebody can create a USB stick with manually
 handcrafted superblock to crash your machine or maybe even exploit it.
 To me, that constitutes a security vulnerability.

Of course he can. He can also change the firmware of the USB stick
for this purpose. That's why untrusted USB sticks should not be mounted
using the in-kernel filesystems (but the USB stack may be a problem too).

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Michael van Elst
bou...@antioche.eu.org (Manuel Bouyer) writes:

definitively not. I want a panic.

Definitely an administrative decision then. You could make it a
mount option :)




Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Mindaugas Rasiukevicius
mlel...@serpens.de (Michael van Elst) wrote:
 rm...@netbsd.org (Mindaugas Rasiukevicius) writes:
 
 If I have a system with an array of disks and one of them fails, a crash
 would take down the whole node.  When many terabytes of data suddenly
 disappear people get unhappy;
 
 When a single node going down makes people unhappy, you may want to have
 more than one :)

There are convergence and recovery times which may have an extra cost
for you.  Sure, the failure of a single node might merely result in an
increased latency somewhere.  However, there can be a significant
difference in recovering the whole node and just replacing/recovering
one segment of data in it.  That is especially the case with the caches,
you do not want to randomly blow and refill them - it takes some time.

 it usually costs quite a bit of money too.
 So, how about making only the *failed* segment of data unavailable?
 
 It all depends on what you can or are willing to handle. Also what a
 crash (and reboot) would actually solve. It doesn't help if your
 boot partition is damaged and you force a reboot, converting a
 maybe half working system into a dead system.

Not really relevant to my example where you would have dedicated disks
for the data, but I think we agree on this - there are different use
cases and it depends.  If the boot partition is damaged, then it does
indeed most likely mean that the system is not in the state where it
can do its job.  I would probably still just mount the file system to
read-only mode and let it fail elsewhere, but it may as well panic; at
this point it is non-functioning and requires manual intervention anyway.

 In my use case, a broken filesystem is usually a sign of an unnoticed
 hardware or software error and the best reaction to recover is to
 panic (and throw the data away, the machines have only temporary
 data). Continuing with a read-only filesystem doesn't do any good,
 because you have no means to find out wether the data you can read
 is complete or correct.

Why not?  It seems to be a question of how do you communicate the errors
back to the applications or administrator.  There are applications which
gracefully handle EIOs exactly for this purpose.  The fact that they are
very rare does not mean they do not exist. :)

 Most clustered systems also handle complete outages better than
 a degraded mode. That's why you have things like STONITH.

It really depends on the application or service; you often have to take
design considerations for both though.

-- 
Mindaugas


re: FFS: wrong superblock check ~ crash

2014-10-20 Thread matthew green

Michael van Elst writes:
 bou...@antioche.eu.org (Manuel Bouyer) writes:
 
 definitively not. I want a panic.
 
 Definitely an administrative decision then. You could make it a
 mount option :)

indeed.

istr that solaris has onerror=panic|remount|unmount|...? option.


.mrg.


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Stephan
Yes, you con configure that on a zpool:

failmode  YES   wait | continue | panic

This is relevant for SAN disks for instance, where you don´t
necessarily want to crash the server when a disk becomes temporary
unavailable.

2014-10-21 2:44 GMT+02:00 matthew green m...@eterna.com.au:

 Michael van Elst writes:
 bou...@antioche.eu.org (Manuel Bouyer) writes:

 definitively not. I want a panic.

 Definitely an administrative decision then. You could make it a
 mount option :)

 indeed.

 istr that solaris has onerror=panic|remount|unmount|...? option.


 .mrg.


Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Michael van Elst
rm...@netbsd.org (Mindaugas Rasiukevicius) writes:

 In my use case, a broken filesystem is usually a sign of an unnoticed
 hardware or software error and the best reaction to recover is to
 panic (and throw the data away, the machines have only temporary
 data). Continuing with a read-only filesystem doesn't do any good,
 because you have no means to find out wether the data you can read
 is complete or correct.

Why not?  It seems to be a question of how do you communicate the errors
back to the applications or administrator.  There are applications which
gracefully handle EIOs exactly for this purpose.  The fact that they are
very rare does not mean they do not exist. :)

See, that's what in my use case means.


 Most clustered systems also handle complete outages better than
 a degraded mode. That's why you have things like STONITH.

It really depends on the application or service; you often have to take
design considerations for both though.

That was the point, there is no definitive solution. You, as an
administrator, need to choose.




Re: FFS: wrong superblock check ~ crash

2014-10-20 Thread Michael van Elst
m...@eterna.com.au (matthew green) writes:


Michael van Elst writes:
 bou...@antioche.eu.org (Manuel Bouyer) writes:
 
 definitively not. I want a panic.
 
 Definitely an administrative decision then. You could make it a
 mount option :)

indeed.

istr that solaris has onerror=panic|remount|unmount|...? option.

Same for Linux.

But before you can make it a mount option, the filesystem code
must be able to handle the cases without panic :)



re: FFS: wrong superblock check ~ crash

2014-10-20 Thread matthew green

  Definitely an administrative decision then. You could make it a
  mount option :)
 
 indeed.
 
 istr that solaris has onerror=panic|remount|unmount|...? option.
 
 Same for Linux.
 
 But before you can make it a mount option, the filesystem code
 must be able to handle the cases without panic :)

minor details... ;)