Re: FFS: wrong superblock check ~ crash
I'll commit this, unless anyone disagrees.

Index: ffs_vfsops.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.299
diff -u -r1.299 ffs_vfsops.c
--- ffs_vfsops.c	24 May 2014 16:34:04 -0000	1.299
+++ ffs_vfsops.c	27 Oct 2014 18:29:50 -0000
@@ -684,7 +684,7 @@
 	fs->fs_flags &= ~FS_SWAPPED;
 	if ((newfs->fs_magic != FS_UFS1_MAGIC && newfs->fs_magic != FS_UFS2_MAGIC)||
-	    newfs->fs_bsize > MAXBSIZE ||
+	    newfs->fs_bsize > SBLOCKSIZE ||
 	    newfs->fs_bsize < sizeof(struct fs)) {
 		brelse(bp, 0);
 		kmem_free(newfs, fs->fs_sbsize);
@@ -974,7 +974,7 @@
 			continue;
 		/* Validate size of superblock */
-		if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
+		if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))
 			continue;
 		/* Check that we can handle the file system blocksize */
Re: FFS: wrong superblock check ~ crash
On 20/10/2014 18:48, Maxime Villard wrote:
> On 20/10/2014 18:23, David Holland wrote:
>> On Mon, Oct 20, 2014 at 03:38:11PM +0200, Maxime Villard wrote:
>>> I think the sanity check should be:
>>>
>>> Index: ffs_vfsops.c
>>> ===================================================================
>>> RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
>>> retrieving revision 1.299
>>> diff -u -r1.299 ffs_vfsops.c
>>> --- ffs_vfsops.c	24 May 2014 16:34:04 -0000	1.299
>>> +++ ffs_vfsops.c	20 Oct 2014 13:01:46 -0000
>>> @@ -974,7 +974,7 @@
>>>  			continue;
>>>  		/* Validate size of superblock */
>>> -		if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
>>> +		if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))
>>>  			continue;
>>>  		/* Check that we can handle the file system blocksize */
>>>
>>> Tested on NetBSD-current: no longer crashes. Ok/Comments?
>>
>> I think the check should be left alone, but afterwards the value
>> should be clamped to the amount of data that can actually be
>> transferred. Otherwise I think it may break, e.g. on volumes with odd
>> block sizes.
>
> Yes, that's what I thought first, but I saw a comment in ffs/fs.h on
> this: "In all cases the size of the superblock will be SBLOCKSIZE."

I think changing

-		if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
+		if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))

is ok. The kernel will behave randomly if sbsize > 8192, and this bug has been here since the initial revision (20 years ago); if there were really strange fs'es in the wild they would have triggered a panic, and we would have ended up knowing it. This patch simply makes the kernel return an error instead of behaving randomly, with no implementation change. It is the least intrusive fix.

And from a more or less official specification:
http://www.phptr.com/content/images/0131482092/samplechapter/mcdougall_ch15.pdf
Page 11: "The primary super-block starts at an offset of 8192 bytes into the partition slice and occupies one file system block (usually 8192 bytes, but can be 4096 bytes on x86 architectures)."

So 8192 should be fine.
Re: FFS: wrong superblock check ~ crash
On Mon, Oct 20, 2014 at 03:38:11PM +0200, Maxime Villard wrote:
> [...]
> With a broken superblock the kernel will read far beyond the allocated
> area, which mostly means it will crash.

Sure. There are lots of other ways to crash the kernel with a broken ffs. In this specific case it's OK to return an error, but in the general case I prefer to have the kernel panic when an inconsistency is detected in ffs, rather than return an error and try to continue running with a bogus filesystem.

--
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--
Re: FFS: wrong superblock check ~ crash
>    Date: Mon, 20 Oct 2014 17:46:06 +0200
>    From: Manuel Bouyer <bou...@antioche.eu.org>
>
>    Sure. There are lots of other ways to crash the kernel with a broken
>    ffs. In this specific case it's OK to return an error, but in the
>    general case I prefer to have the kernel panic when an inconsistency
>    is detected in ffs, rather than return an error and try to continue
>    running with a bogus filesystem.

Continuing to run with a bogus file system is no good, but panicking the kernel is worse. If the kernel takes any drastic action beyond merely returning an error, it should remount the file system read-only.
Re: FFS: wrong superblock check ~ crash
On Mon, Oct 20, 2014 at 03:38:11PM +0200, Maxime Villard wrote:
> I think the sanity check should be:
>
> Index: ffs_vfsops.c
> ===================================================================
> RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
> retrieving revision 1.299
> diff -u -r1.299 ffs_vfsops.c
> --- ffs_vfsops.c	24 May 2014 16:34:04 -0000	1.299
> +++ ffs_vfsops.c	20 Oct 2014 13:01:46 -0000
> @@ -974,7 +974,7 @@
>  			continue;
>  		/* Validate size of superblock */
> -		if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
> +		if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))
>  			continue;
>  		/* Check that we can handle the file system blocksize */
>
> Tested on NetBSD-current: no longer crashes. Ok/Comments?

I think the check should be left alone, but afterwards the value should be clamped to the amount of data that can actually be transferred. Otherwise I think it may break, e.g. on volumes with odd block sizes.

I'm not sure what gets put in sbsize in such cases; if it's always SBLOCKSIZE, then we should probably reject anything that isn't exactly SBLOCKSIZE and forget about slop or hypothetical compatibility with larger superblocks. If it's the block size of the block the superblock is in, though, then there should be a test like this.

Another semi-related thing to check, btw: does it invalidate the buffers for candidate superblocks that it reads and then rejects? Failure to do this when the volume blocksize isn't SBLOCKSIZE (since it seems to always read buffers of size SBLOCKSIZE here) might cause interesting overlapping buffer lossage later.

--
David A. Holland
dholl...@netbsd.org
Re: FFS: wrong superblock check ~ crash
In article <20141020154606.ga10...@asim.lip6.fr>, Manuel Bouyer <bou...@antioche.eu.org> wrote:
> On Mon, Oct 20, 2014 at 03:38:11PM +0200, Maxime Villard wrote:
>> [...]
>> With a broken superblock the kernel will read far beyond the allocated
>> area, which mostly means it will crash.
>
> Sure. There are lots of other ways to crash the kernel with a broken
> ffs. In this specific case it's OK to return an error, but in the
> general case I prefer to have the kernel panic when an inconsistency is
> detected in ffs, rather than return an error and try to continue
> running with a bogus filesystem.

Well, this was the mentality 30 years ago ("let's panic"), and this is why we are here today. Sure, it is fine and safe to panic(), but if I can prevent the whole system from crashing and can keep running in degraded mode (isolating the broken filesystem), I'd rather have the choice to do so. I.e., the best thing would be to choose the "panic" or "isolate" behavior via a sysctl or a compile-time kernel define.

christos
Re: FFS: wrong superblock check ~ crash
On Mon, Oct 20, 2014 at 03:58:45PM +0000, Taylor R Campbell wrote:
>    Date: Mon, 20 Oct 2014 17:46:06 +0200
>    From: Manuel Bouyer <bou...@antioche.eu.org>
>
>    Sure. There are lots of other ways to crash the kernel with a broken
>    ffs. In this specific case it's OK to return an error, but in the
>    general case I prefer to have the kernel panic when an inconsistency
>    is detected in ffs, rather than return an error and try to continue
>    running with a bogus filesystem.
>
> Continuing to run with a bogus file system is no good, but panicking
> the kernel is worse. If the kernel takes any drastic action beyond
> merely returning an error, it should remount the file system read-only.

Definitively not. I want a panic. If the filesystem is corrupted, something has gone really wrong and you can't trust the running system any more. And there are cases where returning EROFS is worse than panicking (e.g. a NFS server).

--
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--
Re: FFS: wrong superblock check ~ crash
On 20/10/2014 18:23, David Holland wrote:
> On Mon, Oct 20, 2014 at 03:38:11PM +0200, Maxime Villard wrote:
>> I think the sanity check should be:
>>
>> Index: ffs_vfsops.c
>> ===================================================================
>> RCS file: /cvsroot/src/sys/ufs/ffs/ffs_vfsops.c,v
>> retrieving revision 1.299
>> diff -u -r1.299 ffs_vfsops.c
>> --- ffs_vfsops.c	24 May 2014 16:34:04 -0000	1.299
>> +++ ffs_vfsops.c	20 Oct 2014 13:01:46 -0000
>> @@ -974,7 +974,7 @@
>>  			continue;
>>  		/* Validate size of superblock */
>> -		if (sbsize > MAXBSIZE || sbsize < sizeof(struct fs))
>> +		if (sbsize > SBLOCKSIZE || sbsize < sizeof(struct fs))
>>  			continue;
>>  		/* Check that we can handle the file system blocksize */
>>
>> Tested on NetBSD-current: no longer crashes. Ok/Comments?
>
> I think the check should be left alone, but afterwards the value should
> be clamped to the amount of data that can actually be transferred.
> Otherwise I think it may break, e.g. on volumes with odd block sizes.

Yes, that's what I thought first, but I saw a comment in ffs/fs.h on this: "In all cases the size of the superblock will be SBLOCKSIZE."
Re: FFS: wrong superblock check ~ crash
In article <20141020155832.ea8ae60...@jupiter.mumble.net>, Taylor R Campbell <campbell+netbsd-tech-k...@mumble.net> wrote:
>    Date: Mon, 20 Oct 2014 17:46:06 +0200
>    From: Manuel Bouyer <bou...@antioche.eu.org>
>
>    Sure. There are lots of other ways to crash the kernel with a broken
>    ffs. In this specific case it's OK to return an error, but in the
>    general case I prefer to have the kernel panic when an inconsistency
>    is detected in ffs, rather than return an error and try to continue
>    running with a bogus filesystem.
>
> Continuing to run with a bogus file system is no good, but panicking
> the kernel is worse. If the kernel takes any drastic action beyond
> merely returning an error, it should remount the file system read-only.

This is wishful thinking (unless we fix the current set of bugs that prevent us from doing so even in a healthy filesystem, for example PR/30525). I would be happy if we could isolate the broken filesystem from all I/O operations instead of crashing. There are many different recipes in which file descriptors kept open for R/W corrupt the filesystem during R/O downgrades.

christos
Re: FFS: wrong superblock check ~ crash
Manuel Bouyer <bou...@antioche.eu.org> writes:
> On Mon, Oct 20, 2014 at 03:58:45PM +0000, Taylor R Campbell wrote:
>> Continuing to run with a bogus file system is no good, but panicking
>> the kernel is worse. If the kernel takes any drastic action beyond
>> merely returning an error, it should remount the file system
>> read-only.
>
> Definitively not. I want a panic. If the filesystem is corrupted
> something has gone really wrong and you can't trust the running system
> any more. And there are cases where returning EROFS is worse than
> panicking (e.g. a NFS server).

The basic issue here is that there are two use cases:

 - The FFS file system in question is /, or /usr, or something else
   that the operator deems Wicked Important.
 - The FFS file system is on some random removable media that someone
   just wants to access.

The desires, while each is reasonable, are totally different.

I would suggest a new filesystem option (perhaps "poe" for panic on error, with a nod to Dr. Strangelove) that causes detection of consistency errors to panic rather than error. Then people can turn it on for filesystems that are so critical that they prefer a panic.
Re: FFS: wrong superblock check ~ crash
On Mon, Oct 20, 2014 at 04:57:19PM +0000, paul_kon...@dell.com wrote:
> I disagree. There's a principle of networking product design, which is
> that you are never allowed to crash due to external input. If you
> receive an invalid packet, you can complain about it and say the sender
> is broken, but if you crash from it it's always your bug, no excuses,
> no exceptions. I agree. I would treat all external inputs that way;
> storage is an external input.

no, for most server usage it's internal input. You don't plug random USB keys into your servers, do you? And even if I have ffs on USB keys I don't consider them external, as I use them only on my systems.

> Panics are for INTERNAL consistency failures, for example a state
> machine that gets into a non-existent state. But any objection to
> outside data must be a rejection of that data, nothing more.

I agree. And if the kernel detects an inconsistent ffs state, it's either because it wrote bogus data to the disk (in which case you can consider that the kernel itself is in an inconsistent state) or because the hardware is bogus (in which case a panic is also not that silly). Remounting a filesystem read-only on a running server is certainly the worst way of dealing with the problem.

--
Manuel Bouyer, LIP6, Universite Paris VI.
manuel.bou...@lip6.fr
Tel: 01 44 27 70 14 Fax: 01 44 27 72 80
--
Re: FFS: wrong superblock check ~ crash
On Mon, Oct 20, 2014 at 01:10:58PM -0400, Greg Troxel wrote:
> [...]
> The basic issue here is that there are two use cases:
>
>  - The FFS file system in question is /, or /usr, or something else
>    that the operator deems Wicked Important.
>  - The FFS file system is on some random removable media that someone
>    just wants to access.

For random removable media, the only reasonable way to access it is from a non-root userland process :) Or in a single-use virtual machine. Maybe we should run the USB stack in non-root userland too, BTW.

--
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--
Re: FFS: wrong superblock check ~ crash
Manuel Bouyer <bou...@antioche.eu.org> wrote:
> On Mon, Oct 20, 2014 at 03:58:45PM +0000, Taylor R Campbell wrote:
>> Continuing to run with a bogus file system is no good, but panicking
>> the kernel is worse. If the kernel takes any drastic action beyond
>> merely returning an error, it should remount the file system
>> read-only.
>
> Definitively not. I want a panic. If the filesystem is corrupted
> something has gone really wrong and you can't trust the running system
> any more. And there are cases where returning EROFS is worse than
> panicking (e.g. a NFS server).

Disagree. The kernel should remount the file system in read-only mode. Perhaps we can debate what to do with a corrupted / when the system is booting, but for other cases (especially hot-plug or external disks) I certainly do not expect a crash. The system should clearly indicate the errors to the user and be defensive (hence remount in read-only), but if I insert a USB stick with garbage and my system crashes then it is a plain bug with potential security implications.

--
Mindaugas
Re: FFS: wrong superblock check ~ crash
Martin Husemann <mar...@duskware.de> writes:
> On Mon, Oct 20, 2014 at 07:48:31PM +0100, Mindaugas Rasiukevicius wrote:
>> Perhaps we can debate what to do with a corrupted / when the system is
>> booting, but for other cases (especially hot-plug or external disks) I
>> certainly do not expect a crash.
>
> Please raise your hand if you did NOT use "mount -o rump" when mounting
> such a disk the last time on NetBSD - anyone?

That certainly is the best practice, but it's a bug if not using rump leads to trouble.
Re: FFS: wrong superblock check ~ crash
Manuel Bouyer <bou...@antioche.eu.org> wrote:
> On Mon, Oct 20, 2014 at 07:48:31PM +0100, Mindaugas Rasiukevicius wrote:
>>> Definitively not. I want a panic. If the filesystem is corrupted
>>> something has gone really wrong and you can't trust the running
>>> system any more. And there are cases where returning EROFS is worse
>>> than panicking (e.g. a NFS server).
>>
>> Disagree. The kernel should remount the file system in read-only mode.
>> Perhaps we can debate what to do with a corrupted / when the system is
>> booting, but for other cases (especially hot-plug or external disks) I
>> certainly do not expect a crash.
>
> I do, it's the only sane thing to do if the system did write bogus data
> to the disk and noticed it later. Remounting in read-only mode on a
> server with active services running doesn't do anything good (I know
> because Linux servers do this. A panic and reboot is a much better
> behavior).

Huh? If your use case is a single / partition for everything then sure. I can actually extend my statement: the system should not crash in the case of either a corrupt file system or a disk failure (which may or may not lead to a corrupt file system). If I have a system with an array of disks and one of them fails, a crash would take down the whole node. When many terabytes of data suddenly disappear, people get unhappy; it usually costs quite a bit of money too. So, how about making only the *failed* segment of data unavailable? In many cases this is suitable and desirable; consider distributed systems (you have redundancy) or caches (where I can just discard the corrupted data segment and refetch it from the origin). Meanwhile, I can replace the failed disk and/or rebuild the corrupt file system while experiencing only a limited disruption.

> If the corrupted filesystem is from a corrupted USB key then not
> panicking is probably better; but 1) USB keys usually don't have ffs on
> them 2) in such a case it would be better to run the filesystem code in
> userland anyway.

So now you make an assumption about file systems used on USB sticks? That does not matter. Somebody can create a USB stick with a manually handcrafted superblock to crash your machine or maybe even exploit it. To me, that constitutes a security vulnerability.

--
Mindaugas
Re: FFS: wrong superblock check ~ crash
On Mon, Oct 20, 2014 at 08:44:21PM +0100, Mindaugas Rasiukevicius wrote:
> Huh? If your use case is a single / partition for everything then sure.

I don't, but others are equally important.

> I can actually extend my statement: the system should not crash in the
> case of either a corrupt file system or a disk failure (which may or
> may not lead to a corrupt file system).

Detectable disk failures are handled by RAID in my context. But I've also got the case of disks that silently corrupt data. In that case you want to stop operations ASAP, because if you keep things running you only make things worse (corrupting good data on other members of the RAID).

> If I have a system with an array of disks and one of them fails, a
> crash would take down the whole node. When many terabytes of data
> suddenly disappear, people get unhappy;

As they would if the partition they're working on suddenly becomes read-only. It may be less damaging to stop the server, even if only one of the many partitions is affected (and if you have that many partitions, maybe you should also have more servers).

> it usually costs quite a bit of money too. So, how about making only
> the *failed* segment of data unavailable? In many cases this is
> suitable and desirable; consider distributed systems (you have
> redundancy)

Then you can take the system down, no problem. It's probably even better to take it down completely than to let it run in degraded mode.

> or caches (where I can just discard the corrupted data segment and
> refetch it from the origin).

How can the cache know where the corruption comes from? Anyway, I can't see existing software that could deal gracefully with the filesystem it's working on becoming unavailable. Remounting read-only is the default Linux behavior and I've been hit by this many times. Not only was the server unavailable to users, but it created other problems (like mails bouncing, nfs operations failing instead of just waiting for the server to come back, etc.) while the server ran in degraded mode before someone could stop it.

> Meanwhile, I can replace the failed disk and/or rebuild the corrupt
> file system while experiencing only a limited disruption.

If the corrupted filesystem is from a corrupted USB key then not panicking is probably better; but 1) USB keys usually don't have ffs on them 2) in such a case it would be better to run the filesystem code in userland anyway.

> So now you make an assumption about file systems used on USB sticks?
> That does not matter. Somebody can create a USB stick with a manually
> handcrafted superblock to crash your machine or maybe even exploit it.
> To me, that constitutes a security vulnerability.

Of course he can. He can also change the firmware of the USB stick for this purpose. That's why untrusted USB sticks should not be mounted using the in-kernel filesystems (but the USB stack may be a problem too).

--
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--
Re: FFS: wrong superblock check ~ crash
bou...@antioche.eu.org (Manuel Bouyer) writes:
> Definitively not. I want a panic.

Definitely an administrative decision then. You could make it a mount option :)
Re: FFS: wrong superblock check ~ crash
mlel...@serpens.de (Michael van Elst) wrote:
> rm...@netbsd.org (Mindaugas Rasiukevicius) writes:
>> If I have a system with an array of disks and one of them fails, a
>> crash would take down the whole node. When many terabytes of data
>> suddenly disappear, people get unhappy;
>
> When a single node going down makes people unhappy, you may want to
> have more than one :)

There are convergence and recovery times which may have an extra cost for you. Sure, the failure of a single node might merely result in increased latency somewhere. However, there can be a significant difference between recovering the whole node and just replacing/recovering one segment of data in it. That is especially the case with caches: you do not want to randomly blow and refill them - it takes some time.

>> it usually costs quite a bit of money too. So, how about making only
>> the *failed* segment of data unavailable?
>
> It all depends on what you can or are willing to handle. Also what a
> crash (and reboot) would actually solve. It doesn't help if your boot
> partition is damaged and you force a reboot, converting a maybe half
> working system into a dead system.

Not really relevant to my example, where you would have dedicated disks for the data, but I think we agree on this - there are different use cases and it depends. If the boot partition is damaged, then it does indeed most likely mean that the system is not in the state where it can do its job. I would probably still just mount the file system in read-only mode and let it fail elsewhere, but it may as well panic; at this point it is non-functioning and requires manual intervention anyway.

> In my use case, a broken filesystem is usually a sign of an unnoticed
> hardware or software error and the best reaction to recover is to panic
> (and throw the data away, the machines have only temporary data).
> Continuing with a read-only filesystem doesn't do any good, because you
> have no means to find out whether the data you can read is complete or
> correct.

Why not? It seems to be a question of how you communicate the errors back to the applications or administrator. There are applications which gracefully handle EIOs exactly for this purpose. The fact that they are very rare does not mean they do not exist. :)

> Most clustered systems also handle complete outages better than a
> degraded mode. That's why you have things like STONITH.

It really depends on the application or service; you often have to take design considerations for both though.

--
Mindaugas
re: FFS: wrong superblock check ~ crash
Michael van Elst writes:
> bou...@antioche.eu.org (Manuel Bouyer) writes:
>> Definitively not. I want a panic.
>
> Definitely an administrative decision then. You could make it a mount
> option :)

indeed. istr that solaris has an onerror=panic|remount|unmount|...? option.

.mrg.
Re: FFS: wrong superblock check ~ crash
Yes, you can configure that on a zpool:

    failmode=wait | continue | panic

This is relevant for SAN disks for instance, where you don't necessarily want to crash the server when a disk becomes temporarily unavailable.

2014-10-21 2:44 GMT+02:00 matthew green <m...@eterna.com.au>:
> Michael van Elst writes:
>> bou...@antioche.eu.org (Manuel Bouyer) writes:
>>> Definitively not. I want a panic.
>>
>> Definitely an administrative decision then. You could make it a mount
>> option :)
>
> indeed. istr that solaris has an onerror=panic|remount|unmount|...?
> option.
>
> .mrg.
Re: FFS: wrong superblock check ~ crash
rm...@netbsd.org (Mindaugas Rasiukevicius) writes:
>> In my use case, a broken filesystem is usually a sign of an unnoticed
>> hardware or software error and the best reaction to recover is to
>> panic (and throw the data away, the machines have only temporary
>> data). Continuing with a read-only filesystem doesn't do any good,
>> because you have no means to find out whether the data you can read is
>> complete or correct.
>
> Why not? It seems to be a question of how you communicate the errors
> back to the applications or administrator. There are applications which
> gracefully handle EIOs exactly for this purpose. The fact that they are
> very rare does not mean they do not exist. :)

See, that's what "in my use case" means.

>> Most clustered systems also handle complete outages better than a
>> degraded mode. That's why you have things like STONITH.
>
> It really depends on the application or service; you often have to take
> design considerations for both though.

That was the point, there is no definitive solution. You, as an administrator, need to choose.
Re: FFS: wrong superblock check ~ crash
m...@eterna.com.au (matthew green) writes:
> Michael van Elst writes:
>> bou...@antioche.eu.org (Manuel Bouyer) writes:
>>> Definitively not. I want a panic.
>>
>> Definitely an administrative decision then. You could make it a mount
>> option :)
>
> indeed. istr that solaris has an onerror=panic|remount|unmount|...?
> option.

Same for Linux. But before you can make it a mount option, the filesystem code must be able to handle the cases without panic :)
re: FFS: wrong superblock check ~ crash
>>> Definitely an administrative decision then. You could make it a mount
>>> option :)
>>
>> indeed. istr that solaris has an onerror=panic|remount|unmount|...?
>> option.
>
> Same for Linux. But before you can make it a mount option, the
> filesystem code must be able to handle the cases without panic :)

minor details... ;)