Re: CVS commit: src/sys/ufs/ufs
Le 27/02/2020 à 01:36, Simon Burge a écrit : > "Maxime Villard" wrote: > >> Module Name: src >> Committed By:maxv >> Date:Wed Feb 26 18:00:12 UTC 2020 >> >> Modified Files: >> >> src/sys/ufs/ufs: ufs_vnops.c >> >> Log Message: >> >> Zero out the padding in 'd_namlen', to prevent info leaks. Same logic as >> ufs_makedirentry(). > > Is it cleaner to just call pool_cache_get() with PR_ZERO? > > Cheers, > Simon. In this specific case there is already a clean macro that gives the number of padding bytes, so using it is cleaner/faster than zeroing the whole buffer. Maxime
Re: CVS commit: src/sys/ufs/ufs
"Maxime Villard" wrote: > Module Name: src > Committed By: maxv > Date: Wed Feb 26 18:00:12 UTC 2020 > > Modified Files: > > src/sys/ufs/ufs: ufs_vnops.c > > Log Message: > > Zero out the padding in 'd_namlen', to prevent info leaks. Same logic as > ufs_makedirentry(). Is it cleaner to just call pool_cache_get() with PR_ZERO? Cheers, Simon.
Re: CVS commit: src/sys/ufs/ufs
On Sun, Feb 24, 2019 at 10:13:40PM +0100, Tobias Nygren wrote: > On Sun, 24 Feb 2019 19:06:40 + > Michael van Elst wrote: > > > To generate a diff of this commit: > > cvs rdiff -u -r1.242 -r1.243 src/sys/ufs/ufs/ufs_vnops.c > > > + rawbuf -= dropend; > > I guess this should be rawbufmax, not rawbuf. Yes, already fixed. -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: CVS commit: src/sys/ufs/ufs
On Mon, Feb 25, 2019 at 06:07:23AM +, David Holland wrote: > that one doesn't set dropend correctly for small buffers, outsmarted > myself while writing it. Better change (still against 1.242) that makes the logic much simpler. Will test this overnight... Index: ufs_vnops.c === RCS file: /cvsroot/src/sys/ufs/ufs/ufs_vnops.c,v retrieving revision 1.239 diff -u -p -r1.239 ufs_vnops.c --- ufs_vnops.c 28 Oct 2017 00:37:13 - 1.239 +++ ufs_vnops.c 25 Feb 2019 06:58:52 - @@ -1233,7 +1233,7 @@ ufs_readdir(void *v) #endif /* caller's buffer */ struct uio *calleruio = ap->a_uio; - off_t startoffset, endoffset; + off_t startoffset; size_t callerbytes; off_t curoffset; /* dirent production buffer */ @@ -1244,8 +1244,8 @@ ufs_readdir(void *v) off_t *cookies; size_t numcookies, maxcookies; /* disk buffer */ - off_t physstart, physend; - size_t skipstart, dropend; + off_t physstart; + size_t skipstart; char*rawbuf; size_t rawbufmax, rawbytes; struct uio rawuio; @@ -1256,29 +1256,60 @@ ufs_readdir(void *v) KASSERT(VOP_ISLOCKED(vp)); - /* figure out where we want to read */ +#define UFS_READDIR_ARBITRARY_MAX (1024*1024) callerbytes = calleruio->uio_resid; - startoffset = calleruio->uio_offset; - endoffset = startoffset + callerbytes; - if (callerbytes < _DIRENT_MINSIZE(dirent)) { /* no room for even one struct dirent */ return EINVAL; } + if (callerbytes > UFS_READDIR_ARBITRARY_MAX) { + callerbytes = UFS_READDIR_ARBITRARY_MAX; + } - /* round start and end down to block boundaries */ + /* +* Figure out where to start reading. Round the start down to +* a block boundary: we need to start at the beginning of a +* block in order to read the directory correctly. +* +* skipstart is the number of bytes we need to read in +* (because we need to start at the beginning of a block) but +* not transfer to the user. +*/ + startoffset = calleruio->uio_offset; physstart = startoffset & ~(off_t)(ump->um_dirblksiz - 1); - physend = endoffset & ~(off_t)(ump->um_dirblksiz - 1); skipstart = startoffset - physstart; - dropend = endoffset - physend; - if (callerbytes - dropend < _DIRENT_MINSIZE(rawdp)) { - /* no room for even one struct direct */ - return EINVAL; - } + /* +* Now figure out how much to read. +* +* callerbytes tells us how much data we can send back to the +* user. Assume as a starting point that we want to read +* roughly the same volume of struct directs from disk as the +* volume of struct dirents we want to send to the user; this +* is not super accurate for large numbers of small names but +* will serve well enough. It's ok to underpopulate the user's +* buffer, and most directories are only a couple blocks +* anyway. +* +* However much we decide we want, stop on a block boundary. +* That way the copying code below doesn't need to worry about +* finding partial entries in the transfer buffer. +* +* Do this by rounding down (not up) by default as that will +* with luck avoid needing to scan the next block twice; but +* always read in at least one block. +*/ - /* how much to actually read */ - rawbufmax = callerbytes + skipstart - dropend; + /* Base buffer space for CALLERBYTES of new data */ + rawbufmax = callerbytes + skipstart; + + /* Round down to an integral number of blocks */ + rawbufmax &= ~(off_t)(ump->um_dirblksiz - 1); + + /* but always at least one block */ + if (rawbufmax == 0) { + rawbufmax = ump->um_dirblksiz; + } /* read it */ rawbuf = kmem_alloc(rawbufmax, KM_SLEEP); -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Mon, Feb 25, 2019 at 05:50:08AM +, David Holland wrote: > Furthermore, this: > > > + rawbuf -= dropend; > > is entirely wrong (it needs to be "rawbufmax") and without that bound > on rawbufmax the code is unsafe... I repaired this bit just now, so it's not an overt hazard any more. I still don't like this change all that much but whatever, I suppose... > Here's the fix I got bogged down trying to build and test, which also > adds a missing upper bound on callerbytes: that one doesn't set dropend correctly for small buffers, outsmarted myself while writing it. -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Sun, Feb 24, 2019 at 07:51:24PM -0500, Christos Zoulas wrote: > Module Name: src > Committed By:christos > Date:Mon Feb 25 00:51:24 UTC 2019 > > Modified Files: > src/sys/ufs/ufs: ufs_vnops.c > > Log Message: > drop unused dropping this logic is wrong... -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Sun, 24 Feb 2019 19:06:40 + Michael van Elst wrote: > To generate a diff of this commit: > cvs rdiff -u -r1.242 -r1.243 src/sys/ufs/ufs/ufs_vnops.c > + rawbuf -= dropend; I guess this should be rawbufmax, not rawbuf.
Re: CVS commit: src/sys/ufs/ufs
On Sun, Feb 24, 2019 at 07:41:59PM +, m...@netbsd.org wrote: > something like the overflow is undefined behaviour, so it cannot > happen, so the branch checking that it happened is eliminated. *Signed* integer overflow is undefined behavior. *Unsigned* integer overflow is well defined. Joerg
Re: CVS commit: src/sys/ufs/ufs
On Sun, Feb 24, 2019 at 07:41:59PM +, m...@netbsd.org wrote: > something like the overflow is undefined behaviour, so it cannot > happen, so the branch checking that it happened is eliminated. Integer overflow is not undefined, but implemenation defined behaviour. Martin
Re: CVS commit: src/sys/ufs/ufs
On Sun, Feb 24, 2019 at 07:06:40PM +, Michael van Elst wrote: > While here, also check for arithmetic overflow. > + /* how much to actually read */ > + rawbufmax = callerbytes + skipstart; > + if (rawbufmax < callerbytes) > + return EINVAL; hmm, I"m under the impression that checking for overflow without upsetting the compiler is a delicate matter. something like the overflow is undefined behaviour, so it cannot happen, so the branch checking that it happened is eliminated.
Re: CVS commit: src/sys/ufs/ufs
Indeed rebooting with an updated kernel will give active NFS clients problems, but I am not sure we should realy care nor how we could possibly avoid this one time issue. We have changed encoding of filehandles before (at least once). it's a bad thing and worth mentioning in release notes. how about adding a version field so that it won't happen again? YAMAMOTO Takashi
Re: CVS commit: src/sys/ufs/ufs
Hi David Holland: I'm not entirely clear on how this structure is actually used, which is why I hadn't yet made this change myself after the issue came up. :-/ My observation in sys/fs/cd9660 is probably the initial cause for this change. PR kern/48799. I could produce a problem by exporting to NFS an ISO-9960 filesystem which gets 64-bit inodes due to high storage loacations of its directory records. NFS uses these structs to identify files in the original filesystem. cd9660 then uses the struct ifid member ifid_ino to fulfill this wish ... if no 32-bit bottleneck prevents it. So if the fix in ufs causes problems, could you please also have a look at the fix of kern/48799 ? Have a nice day :) Thomas
Re: CVS commit: src/sys/ufs/ufs
On Fri, May 16, 2014 at 12:12:00AM +0200, Joerg Sonnenberger wrote: I don't think it is a problem by itself as the file handle is opaque, but it is certainly wasteful to not put ufid_ino first. That is not possible, the first few bytes of the structure must match struct fid from fstypes.h. The structure is only used to construct opaque tokens for filehandles, and the encoding includes the size, so I see no problem. Indeed rebooting with an updated kernel will give active NFS clients problems, but I am not sure we should realy care nor how we could possibly avoid this one time issue. We have changed encoding of filehandles before (at least once). Martin
Re: CVS commit: src/sys/ufs/ufs
On Fri, May 16, 2014 at 03:54:44PM +, David Holland wrote: Indeed rebooting with an updated kernel will give active NFS clients problems, but I am not sure we should realy care nor how we could possibly avoid this one time issue. We have changed encoding of filehandles before (at least once). I don't think this is a problem, but maybe I'll put a note in UPDATING. Never mind that problem. Consider what happens if you reboot with a different CD in the drive! I once fixed a filesystem to use different faked inode numbers every time a filesystem was mounted. Without that NFS clients would write to the wrong file in the wrong FS. The 'impossible to get rid of' retries for hard mounts were something up with which I had to put. (A preposition is something you should not end a sentence with.) David -- David Laight: da...@l8s.co.uk
Re: CVS commit: src/sys/ufs/ufs
On Fri, May 16, 2014 at 06:48:06PM +, Martin Husemann wrote: The problem is that the tokens are memcmp'd, so if they include struct padding this may not work. All filesystems I've seen init the struct with a full memset to 0, so all padding fields should be initialized as well. Ok, but... However, we can swap the last two fields in the ufid case, if you prefer. ...may as well, I guess. -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Wed, May 14, 2014 at 01:46:19PM +, Martin Husemann wrote: Modified Files: src/sys/ufs/ufs: inode.h Log Message: Make filehandles on UFS based filesystems use proper 64bit inodes. 32bit restriction noticed by Taylor R Campbell. I suspect this isn't going to work: u_int16_t ufid_len; /* Length of structure. */ u_int16_t ufid_pad; /* Force 32-bit alignment. */ ino_t ufid_ino; /* File number (ino). */ int32_t ufid_gen; /* Generation number. */ Probably it should be specifically sized, therefore uint64_t; also it needs padding for 64-bit alignment. I'm not entirely clear on how this structure is actually used, which is why I hadn't yet made this change myself after the issue came up. :-/ -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Thu, May 15, 2014 at 10:06:18PM +, David Holland wrote: On Wed, May 14, 2014 at 01:46:19PM +, Martin Husemann wrote: Modified Files: src/sys/ufs/ufs: inode.h Log Message: Make filehandles on UFS based filesystems use proper 64bit inodes. 32bit restriction noticed by Taylor R Campbell. I suspect this isn't going to work: I don't think it is a problem by itself as the file handle is opaque, but it is certainly wasteful to not put ufid_ino first. Joerg
Re: CVS commit: src/sys/ufs/ufs
On Fri, May 16, 2014 at 12:12:00AM +0200, Joerg Sonnenberger wrote: Modified Files: src/sys/ufs/ufs: inode.h Log Message: Make filehandles on UFS based filesystems use proper 64bit inodes. 32bit restriction noticed by Taylor R Campbell. I suspect this isn't going to work: I don't think it is a problem by itself as the file handle is opaque, but it is certainly wasteful to not put ufid_ino first. I've ascertained that it is in fact a problem: this thing is sent directly on the wire, and it gets memcmp'd, so having padding bytes in it is going to cause trouble. Also, if you reboot and update the kernel to add this change all old file handles will quit working; this probably doesn't matter though. -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Sun, Jan 29, 2012 at 08:49:02AM +, Izumi Tsutsui wrote: Modified Files: src/sys/ufs/ufs: ufs_vfsops.c Log Message: Fix errors in !defined(QUOTA) !defined(QUOTA2) case. Whoops, sorry... -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Tue, Oct 11, 2011 at 04:40:02AM +, David Holland wrote: On Sun, Oct 09, 2011 at 09:15:34PM +, Chuck Silvers wrote: Modified Files: src/sys/ufs/ufs: extattr.h Log Message: add forward declarations for the VOP args structures so that fstat can include this file. Do we really want fstat using fs-specific headers and interfaces? (I know, cleaning it up is probably nontrivial) sure, it'd be better if fstat used sysctl instead of libkvm to extract all this info from the kernel, but as you say, non-trivial. -Chuck
Re: CVS commit: src/sys/ufs/ufs
On Sun, Oct 09, 2011 at 09:15:34PM +, Chuck Silvers wrote: Modified Files: src/sys/ufs/ufs: extattr.h Log Message: add forward declarations for the VOP args structures so that fstat can include this file. Do we really want fstat using fs-specific headers and interfaces? (I know, cleaning it up is probably nontrivial) -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Sun, Jul 17, 2011 at 10:02:27PM +, David A. Holland wrote: Modified Files: src/sys/ufs/ufs: ufs_vnops.c Log Message: At the end of ufs_rmdir, don't use a dangling vnode pointer to call fstrans_done. Ok hannken@ Well, for the record it turns out I misread his mail and maybe this isn't ok... will fix. -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Tue, 25 May 2010, Antti Kantee wrote: What would I have to do from userland to tickle this bug? You need to unlink and create a file after the first namei in sys_rename but before VOP_RENAME runs, i.e. trigger a race condition. Thanks. --apb (Alan Barrett)
Re: CVS commit: src/sys/ufs/ufs
On Tue, 25 May 2010, Antti Kantee wrote: Modified Files: src/sys/ufs/ufs: ufs_wapbl.c Log Message: Add a comment describing an observed boom-crash-burn problem in the code. Fixing it will require a full tank of gas, half a pack of cigarettes, sunglasses, darkness, and most importantly: someone else. What would I have to do from userland to tickle this bug? --apb (Alan Barrett)
Re: CVS commit: src/sys/ufs/ufs
On Tue May 25 2010 at 15:55:35 +0200, Alan Barrett wrote: On Tue, 25 May 2010, Antti Kantee wrote: Modified Files: src/sys/ufs/ufs: ufs_wapbl.c Log Message: Add a comment describing an observed boom-crash-burn problem in the code. Fixing it will require a full tank of gas, half a pack of cigarettes, sunglasses, darkness, and most importantly: someone else. What would I have to do from userland to tickle this bug? You need to unlink and create a file after the first namei in sys_rename but before VOP_RENAME runs, i.e. trigger a race condition. Running tests/fs/ffs/t_renamerace a few times should suffice. It uses rump, so your host is safe. If you have a uniprocessor host, set RUMP_NCPU to 2 to have both threads run in the rump kernel in parallel (well, they run virtually parallel on a uniprocessor system, but YKWIM). This makes the race more likely to trigger.
Re: CVS commit: src/sys/ufs/ufs
On Tue, May 25, 2010 at 11:02:07AM +, Antti Kantee wrote: Add a comment describing an observed boom-crash-burn problem in the code. Fixing it will require a full tank of gas, half a pack of cigarettes, sunglasses, darkness, and most importantly: someone else. Thanks for the merge conflict :-p -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Tue, Sep 22, 2009 at 10:42:59PM +0300, Antti Kantee wrote: Blah, I didn't even want to think about this migrane-inducer now. Maybe people who have recently worked on vnode reclaiming could instead be the ones to comment? It's becoming clear that this is something I'm going to need to wade into. Much as I've been trying to avoid it. :-/ There's a (perfectly natural) tendency to try to fix synchronization problems by adding states -- extra flags, more locks, moving things to the background, etc. -- but in general the way to fix synchronization problems so they *stay* fixed is to remove states. For example, from what I've seen so far I'm pretty sure XLOCK ought to go away. -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/ufs/ufs
On Sep,Tuesday 22 2009, at 9:42 PM, Antti Kantee wrote: On Tue Sep 22 2009 at 21:04:14 +0200, Manuel Bouyer wrote: that's not an issue with the reference count. It's an issue with vclean() calling VOP_RECLAIM() even if the refcount is greater than 1, and vrelel() calling vclean() even if the refcount is greater than 1, when the file has been unlink(2)ed. There's a comment about this in vrelel(), look for variable recycle. Ah, ic, probably would've been easier if I read the PR first ;) So basically someone can vget the vnode (via fhtovp, since it's gone from the directory namespace) between VOP_INACTIVE and clean. Your fix doesn't really fix this problem, since depending on timings the inode might still be recycled after you check it's valid. Hmm, ok, so things which might happen: 1: VOP_REMOVE, vnode is locked, vrele is called 2: fhtovp before inode is reclaimed, blocks on vn_lock 1: VOP_INACTIVE releases vnlock, returns signal to recycle vnode 2: gets lock, does check that it's still the same inode 1: reclaims vnode 2: boom Since vget takes the interlock, a dirty cheap trick might be to check that the reference count is still one before trying to clean the vnode in vrelel(), otherwise punting and letting the next release-to-0 reclaim it? Blah, I didn't even want to think about this migrane-inducer now. Maybe people who have recently worked on vnode reclaiming could instead be the ones to comment? Sorry I haven't read this thread until today. I think that it would be good to test my vreclaim patched kernel it might help to resolve your problem. But I don't think it is a solution see above. Regards Adam.
Re: CVS commit: src/sys/ufs/ufs
On Wed, Sep 23, 2009 at 06:09:00PM +0200, Adam Hamsik wrote: Sorry I haven't read this thread until today. I think that it would be good to test my vreclaim patched kernel it might help to resolve your problem. But I don't think it is a solution see above. Your patch won't help. It only changes the behavior or getnewvnode(), and in this problem getnewvnode() isn't involved at all. Also I can't see how your patch would help, a vnode for an inode which has been deleted still needs to be invalidated at unlink time, and can't be deffered to another thread. -- Manuel Bouyer bou...@antioche.eu.org NetBSD: 26 ans d'experience feront toujours la difference --
Re: CVS commit: src/sys/ufs/ufs
On Wed, Sep 23, 2009 at 03:08:14PM +, David Holland wrote: It's becoming clear that this is something I'm going to need to wade into. Much as I've been trying to avoid it. :-/ There's a (perfectly natural) tendency to try to fix synchronization problems by adding states -- extra flags, more locks, moving things to the background, etc. -- but in general the way to fix synchronization problems so they *stay* fixed is to remove states. For example, from what I've seen so far I'm pretty sure XLOCK ought to go away. That's possible, at last in its current form as it fails to protect vget(). -- Manuel Bouyer bou...@antioche.eu.org NetBSD: 26 ans d'experience feront toujours la difference --
Re: CVS commit: src/sys/ufs/ufs
On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote: In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because vget() can sleep. After vget returns, check that vp is still connected with ip, and that ip still points to the inode we want. This fix the NULL pointer dereference in ufs_fhtovp() I've been seeing on a NFS server. Um, hold the phone. The whole point of vget() is to provide race-free access to the weak vnode reference held by the file system. Are you saying this does not hold anymore? It depends on what you mean with race-free. If you mean that the vnode returned by vget() can't be recygled, I think this is true. If you mean that vget() can't return a clean vnode then this is false: vget() can sleep in vn_lock(), and it releases the v_interlock mutex before sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even if v_usecount is 1. -- Manuel Bouyer bou...@antioche.eu.org NetBSD: 26 ans d'experience feront toujours la difference --
Re: CVS commit: src/sys/ufs/ufs
On Tue Sep 22 2009 at 20:06:40 +0200, Manuel Bouyer wrote: On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote: In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because vget() can sleep. After vget returns, check that vp is still connected with ip, and that ip still points to the inode we want. This fix the NULL pointer dereference in ufs_fhtovp() I've been seeing on a NFS server. Um, hold the phone. The whole point of vget() is to provide race-free access to the weak vnode reference held by the file system. Are you saying this does not hold anymore? It depends on what you mean with race-free. If you mean that the vnode returned by vget() can't be recygled, I think this is true. If you mean that vget() can't return a clean vnode then this is false: vget() can sleep in vn_lock(), and it releases the v_interlock mutex before sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even if v_usecount is 1. What is the practical difference of cleaned and recycled for the file system driver? If there is a race in vfs and XLOCK is not used properly, I think that should be investigated and fixed instead of patching file systems here and there.
Re: CVS commit: src/sys/ufs/ufs
On Tue, Sep 22, 2009 at 09:23:05PM +0300, Antti Kantee wrote: On Tue Sep 22 2009 at 20:06:40 +0200, Manuel Bouyer wrote: On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote: In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because vget() can sleep. After vget returns, check that vp is still connected with ip, and that ip still points to the inode we want. This fix the NULL pointer dereference in ufs_fhtovp() I've been seeing on a NFS server. Um, hold the phone. The whole point of vget() is to provide race-free access to the weak vnode reference held by the file system. Are you saying this does not hold anymore? It depends on what you mean with race-free. If you mean that the vnode returned by vget() can't be recygled, I think this is true. If you mean that vget() can't return a clean vnode then this is false: vget() can sleep in vn_lock(), and it releases the v_interlock mutex before sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even if v_usecount is 1. What is the practical difference of cleaned and recycled for the file system driver? From what I understand the vnode is not on the free list yet so it can't be reused for something else. If there is a race in vfs and XLOCK is not used properly, I think that should be investigated and fixed instead of patching file systems here and there. I don't know how XLOCK is supposed to be used (and I even less know how to change things without creating deadlocks). As I see it XLOCK is set while the vnode is being cleaned, not before or after cleaning it, so XLOCK is not going to avoid this race anyway. I asked for help on tech-kern about this but didn't get any reply ... -- Manuel Bouyer bou...@antioche.eu.org NetBSD: 26 ans d'experience feront toujours la difference --
Re: CVS commit: src/sys/ufs/ufs
On Tue Sep 22 2009 at 21:04:14 +0200, Manuel Bouyer wrote: that's not an issue with the reference count. It's an issue with vclean() calling VOP_RECLAIM() even if the refcount is greater than 1, and vrelel() calling vclean() even if the refcount is greater than 1, when the file has been unlink(2)ed. There's a comment about this in vrelel(), look for variable recycle. Ah, ic, probably would've been easier if I read the PR first ;) So basically someone can vget the vnode (via fhtovp, since it's gone from the directory namespace) between VOP_INACTIVE and clean. Your fix doesn't really fix this problem, since depending on timings the inode might still be recycled after you check it's valid. Hmm, ok, so things which might happen: 1: VOP_REMOVE, vnode is locked, vrele is called 2: fhtovp before inode is reclaimed, blocks on vn_lock 1: VOP_INACTIVE releases vnlock, returns signal to recycle vnode 2: gets lock, does check that it's still the same inode 1: reclaims vnode 2: boom Since vget takes the interlock, a dirty cheap trick might be to check that the reference count is still one before trying to clean the vnode in vrelel(), otherwise punting and letting the next release-to-0 reclaim it? Blah, I didn't even want to think about this migrane-inducer now. Maybe people who have recently worked on vnode reclaiming could instead be the ones to comment?
Re: CVS commit: src/sys/ufs/ufs
On Tue, Sep 22, 2009 at 10:42:59PM +0300, Antti Kantee wrote: On Tue Sep 22 2009 at 21:04:14 +0200, Manuel Bouyer wrote: that's not an issue with the reference count. It's an issue with vclean() calling VOP_RECLAIM() even if the refcount is greater than 1, and vrelel() calling vclean() even if the refcount is greater than 1, when the file has been unlink(2)ed. There's a comment about this in vrelel(), look for variable recycle. Ah, ic, probably would've been easier if I read the PR first ;) So basically someone can vget the vnode (via fhtovp, since it's gone from the directory namespace) between VOP_INACTIVE and clean. Your fix doesn't really fix this problem, since depending on timings the inode might still be recycled after you check it's valid. I don't think so because at this point the vnode is locked. I think the race window is between vn_lock() releasing the interlock and getting the vn_lock. After that the reference count is high enouth to prevent vrelel() trying to clean it (vtryrele() at the start of vrelel will make it return, and we hold the interlock at this point). Hmm, ok, so things which might happen: 1: VOP_REMOVE, vnode is locked, vrele is called 2: fhtovp before inode is reclaimed, blocks on vn_lock It would fist block on the interlock at this point, I guess. 1: VOP_INACTIVE releases vnlock, returns signal to recycle vnode 2: gets lock, does check that it's still the same inode 1: reclaims vnode 2: boom Since vget takes the interlock, a dirty cheap trick might be to check that the reference count is still one before trying to clean the vnode in vrelel(), otherwise punting and letting the next release-to-0 reclaim it? So basically ignoring what VOP_INACTIVE() says. I think this would need another flag so that new vget() against this vnode can drop it early. Otherwise we could have a livelock with the NFS server always getting references to the vnode, preventing it from being recycled and blocking other threads waiting on the inode to reuse it. Blah, I didn't even want to think about this migrane-inducer now. Maybe people who have recently worked on vnode reclaiming could instead be the ones to comment? that would be nice :) -- Manuel Bouyer bou...@antioche.eu.org NetBSD: 26 ans d'experience feront toujours la difference --
Re: CVS commit: src/sys/ufs/ufs
On Tue Sep 22 2009 at 21:58:02 +0200, Manuel Bouyer wrote: [explanations] This is starting to sound sensible now. You obviously have invested quite a bit of thought in investigating the problem. Blah, I didn't even want to think about this migrane-inducer now. Maybe people who have recently worked on vnode reclaiming could instead be the ones to comment? that would be nice :) You don't like my comments? ;)
Re: CVS commit: src/sys/ufs/ufs
On Tue, Sep 22, 2009 at 11:26:21PM +0300, Antti Kantee wrote: On Tue Sep 22 2009 at 21:58:02 +0200, Manuel Bouyer wrote: [explanations] This is starting to sound sensible now. You obviously have invested quite a bit of thought in investigating the problem. Yes, it was annoying enough :) Blah, I didn't even want to think about this migrane-inducer now. Maybe people who have recently worked on vnode reclaiming could instead be the ones to comment? that would be nice :) You don't like my comments? ;) Ho I do, and you certainly know more than I do in this area ... :) -- Manuel Bouyer bou...@antioche.eu.org NetBSD: 26 ans d'experience feront toujours la difference --
Re: CVS commit: src/sys/ufs/ufs
On Sun Sep 20 2009 at 14:00:24 +, Manuel Bouyer wrote: Module Name: src Committed By: bouyer Date: Sun Sep 20 14:00:24 UTC 2009 Modified Files: src/sys/ufs/ufs: ufs_ihash.c Log Message: PR kern/41147: race between nfsd and local rm Note that the race also exists between 2 nfs client, one of them doing the rm. In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because vget() can sleep. After vget returns, check that vp is still connected with ip, and that ip still points to the inode we want. This fix the NULL pointer dereference in ufs_fhtovp() I've been seeing on a NFS server. Um, hold the phone. The whole point of vget() is to provide race-free access to the weak vnode reference held by the file system. Are you saying this does not hold anymore?