Re: CVS commit: src/sys/ufs/ufs

2020-03-11 Thread Maxime Villard
Le 27/02/2020 à 01:36, Simon Burge a écrit :
> "Maxime Villard" wrote:
> 
>> Module Name: src
>> Committed By:maxv
>> Date:Wed Feb 26 18:00:12 UTC 2020
>>
>> Modified Files:
>>
>>  src/sys/ufs/ufs: ufs_vnops.c
>>
>> Log Message:
>>
>> Zero out the padding in 'd_namlen', to prevent info leaks. Same logic as
>> ufs_makedirentry().
> 
> Is it cleaner to just call pool_cache_get() with PR_ZERO?
> 
> Cheers,
> Simon.

In this specific case there is already a clean macro that gives the number of
padding bytes, so using it is cleaner/faster than zeroing the whole buffer.

Maxime


Re: CVS commit: src/sys/ufs/ufs

2020-02-26 Thread Simon Burge
"Maxime Villard" wrote:

> Module Name:  src
> Committed By: maxv
> Date: Wed Feb 26 18:00:12 UTC 2020
>
> Modified Files:
>
>   src/sys/ufs/ufs: ufs_vnops.c
>
> Log Message:
>
> Zero out the padding in 'd_namlen', to prevent info leaks. Same logic as
> ufs_makedirentry().

Is it cleaner to just call pool_cache_get() with PR_ZERO?

Cheers,
Simon.


Re: CVS commit: src/sys/ufs/ufs

2019-02-24 Thread Michael van Elst
On Sun, Feb 24, 2019 at 10:13:40PM +0100, Tobias Nygren wrote:
> On Sun, 24 Feb 2019 19:06:40 +
> Michael van Elst  wrote:
> 
> > To generate a diff of this commit:
> > cvs rdiff -u -r1.242 -r1.243 src/sys/ufs/ufs/ufs_vnops.c
> 
> > +   rawbuf -= dropend;
> 
> I guess this should be rawbufmax, not rawbuf.

Yes, already fixed.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: CVS commit: src/sys/ufs/ufs

2019-02-24 Thread David Holland
On Mon, Feb 25, 2019 at 06:07:23AM +, David Holland wrote:
 > that one doesn't set dropend correctly for small buffers, outsmarted
 > myself while writing it.

Better change (still against 1.242) that makes the logic much
simpler. Will test this overnight...

Index: ufs_vnops.c
===
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.239
diff -u -p -r1.239 ufs_vnops.c
--- ufs_vnops.c 28 Oct 2017 00:37:13 -  1.239
+++ ufs_vnops.c 25 Feb 2019 06:58:52 -
@@ -1233,7 +1233,7 @@ ufs_readdir(void *v)
 #endif
/* caller's buffer */
struct uio  *calleruio = ap->a_uio;
-   off_t   startoffset, endoffset;
+   off_t   startoffset;
size_t  callerbytes;
off_t   curoffset;
/* dirent production buffer */
@@ -1244,8 +1244,8 @@ ufs_readdir(void *v)
off_t   *cookies;
size_t  numcookies, maxcookies;
/* disk buffer */
-   off_t   physstart, physend;
-   size_t  skipstart, dropend;
+   off_t   physstart;
+   size_t  skipstart;
char*rawbuf;
size_t  rawbufmax, rawbytes;
struct uio  rawuio;
@@ -1256,29 +1256,60 @@ ufs_readdir(void *v)
 
KASSERT(VOP_ISLOCKED(vp));
 
-   /* figure out where we want to read */
+#define UFS_READDIR_ARBITRARY_MAX (1024*1024)
callerbytes = calleruio->uio_resid;
-   startoffset = calleruio->uio_offset;
-   endoffset = startoffset + callerbytes;
-
if (callerbytes < _DIRENT_MINSIZE(dirent)) {
/* no room for even one struct dirent */
return EINVAL;
}
+   if (callerbytes > UFS_READDIR_ARBITRARY_MAX) {
+   callerbytes = UFS_READDIR_ARBITRARY_MAX;
+   }
 
-   /* round start and end down to block boundaries */
+   /*
+* Figure out where to start reading. Round the start down to
+* a block boundary: we need to start at the beginning of a
+* block in order to read the directory correctly.
+*
+* skipstart is the number of bytes we need to read in
+* (because we need to start at the beginning of a block) but
+* not transfer to the user.
+*/
+   startoffset = calleruio->uio_offset;
physstart = startoffset & ~(off_t)(ump->um_dirblksiz - 1);
-   physend = endoffset & ~(off_t)(ump->um_dirblksiz - 1);
skipstart = startoffset - physstart;
-   dropend = endoffset - physend;
 
-   if (callerbytes - dropend < _DIRENT_MINSIZE(rawdp)) {
-   /* no room for even one struct direct */
-   return EINVAL;
-   }
+   /*
+* Now figure out how much to read.
+*
+* callerbytes tells us how much data we can send back to the
+* user. Assume as a starting point that we want to read
+* roughly the same volume of struct directs from disk as the
+* volume of struct dirents we want to send to the user; this
+* is not super accurate for large numbers of small names but
+* will serve well enough. It's ok to underpopulate the user's
+* buffer, and most directories are only a couple blocks
+* anyway.
+*
+* However much we decide we want, stop on a block boundary.
+* That way the copying code below doesn't need to worry about
+* finding partial entries in the transfer buffer.
+*
+* Do this by rounding down (not up) by default as that will
+* with luck avoid needing to scan the next block twice; but
+* always read in at least one block.
+*/
 
-   /* how much to actually read */
-   rawbufmax = callerbytes + skipstart - dropend;
+   /* Base buffer space for CALLERBYTES of new data */
+   rawbufmax = callerbytes + skipstart;
+
+   /* Round down to an integral number of blocks */
+   rawbufmax &= ~(off_t)(ump->um_dirblksiz - 1);
+
+   /* but always at least one block */
+   if (rawbufmax == 0) {
+   rawbufmax = ump->um_dirblksiz;
+   }
 
/* read it */
rawbuf = kmem_alloc(rawbufmax, KM_SLEEP);


-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2019-02-24 Thread David Holland
On Mon, Feb 25, 2019 at 05:50:08AM +, David Holland wrote:
 > Furthermore, this:
 > 
 >  > +   rawbuf -= dropend;
 > 
 > is entirely wrong (it needs to be "rawbufmax") and without that bound
 > on rawbufmax the code is unsafe...

I repaired this bit just now, so it's not an overt hazard any more.

I still don't like this change all that much but whatever, I suppose...

 > Here's the fix I got bogged down trying to build and test, which also
 > adds a missing upper bound on callerbytes:

that one doesn't set dropend correctly for small buffers, outsmarted
myself while writing it.

-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2019-02-24 Thread David Holland
On Sun, Feb 24, 2019 at 07:51:24PM -0500, Christos Zoulas wrote:
 > Module Name: src
 > Committed By:christos
 > Date:Mon Feb 25 00:51:24 UTC 2019
 > 
 > Modified Files:
 >  src/sys/ufs/ufs: ufs_vnops.c
 > 
 > Log Message:
 > drop unused

dropping this logic is wrong...

-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2019-02-24 Thread Tobias Nygren
On Sun, 24 Feb 2019 19:06:40 +
Michael van Elst  wrote:

> To generate a diff of this commit:
> cvs rdiff -u -r1.242 -r1.243 src/sys/ufs/ufs/ufs_vnops.c

> +   rawbuf -= dropend;

I guess this should be rawbufmax, not rawbuf.


Re: CVS commit: src/sys/ufs/ufs

2019-02-24 Thread Joerg Sonnenberger
On Sun, Feb 24, 2019 at 07:41:59PM +, m...@netbsd.org wrote:
> something like the overflow is undefined behaviour, so it cannot
> happen, so the branch checking that it happened is eliminated.

*Signed* integer overflow is undefined behavior. *Unsigned* integer
overflow is well defined.

Joerg


Re: CVS commit: src/sys/ufs/ufs

2019-02-24 Thread Martin Husemann
On Sun, Feb 24, 2019 at 07:41:59PM +, m...@netbsd.org wrote:
> something like the overflow is undefined behaviour, so it cannot
> happen, so the branch checking that it happened is eliminated.

Integer overflow is not undefined, but implemenation defined behaviour.

Martin


Re: CVS commit: src/sys/ufs/ufs

2019-02-24 Thread maya
On Sun, Feb 24, 2019 at 07:06:40PM +, Michael van Elst wrote:
> While here, also check for arithmetic overflow.


> + /* how much to actually read */
> + rawbufmax = callerbytes + skipstart;
> + if (rawbufmax < callerbytes)
> + return EINVAL;

hmm, I"m under the impression that checking for overflow without
upsetting the compiler is a delicate matter.

something like the overflow is undefined behaviour, so it cannot
happen, so the branch checking that it happened is eliminated.


Re: CVS commit: src/sys/ufs/ufs

2014-05-22 Thread YAMAMOTO Takashi
 Indeed rebooting with an updated kernel will give active NFS clients
 problems, but I am not sure we should realy care nor how we could
 possibly avoid this one time issue. We have changed encoding of
 filehandles before (at least once).

it's a bad thing and worth mentioning in release notes.

how about adding a version field so that it won't happen again?

YAMAMOTO Takashi


Re: CVS commit: src/sys/ufs/ufs

2014-05-16 Thread Thomas Schmitt
Hi

David Holland:
 I'm not entirely clear on how this structure is actually used, which
 is why I hadn't yet made this change myself after the issue came up. :-/

My observation in sys/fs/cd9660 is probably the initial cause for
this change. PR kern/48799.

I could produce a problem by exporting to NFS an ISO-9960 filesystem
which gets 64-bit inodes due to high storage loacations of its
directory records. NFS uses these structs to identify files in the
original filesystem. cd9660 then uses the struct ifid member ifid_ino
to fulfill this wish ... if no 32-bit bottleneck prevents it.

So if the fix in ufs causes problems, could you please also have
a look at the fix of kern/48799 ?


Have a nice day :)

Thomas



Re: CVS commit: src/sys/ufs/ufs

2014-05-16 Thread Martin Husemann
On Fri, May 16, 2014 at 12:12:00AM +0200, Joerg Sonnenberger wrote:
 I don't think it is a problem by itself as the file handle is opaque,
 but it is certainly wasteful to not put ufid_ino first.

That is not possible, the first few bytes of the structure must match
struct fid from fstypes.h.

The structure is only used to construct opaque tokens for filehandles,
and the encoding includes the size, so I see no problem.

Indeed rebooting with an updated kernel will give active NFS clients
problems, but I am not sure we should realy care nor how we could
possibly avoid this one time issue. We have changed encoding of
filehandles before (at least once).

Martin


Re: CVS commit: src/sys/ufs/ufs

2014-05-16 Thread David Laight
On Fri, May 16, 2014 at 03:54:44PM +, David Holland wrote:
 
   Indeed rebooting with an updated kernel will give active NFS clients
   problems, but I am not sure we should realy care nor how we could
   possibly avoid this one time issue. We have changed encoding of
   filehandles before (at least once).
 
 I don't think this is a problem, but maybe I'll put a note in UPDATING.

Never mind that problem.
Consider what happens if you reboot with a different CD in the drive!

I once fixed a filesystem to use different faked inode numbers every
time a filesystem was mounted.
Without that NFS clients would write to the wrong file in the wrong FS.

The 'impossible to get rid of' retries for hard mounts were something
up with which I had to put. (A preposition is something you should not
end a sentence with.)

David

-- 
David Laight: da...@l8s.co.uk


Re: CVS commit: src/sys/ufs/ufs

2014-05-16 Thread David Holland
On Fri, May 16, 2014 at 06:48:06PM +, Martin Husemann wrote:
   The problem is that the tokens are memcmp'd, so if they include struct
   padding this may not work.
  
  All filesystems I've seen init the struct with a full memset to 0, so
  all padding fields should be initialized as well.

Ok, but...

  However, we can swap the last two fields in the ufid case, if you prefer.

...may as well, I guess.

-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2014-05-15 Thread David Holland
On Wed, May 14, 2014 at 01:46:19PM +, Martin Husemann wrote:
  Modified Files:
   src/sys/ufs/ufs: inode.h
  
  Log Message:
  Make filehandles on UFS based filesystems use proper 64bit inodes.
  32bit restriction noticed by Taylor R Campbell.

I suspect this isn't going to work:

u_int16_t ufid_len; /* Length of structure. */
u_int16_t ufid_pad; /* Force 32-bit alignment. */
ino_t ufid_ino; /* File number (ino). */
int32_t   ufid_gen; /* Generation number. */

Probably it should be specifically sized, therefore uint64_t; also it
needs padding for 64-bit alignment.

I'm not entirely clear on how this structure is actually used, which
is why I hadn't yet made this change myself after the issue came up. :-/

-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2014-05-15 Thread Joerg Sonnenberger
On Thu, May 15, 2014 at 10:06:18PM +, David Holland wrote:
 On Wed, May 14, 2014 at 01:46:19PM +, Martin Husemann wrote:
   Modified Files:
  src/sys/ufs/ufs: inode.h
   
   Log Message:
   Make filehandles on UFS based filesystems use proper 64bit inodes.
   32bit restriction noticed by Taylor R Campbell.
 
 I suspect this isn't going to work:

I don't think it is a problem by itself as the file handle is opaque,
but it is certainly wasteful to not put ufid_ino first.

Joerg


Re: CVS commit: src/sys/ufs/ufs

2014-05-15 Thread David Holland
On Fri, May 16, 2014 at 12:12:00AM +0200, Joerg Sonnenberger wrote:
 Modified Files:
 src/sys/ufs/ufs: inode.h
 
 Log Message:
 Make filehandles on UFS based filesystems use proper 64bit inodes.
 32bit restriction noticed by Taylor R Campbell.
   
   I suspect this isn't going to work:
  
  I don't think it is a problem by itself as the file handle is opaque,
  but it is certainly wasteful to not put ufid_ino first.

I've ascertained that it is in fact a problem: this thing is sent
directly on the wire, and it gets memcmp'd, so having padding bytes in
it is going to cause trouble.

Also, if you reboot and update the kernel to add this change all old
file handles will quit working; this probably doesn't matter though.

-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2012-01-29 Thread David Holland
On Sun, Jan 29, 2012 at 08:49:02AM +, Izumi Tsutsui wrote:
  Modified Files:
   src/sys/ufs/ufs: ufs_vfsops.c
  
  Log Message:
  Fix errors in !defined(QUOTA)  !defined(QUOTA2) case.

Whoops, sorry...
-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2011-10-11 Thread Chuck Silvers
On Tue, Oct 11, 2011 at 04:40:02AM +, David Holland wrote:
 On Sun, Oct 09, 2011 at 09:15:34PM +, Chuck Silvers wrote:
   Modified Files:
  src/sys/ufs/ufs: extattr.h
   
   Log Message:
   add forward declarations for the VOP args structures
   so that fstat can include this file.
 
 Do we really want fstat using fs-specific headers and interfaces?
 
 (I know, cleaning it up is probably nontrivial)

sure, it'd be better if fstat used sysctl instead of libkvm to
extract all this info from the kernel, but as you say, non-trivial.

-Chuck


Re: CVS commit: src/sys/ufs/ufs

2011-10-10 Thread David Holland
On Sun, Oct 09, 2011 at 09:15:34PM +, Chuck Silvers wrote:
  Modified Files:
   src/sys/ufs/ufs: extattr.h
  
  Log Message:
  add forward declarations for the VOP args structures
  so that fstat can include this file.

Do we really want fstat using fs-specific headers and interfaces?

(I know, cleaning it up is probably nontrivial)

-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2011-07-17 Thread David Holland
On Sun, Jul 17, 2011 at 10:02:27PM +, David A. Holland wrote:
  Modified Files:
   src/sys/ufs/ufs: ufs_vnops.c
  
  Log Message:
  At the end of ufs_rmdir, don't use a dangling vnode pointer to call
  fstrans_done. Ok hannken@

Well, for the record it turns out I misread his mail and maybe this
isn't ok... will fix.

-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2010-05-27 Thread Alan Barrett
On Tue, 25 May 2010, Antti Kantee wrote:
  What would I have to do from userland to tickle this bug?
 
 You need to unlink and create a file after the first namei in sys_rename
 but before VOP_RENAME runs, i.e. trigger a race condition.

Thanks.

--apb (Alan Barrett)


Re: CVS commit: src/sys/ufs/ufs

2010-05-25 Thread Alan Barrett
On Tue, 25 May 2010, Antti Kantee wrote:
 Modified Files:
   src/sys/ufs/ufs: ufs_wapbl.c
 
 Log Message:
 Add a comment describing an observed boom-crash-burn problem in
 the code.  Fixing it will require a full tank of gas, half a pack
 of cigarettes, sunglasses, darkness, and most importantly:
 someone else.

What would I have to do from userland to tickle this bug?

--apb (Alan Barrett)


Re: CVS commit: src/sys/ufs/ufs

2010-05-25 Thread Antti Kantee
On Tue May 25 2010 at 15:55:35 +0200, Alan Barrett wrote:
 On Tue, 25 May 2010, Antti Kantee wrote:
  Modified Files:
  src/sys/ufs/ufs: ufs_wapbl.c
  
  Log Message:
  Add a comment describing an observed boom-crash-burn problem in
  the code.  Fixing it will require a full tank of gas, half a pack
  of cigarettes, sunglasses, darkness, and most importantly:
  someone else.
 
 What would I have to do from userland to tickle this bug?

You need to unlink and create a file after the first namei in sys_rename
but before VOP_RENAME runs, i.e. trigger a race condition.

Running tests/fs/ffs/t_renamerace a few times should suffice.  It uses
rump, so your host is safe.  If you have a uniprocessor host, set
RUMP_NCPU to 2 to have both threads run in the rump kernel in parallel
(well, they run virtually parallel on a uniprocessor system, but YKWIM).
This makes the race more likely to trigger.


Re: CVS commit: src/sys/ufs/ufs

2010-05-25 Thread David Holland
On Tue, May 25, 2010 at 11:02:07AM +, Antti Kantee wrote:
  Add a comment describing an observed boom-crash-burn problem in
  the code.  Fixing it will require a full tank of gas, half a pack
  of cigarettes, sunglasses, darkness, and most importantly:
  someone else.

Thanks for the merge conflict :-p

-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2009-09-23 Thread David Holland
On Tue, Sep 22, 2009 at 10:42:59PM +0300, Antti Kantee wrote:
  Blah, I didn't even want to think about this migrane-inducer now.
  Maybe people who have recently worked on vnode reclaiming could instead
  be the ones to comment?

It's becoming clear that this is something I'm going to need to wade
into. Much as I've been trying to avoid it. :-/

There's a (perfectly natural) tendency to try to fix synchronization
problems by adding states -- extra flags, more locks, moving things to
the background, etc. -- but in general the way to fix synchronization
problems so they *stay* fixed is to remove states. For example, from
what I've seen so far I'm pretty sure XLOCK ought to go away.

-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/ufs/ufs

2009-09-23 Thread Adam Hamsik


On Sep,Tuesday 22 2009, at 9:42 PM, Antti Kantee wrote:


On Tue Sep 22 2009 at 21:04:14 +0200, Manuel Bouyer wrote:
that's not an issue with the reference count. It's an issue with  
vclean()

calling VOP_RECLAIM() even if the refcount is greater than 1, and
vrelel() calling vclean() even if the refcount is greater than 1,
when the file has been unlink(2)ed. There's a comment about this in
vrelel(), look for variable recycle.


Ah, ic, probably would've been easier if I read the PR first ;)

So basically someone can vget the vnode (via fhtovp, since it's gone
from the directory namespace) between VOP_INACTIVE and clean.

Your fix doesn't really fix this problem, since depending on timings
the inode might still be recycled after you check it's valid.

Hmm, ok, so things which might happen:

1: VOP_REMOVE, vnode is locked, vrele is called
2: fhtovp before inode is reclaimed, blocks on vn_lock
1: VOP_INACTIVE releases vnlock, returns signal to recycle vnode
2: gets lock, does check that it's still the same inode
1: reclaims vnode
2: boom

Since vget takes the interlock, a dirty  cheap trick might be to  
check
that the reference count is still one before trying to clean the  
vnode in
vrelel(), otherwise punting and letting the next release-to-0  
reclaim it?


Blah, I didn't even want to think about this migrane-inducer now.
Maybe people who have recently worked on vnode reclaiming could  
instead

be the ones to comment?


Sorry I haven't read this thread until today. I think that it would be  
good to test my vreclaim patched kernel it might help to resolve your  
problem. But I don't think it is a solution see above.



Regards

Adam.



Re: CVS commit: src/sys/ufs/ufs

2009-09-23 Thread Manuel Bouyer
On Wed, Sep 23, 2009 at 06:09:00PM +0200, Adam Hamsik wrote:
 Sorry I haven't read this thread until today. I think that it would be  
 good to test my vreclaim patched kernel it might help to resolve your  
 problem. But I don't think it is a solution see above.

Your patch won't help. It only changes the behavior or getnewvnode(),
and in this problem getnewvnode() isn't involved at all.
Also I can't see how your patch would help, a vnode for an inode which has
been deleted still needs to be invalidated at unlink time, and can't
be deffered to another thread.

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/ufs/ufs

2009-09-23 Thread Manuel Bouyer
On Wed, Sep 23, 2009 at 03:08:14PM +, David Holland wrote:
 It's becoming clear that this is something I'm going to need to wade
 into. Much as I've been trying to avoid it. :-/
 
 There's a (perfectly natural) tendency to try to fix synchronization
 problems by adding states -- extra flags, more locks, moving things to
 the background, etc. -- but in general the way to fix synchronization
 problems so they *stay* fixed is to remove states. For example, from
 what I've seen so far I'm pretty sure XLOCK ought to go away.

That's possible, at last in its current form as it fails to protect
vget().

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Manuel Bouyer
On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote:
  In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because
  vget() can sleep. After vget returns, check that vp is still connected with
  ip, and that ip still points to the inode we want. This fix the NULL
  pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.
 
 Um, hold the phone.  The whole point of vget() is to provide race-free
 access to the weak vnode reference held by the file system.  Are you
 saying this does not hold anymore?

It depends on what you mean with race-free. If you mean that the
vnode returned by vget() can't be recygled, I think this is true.
If you mean that vget() can't return a clean vnode then this is false:
vget() can sleep in vn_lock(), and it releases the v_interlock mutex before
sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even
if v_usecount is  1.

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Antti Kantee
On Tue Sep 22 2009 at 20:06:40 +0200, Manuel Bouyer wrote:
 On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote:
   In ufs_ihashget(), vget() can return a vnode that has been vclean'ed 
   because
   vget() can sleep. After vget returns, check that vp is still connected 
   with
   ip, and that ip still points to the inode we want. This fix the NULL
   pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.
  
  Um, hold the phone.  The whole point of vget() is to provide race-free
  access to the weak vnode reference held by the file system.  Are you
  saying this does not hold anymore?
 
 It depends on what you mean with race-free. If you mean that the
 vnode returned by vget() can't be recygled, I think this is true.
 If you mean that vget() can't return a clean vnode then this is false:
 vget() can sleep in vn_lock(), and it releases the v_interlock mutex before
 sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even
 if v_usecount is  1.

What is the practical difference of cleaned and recycled for the
file system driver?

If there is a race in vfs and XLOCK is not used properly, I think that
should be investigated and fixed instead of patching file systems here
and there.


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2009 at 09:23:05PM +0300, Antti Kantee wrote:
 On Tue Sep 22 2009 at 20:06:40 +0200, Manuel Bouyer wrote:
  On Sun, Sep 20, 2009 at 08:23:38PM +0300, Antti Kantee wrote:
In ufs_ihashget(), vget() can return a vnode that has been vclean'ed 
because
vget() can sleep. After vget returns, check that vp is still connected 
with
ip, and that ip still points to the inode we want. This fix the NULL
pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.
   
   Um, hold the phone.  The whole point of vget() is to provide race-free
   access to the weak vnode reference held by the file system.  Are you
   saying this does not hold anymore?
  
  It depends on what you mean with race-free. If you mean that the
  vnode returned by vget() can't be recygled, I think this is true.
  If you mean that vget() can't return a clean vnode then this is false:
  vget() can sleep in vn_lock(), and it releases the v_interlock mutex before
  sleeping. While sleeping vclean() can VOP_RECLAIM() the vnode, even
  if v_usecount is  1.
 
 What is the practical difference of cleaned and recycled for the
 file system driver?

From what I understand the vnode is not on the free list yet so it can't
be reused for something else.

 
 If there is a race in vfs and XLOCK is not used properly, I think that
 should be investigated and fixed instead of patching file systems here
 and there.

I don't know how XLOCK is supposed to be used (and I even less know how
to change things without creating deadlocks). As I see it XLOCK is set
while the vnode is being cleaned, not before or after cleaning it, so
XLOCK is not going to avoid this race anyway.

I asked for help on tech-kern about this but didn't get any reply ...

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Antti Kantee
On Tue Sep 22 2009 at 21:04:14 +0200, Manuel Bouyer wrote:
 that's not an issue with the reference count. It's an issue with vclean()
 calling VOP_RECLAIM() even if the refcount is greater than 1, and
 vrelel() calling vclean() even if the refcount is greater than 1,
 when the file has been unlink(2)ed. There's a comment about this in
 vrelel(), look for variable recycle.

Ah, ic, probably would've been easier if I read the PR first ;)

So basically someone can vget the vnode (via fhtovp, since it's gone
from the directory namespace) between VOP_INACTIVE and clean.

Your fix doesn't really fix this problem, since depending on timings
the inode might still be recycled after you check it's valid.

Hmm, ok, so things which might happen:

1: VOP_REMOVE, vnode is locked, vrele is called
2: fhtovp before inode is reclaimed, blocks on vn_lock
1: VOP_INACTIVE releases vnlock, returns signal to recycle vnode
2: gets lock, does check that it's still the same inode
1: reclaims vnode
2: boom

Since vget takes the interlock, a dirty  cheap trick might be to check
that the reference count is still one before trying to clean the vnode in
vrelel(), otherwise punting and letting the next release-to-0 reclaim it?

Blah, I didn't even want to think about this migrane-inducer now.
Maybe people who have recently worked on vnode reclaiming could instead
be the ones to comment?


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2009 at 10:42:59PM +0300, Antti Kantee wrote:
 On Tue Sep 22 2009 at 21:04:14 +0200, Manuel Bouyer wrote:
  that's not an issue with the reference count. It's an issue with vclean()
  calling VOP_RECLAIM() even if the refcount is greater than 1, and
  vrelel() calling vclean() even if the refcount is greater than 1,
  when the file has been unlink(2)ed. There's a comment about this in
  vrelel(), look for variable recycle.
 
 Ah, ic, probably would've been easier if I read the PR first ;)
 
 So basically someone can vget the vnode (via fhtovp, since it's gone
 from the directory namespace) between VOP_INACTIVE and clean.
 
 Your fix doesn't really fix this problem, since depending on timings
 the inode might still be recycled after you check it's valid.

I don't think so because at this point the vnode is locked. I think the
race window is between vn_lock() releasing the interlock and getting the
vn_lock. After that the reference count is high enouth to prevent
vrelel() trying to clean it (vtryrele() at the start of vrelel will
make it return, and we hold the interlock at this point).


 
 Hmm, ok, so things which might happen:
 
 1: VOP_REMOVE, vnode is locked, vrele is called
 2: fhtovp before inode is reclaimed, blocks on vn_lock

It would fist block on the interlock at this point, I guess.

 1: VOP_INACTIVE releases vnlock, returns signal to recycle vnode
 2: gets lock, does check that it's still the same inode
 1: reclaims vnode
 2: boom
 
 Since vget takes the interlock, a dirty  cheap trick might be to check
 that the reference count is still one before trying to clean the vnode in
 vrelel(), otherwise punting and letting the next release-to-0 reclaim it?

So basically ignoring what VOP_INACTIVE() says. I think this would need
another flag so that new vget() against this vnode can drop it early.
Otherwise we could have a livelock with the NFS server always getting
references to the vnode, preventing it from being recycled and
blocking other threads waiting on the inode to reuse it.

 
 Blah, I didn't even want to think about this migrane-inducer now.
 Maybe people who have recently worked on vnode reclaiming could instead
 be the ones to comment?

that would be nice :)

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Antti Kantee
On Tue Sep 22 2009 at 21:58:02 +0200, Manuel Bouyer wrote:
 [explanations]

This is starting to sound sensible now.  You obviously have invested
quite a bit of thought in investigating the problem.

  Blah, I didn't even want to think about this migrane-inducer now.
  Maybe people who have recently worked on vnode reclaiming could instead
  be the ones to comment?
 
 that would be nice :)

You don't like my comments? ;)


Re: CVS commit: src/sys/ufs/ufs

2009-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2009 at 11:26:21PM +0300, Antti Kantee wrote:
 On Tue Sep 22 2009 at 21:58:02 +0200, Manuel Bouyer wrote:
  [explanations]
 
 This is starting to sound sensible now.  You obviously have invested
 quite a bit of thought in investigating the problem.

Yes, it was annoying enough :)

 
   Blah, I didn't even want to think about this migrane-inducer now.
   Maybe people who have recently worked on vnode reclaiming could instead
   be the ones to comment?
  
  that would be nice :)
 
 You don't like my comments? ;)

Ho I do, and you certainly know more than I do in this area ... :)

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/ufs/ufs

2009-09-20 Thread Antti Kantee
On Sun Sep 20 2009 at 14:00:24 +, Manuel Bouyer wrote:
 Module Name:  src
 Committed By: bouyer
 Date: Sun Sep 20 14:00:24 UTC 2009
 
 Modified Files:
   src/sys/ufs/ufs: ufs_ihash.c
 
 Log Message:
 PR kern/41147: race between nfsd and local rm
 Note that the race also exists between 2 nfs client, one of them doing the rm.
 In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because
 vget() can sleep. After vget returns, check that vp is still connected with
 ip, and that ip still points to the inode we want. This fix the NULL
 pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.

Um, hold the phone.  The whole point of vget() is to provide race-free
access to the weak vnode reference held by the file system.  Are you
saying this does not hold anymore?