Re: PUFFS and existing file that get ENOENT

2012-01-17 Thread Emmanuel Dreyfus
YAMAMOTO Takashi <y...@mwd.biglobe.ne.jp> wrote:

> it should retry from puffs_cookie2pnode in that case.

Here is a patch that works around the problem (I initially had a printf to
check that it did go through the ENOENT case, and it does). I am about to
commit it and pull it up to netbsd-5, unless there are comments.

Index: sys/fs/puffs/puffs_node.c
===================================================================
RCS file: /cvsroot/src/sys/fs/puffs/puffs_node.c,v
retrieving revision 1.13.10.3
diff -U 4 -r1.13.10.3 puffs_node.c
--- sys/fs/puffs/puffs_node.c	2 Nov 2011 20:11:12 -0000	1.13.10.3
+++ sys/fs/puffs/puffs_node.c	18 Jan 2012 03:03:25 -0000
@@ -320,10 +320,18 @@
 	vp = pmp->pmp_root;
 	if (vp) {
 		mutex_enter(&vp->v_interlock);
 		mutex_exit(&pmp->pmp_lock);
-		if (vget(vp, LK_INTERLOCK) == 0)
+		switch (vget(vp, LK_INTERLOCK)) {
+		case ENOENT:
+			goto retry;
+			break;
+		case 0:
 			return 0;
+			break;
+		default:
+			break;
+		}
 	} else
 		mutex_exit(&pmp->pmp_lock);
 
 	/*
@@ -386,8 +394,9 @@
 		*vpp = pmp->pmp_root;
 		return 0;
 	}
 
+retry_vget:
 	mutex_enter(&pmp->pmp_lock);
 	pnode = puffs_cookie2pnode(pmp, ck);
 	if (pnode == NULL) {
 		if (willcreate) {
@@ -405,10 +414,18 @@
 
 	vgetflags = LK_INTERLOCK;
 	if (lock)
 		vgetflags |= LK_EXCLUSIVE | LK_RETRY;
-	if ((rv = vget(vp, vgetflags)))
+	switch (rv = vget(vp, vgetflags)) {
+	case ENOENT:
+		goto retry_vget;
+		break;
+	case 0:
+		break;
+	default:
 		return rv;
+		break;
+	}
 
 	*vpp = vp;
 	return 0;
 }

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: PUFFS and existing file that get ENOENT

2012-01-17 Thread YAMAMOTO Takashi
hi,

> YAMAMOTO Takashi <y...@mwd.biglobe.ne.jp> wrote:
> 
>> it should retry from puffs_cookie2pnode in that case.
> 
> Here is a patch that works around the problem (I initially had a printf to
> check that it did go through the ENOENT case, and it does). I am about to
> commit it and pull it up to netbsd-5, unless there are comments.

> +		case ENOENT:
> +			goto retry;
> +			break;
> +		case 0:
>  			return 0;
> +			break;

> +	case ENOENT:
> +		goto retry_vget;
> +		break;

please clean up these unreachable statements.
otherwise looks ok to me.  thanks for fixing.

YAMAMOTO Takashi
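
For readers following the review comment, here is a minimal user-space sketch of what the switch could look like once the `break` statements that follow a `goto` or a `return` are dropped. It is not the committed code: `scripted_vget` and `get_with_retry()` are hypothetical stand-ins that replay a fixed list of return values in place of the kernel's vget(), and the locking and puffs_cookie2pnode() lookup that the patch redoes at the label are only indicated by a comment.

```c
#include <errno.h>

/* Hypothetical stand-in for vget(): replays a scripted list of results. */
struct scripted_vget {
	const int *results;	/* values to return, in order */
	int next;		/* index of the next value to return */
};

static int
scripted_vget_call(struct scripted_vget *sv)
{
	return sv->results[sv->next++];
}

/* Same control flow as the patched hunk, without unreachable statements. */
static int
get_with_retry(struct scripted_vget *sv)
{
	int rv;

retry_vget:
	/* The patch retakes pmp_lock and redoes the cookie lookup here. */
	switch (rv = scripted_vget_call(sv)) {
	case ENOENT:
		goto retry_vget;	/* vnode was recycled: look it up again */
	case 0:
		break;			/* got a reference */
	default:
		return rv;		/* any other error is passed to the caller */
	}

	return 0;
}
```

With a script of ENOENT, ENOENT, 0 the function returns 0 after three calls; with ENOENT followed by another error, that error is returned unchanged.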


Re: PUFFS and existing file that get ENOENT

2012-01-16 Thread Emmanuel Dreyfus
On Mon, Jan 16, 2012 at 06:25:39AM +0000, YAMAMOTO Takashi wrote:
> it should retry from puffs_cookie2pnode in that case.

I also need to build a test case that reliably reproduces the bug.
For now I run our build.sh -Uo release and come back the next day,
which is not very convenient.

As I understand it, I need to look up a node that I already know but that
is being recycled. When does the kernel decide to recycle a vnode?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: PUFFS and existing file that get ENOENT

2012-01-16 Thread Emmanuel Dreyfus
On Mon, Jan 16, 2012 at 02:02:41PM +0100, Adam Hamsik wrote:
> Just try to lower that number to some smaller one ?

sysctl(7) says:
     kern.maxvnodes (KERN_MAXVNODES)
             The maximum number of vnodes available on the system.  This can
             only be raised.

But it seems I can lower it from 26214 to 200 without a hitch. I have
no idea how much room it has, however. We cannot get the number of used
vnodes from userland, can we?


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: PUFFS and existing file that get ENOENT

2012-01-16 Thread Rhialto
On Mon 16 Jan 2012 at 13:17:17 +0000, Emmanuel Dreyfus wrote:
> But it seems I can lower it from 26214 to 200 without a hitch. I have
> no idea how much room it has, however. We cannot get the number of used
> vnodes from userland, can we?

pstat -v gives the number of active vnodes; that may be useful.

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- There's no point being grown-up if you 
\X/ rhialto/at/xs4all.nl-- can't be childish sometimes. -The 4th Doctor


Re: PUFFS and existing file that get ENOENT

2012-01-16 Thread Brian Buhrow
hello.  pstat -v should give you what you want to know.

-thanks
-Brian

On Jan 16,  1:17pm, Emmanuel Dreyfus wrote:
} Subject: Re: PUFFS and existing file that get ENOENT
} On Mon, Jan 16, 2012 at 02:02:41PM +0100, Adam Hamsik wrote:
} > Just try to lower that number to some smaller one ?
} 
} sysctl(7) says:
}      kern.maxvnodes (KERN_MAXVNODES)
}              The maximum number of vnodes available on the system.  This can
}              only be raised.
} 
} But it seems I can lower it from 26214 to 200 without a hitch. I have
} no idea how much room it has, however. We cannot get the number of used
} vnodes from userland, can we?
} 
} 
} -- 
} Emmanuel Dreyfus
} m...@netbsd.org
-- End of excerpt from Emmanuel Dreyfus




Re: PUFFS and existing file that get ENOENT

2012-01-16 Thread YAMAMOTO Takashi
hi,

> On Mon, Jan 16, 2012 at 10:56:33AM +0000, YAMAMOTO Takashi wrote:
>> you can increase the chance by running
>>	while :;do sysctl -w kern.maxvnodes=0; done
> 
> It will always fail:
> bacasable# sysctl -w kern.maxvnodes=0
> sysctl: kern.maxvnodes: sysctl() failed with Device busy

it tries to reclaim vnodes before failing.

YAMAMOTO Takashi

 
> -- 
> Emmanuel Dreyfus
> m...@netbsd.org


Re: PUFFS and existing file that get ENOENT

2012-01-16 Thread Matthew Mondor
On Mon, 16 Jan 2012 10:56:33 +0000 (UTC)
y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:

> when the kernel wants to cache other files.
> ie. whenever the kernel decides to reclaim it. :-)
> you can increase the chance by running
>	while :;do sysctl -w kern.maxvnodes=0; done
> or something like that.

Wouldn't the performance also drop significantly with a permanently low
maxvnodes, though?

Thanks,
-- 
Matt


Re: PUFFS and existing file that get ENOENT

2012-01-16 Thread YAMAMOTO Takashi
hi,

> On Mon, 16 Jan 2012 10:56:33 +0000 (UTC)
> y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
> 
>> when the kernel wants to cache other files.
>> ie. whenever the kernel decides to reclaim it. :-)
>> you can increase the chance by running
>>	while :;do sysctl -w kern.maxvnodes=0; done
>> or something like that.
> 
> Wouldn't the performance also drop significantly with a permanently low
> maxvnodes, though?

it never succeeds.
anyway the performance is not a priority when trying to reproduce a bug.

YAMAMOTO Takashi

 
> Thanks,
> -- 
> Matt


Re: PUFFS and existing file that get ENOENT

2012-01-15 Thread YAMAMOTO Takashi
hi,

> Emmanuel Dreyfus <m...@netbsd.org> wrote:
> 
>> Hence I come to the conclusion that it may come from
>> sys/kern/vfs_lookup.c, but it is very unlikely that there is a bug there
>> that went unnoticed for other filesystems.
> 
> Further investigation shows that this ENOENT is returned by the vget() call
> in puffs_cookie2vnode(). That suggests some kind of race condition, but
> it is not obvious. It means a vnode has been created on a lookup, then
> it gets recycled while looking up one of its children.

it should retry from puffs_cookie2pnode in that case.

YAMAMOTO Takashi

 
> -- 
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org
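
The race described in this message can be modelled in user space. The sketch below is only an illustration under stated assumptions, not NetBSD code: `model_vnode`, `model_pnode`, `model_vget()` and `model_cookie2vnode()` are hypothetical names, and the "reclaim" is simulated by a flag. It shows why a lookup that fails with ENOENT on a vnode being recycled succeeds once it restarts from the cookie lookup, which is the behaviour suggested in the reply.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; none of this is NetBSD code. */
struct model_vnode {
	bool reclaiming;	/* true while the vnode is being recycled */
};

struct model_pnode {
	struct model_vnode *vp;	/* vnode currently attached to the cookie */
};

/* Models vget(): fails with ENOENT if the vnode is being recycled. */
static int
model_vget(struct model_pnode *pn, struct model_vnode *vp)
{
	if (vp->reclaiming) {
		/* The reclaim completes: the cookie loses its vnode. */
		pn->vp = NULL;
		return ENOENT;
	}
	return 0;
}

/* Models the cookie-to-vnode lookup, restarting whenever the race is lost. */
static int
model_cookie2vnode(struct model_pnode *pn, struct model_vnode *fresh,
    struct model_vnode **vpp, int *attempts)
{
	struct model_vnode *vp;

	for (;;) {
		(*attempts)++;
		vp = pn->vp;
		if (vp == NULL) {
			/* No vnode attached any more: attach a fresh one. */
			fresh->reclaiming = false;
			pn->vp = fresh;
			vp = fresh;
		}
		if (model_vget(pn, vp) == ENOENT)
			continue;	/* vnode went away: redo the lookup */
		*vpp = vp;
		return 0;
	}
}
```

Without the loop, the first ENOENT would be returned to the caller even though the file exists, which matches the symptom reported at the start of the thread.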


PUFFS and existing file that get ENOENT

2012-01-12 Thread Emmanuel Dreyfus
Hello

I am tracking a rare bug with perfused, where an existing file is
reported as nonexistent. It happens after a few hours of usage, and it
happens only once: if I retry accessing the file, I am successful.

Adding traces in perfused while performing ktrace on the calling
process, I am now confident that perfused is not the component that
raises the ENOENT. Here is what I get in the ktrace:

 14085  1 cc1  1326421055.20998 CALL  close(4)
 14085  1 cc1  1326421055.214918302 RET   close 0
 14085  1 cc1  1326421055.215025195 CALL  open(0xbb960a90,4,0x1b6)
 14085  1 cc1  1326421055.215031533 NAMI  /gfs/manu/netbsd/usr/src/sys/sys/featuretest.h
 14085  1 cc1  1326421055.216282844 RET   open -1 errno 2 No such file or directory


In the PUFFS trace I collect in perfused, this open() only causes
LOOKUPs up to /manu/netbsd/usr, all successful, then nothing.

1326421055.215229913 LOOKUP / cn = manu  error = 0
1326421055.215396312 LOOKUP /manu cn = netbsd error = 0
1326421055.215749931 LOOKUP /manu/netbsd  cn = usr error = 0

That means the ENOENT is decided by the kernel on its own; perfused does
not produce it. The question is: where can it come from? There are two
ENOENT occurrences in sys/fs/puffs. One can happen at mount time, and I
ruled out the other, in puffsop_flush(), by adding a printf() that never
shows up when the bug strikes.

There are also ENOENT in sys/dev/putter/putter.c, but they all have a
printf() that I would have seen, therefore it cannot come from there.

Hence I come to the conclusion that it may come from
sys/kern/vfs_lookup.c, but it is very unlikely that there is a bug there
that went unnoticed for other filesystems.

Would anyone have an idea of what could possibly be going on?


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org