[nfs-discuss] Lots of .nfs files being left around

2008-12-03 Thread Frank Batschulat (Home)
On Wed, 03 Dec 2008 12:53:34 +0100, Mike Gerdts  wrote:

> On Wed, Dec 3, 2008 at 4:57 AM, Frank Batschulat (Home)
>  wrote:
>> On Tue, 02 Dec 2008 23:04:39 +0100, Mike Gerdts  wrote:
>>
>>> Unless there is some long-latent bug in CVS, this looks to be a
>>> regression in the NFSv3 client.
>>
>> quite possibly:
>>
>> nfs3_inactive can leave .nfsXXX files behind
>> http://bugs.opensolaris.org/view_bug.do?bug_id=5029852
>
> Ahh... search term should have been .nfsXXX rather than .nfs.  :)
>
> The bug says that the problem exists in 5.10 as well but I have been
> unable to reproduce on S10u4 (different network topology) or S10u6
> (same network topology) against the same NFS file system.  Is there

interesting indeed.

> something that changed in Nevada that would cause this condition to be
> triggered more frequently?

nothing I'm aware of yet as far as V3 is concerned, though that does not
mean there's nothing new in a different path ;-)

what we do have already, but for V4 is:

NFSv4 clients leave too many .nfsXXX files around
http://bugs.opensolaris.org/view_bug.do?bug_id=6636160

though it specifically claims using V3 cures the problem...

so this must be something rather new or yet unknown, would it
be possible to reproduce this somehow ?

---
frankB



2008-12-03 Thread Frank Batschulat (Home)
On Tue, 02 Dec 2008 23:04:39 +0100, Mike Gerdts  wrote:

> Unless there is some long-latent bug in CVS, this looks to be a
> regression in the NFSv3 client.

quite possibly:

nfs3_inactive can leave .nfsXXX files behind
http://bugs.opensolaris.org/view_bug.do?bug_id=5029852

I've moved the comments to the description, but they are not yet visible,
so here they are for the time being:


While testing the fix for 4903465, I found an error path
which could leave behind .nfsXXX files on the server.

If an open file has been renamed or unlinked, then r_unldvp will
be set. nfs3_inactive() checks whether r_unldvp is set and, if
so, does the remove operation for the .nfs file on the server.
Before we do the remove, we unset r_unldvp.

If the thread doing the remove gets signalled before entering
rfscall(), then rfscall() fails and returns RPC_INTR as the status.
Back in nfs3_inactive(), nothing is done with unldvp when the call
has failed; the rnode is simply added to the free list. So the
.nfsXXX file gets left behind.

The number of files left behind can be huge if we enter
nfs3_inactive() through nfs_purge_caches() of a directory and
the thread gets signalled in between. Since the only place we
check for the signal is rfscall() in the code path of

nfs_purge_caches()
  -> dnlc_purge_vp()
    -> nfs3_inactive()
      -> rfs3call()
        -> rfscall()

we will end up with a lot of .nfsXXX files on the server.
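The bug text relies on NFS "silly rename": when a file that is still open on the client is unlinked (or renamed over), the NFSv3 client renames it to a .nfsXXX name on the server instead, and removes that name only when it drops its last reference to the file (the nfs3_inactive() step above). A minimal bash illustration of the local semantics being emulated, assuming $dir is any writable directory; demo_unlink_while_open is a hypothetical name, not part of any Solaris tool:

```shell
#!/usr/bin/env bash
# Hypothetical demo: remove a file while it is still open.
# On a local file system the data stays reachable through the open
# descriptor after rm; an NFSv3 client emulates this by renaming the
# file to .nfsXXXX on the server and removing that name later, when it
# drops its last reference to the file.
demo_unlink_while_open() {
    local dir=$1
    local f="$dir/demo.txt"
    echo "still here" > "$f"
    exec 3< "$f"                    # hold the file open on fd 3
    rm "$f"                         # unlink it while it is open
    ls -A "$dir" | grep '^\.nfs'    # on NFS, expect a .nfsXXXX entry here
    cat <&3                         # the data is still readable via fd 3
    exec 3<&-                       # close fd 3; the client may now clean up
}
```

On a local file system the grep prints nothing and "still here" is read back from fd 3; in a directory on an NFS mount, a transient .nfsXXXX entry should appear while fd 3 is open. The bug above is the case where that entry is never removed.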

-- 
frankB

It is always possible to agglutinate multiple separate problems
into a single complex interdependent solution.
In most cases this is a bad idea.



2008-12-03 Thread Mike Gerdts
On Wed, Dec 3, 2008 at 6:59 AM, Frank Batschulat (Home)
 wrote:
> On Wed, 03 Dec 2008 12:53:34 +0100, Mike Gerdts  wrote:
>
>> On Wed, Dec 3, 2008 at 4:57 AM, Frank Batschulat (Home)
>>  wrote:
>>> On Tue, 02 Dec 2008 23:04:39 +0100, Mike Gerdts wrote:
>>>
>>>> Unless there is some long-latent bug in CVS, this looks to be a
>>>> regression in the NFSv3 client.
>>>
>>> quite possibly:
>>>
>>> nfs3_inactive can leave .nfsXXX files behind
>>> http://bugs.opensolaris.org/view_bug.do?bug_id=5029852
>>
>> Ahh... search term should have been .nfsXXX rather than .nfs.  :)
>>
>> The bug says that the problem exists in 5.10 as well but I have been
>> unable to reproduce on S10u4 (different network topology) or S10u6
>> (same network topology) against the same NFS file system.  Is there
>
> interesting indeed.
>
>> something that changed in Nevada that would cause this condition to be
>> triggered more frequently?
>
> nothing I'm aware of yet as far as V3 is concerned, though that does not
> mean there's nothing new in a different path ;-)
>
> what we do have already, but for V4 is:

I was almost sure you were going to say "just use NFSv4".  :)

> NFSv4 clients leave too many .nfsXXX files around
> http://bugs.opensolaris.org/view_bug.do?bug_id=6636160
>
> though it specifically claims using V3 cures the problem...
>
> so this must be something rather new or yet unknown, would it
> be possible to reproduce this somehow ?

I have been able to reproduce it with the following (bash syntax).
NFS client is SXCE snv_99.

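# $nfsdir is a directory on the NFS file system under test
export CVSROOT=/tmp/repo
mkdir "$CVSROOT"
cvs init
cd "$nfsdir"
mkdir foo
cd foo
touch {a..z}   # creates 26 files
# -m supplies the log message so cvs does not start an editor
cvs import -m "initial import" foo bar baz
cd ..
rm -rf foo
cvs co foo
find foo -name .nfs\*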

Typically there are a few .nfs files left in foo/CVS.  When I tried it
against a NetApp, I typically got about 5-10 .nfs files.  I just
reproduced it against an S10u4 + 127111-09 server and got one .nfs file.
In each case, there was a high-speed (gigabit+, ~1.6 ms latency) MAN
(metropolitan area network) between the client and server.
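Since the number of leftovers varies from run to run, it can help to check each checkout mechanically rather than by eye. A minimal bash sketch; count_nfs_leftovers and check_checkout are hypothetical helper names, and the check assumes cvs has already exited, so no process on this client should still hold those files open:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count .nfs* entries anywhere under a directory.
count_nfs_leftovers() {
    # tr strips the leading padding some wc implementations print
    find "$1" -name '.nfs*' | wc -l | tr -d ' '
}

# Hypothetical wrapper: list any leftovers on stderr and return
# non-zero if at least one was found, zero otherwise.
check_checkout() {
    local n
    n=$(count_nfs_leftovers "$1")
    if [ "$n" -gt 0 ]; then
        echo "$1: $n leftover .nfs file(s)" >&2
        find "$1" -name '.nfs*' >&2
        return 1
    fi
}
```

For example, "cvs co foo && check_checkout foo" reports the .nfs files a checkout left behind and fails, which makes it easy to loop the reproduction until the race shows up.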

-- 
Mike Gerdts
http://mgerdts.blogspot.com/



2008-12-03 Thread Mike Gerdts
On Wed, Dec 3, 2008 at 4:57 AM, Frank Batschulat (Home)
 wrote:
> On Tue, 02 Dec 2008 23:04:39 +0100, Mike Gerdts  wrote:
>
>> Unless there is some long-latent bug in CVS, this looks to be a
>> regression in the NFSv3 client.
>
> quite possibly:
>
> nfs3_inactive can leave .nfsXXX files behind
> http://bugs.opensolaris.org/view_bug.do?bug_id=5029852

Ahh... search term should have been .nfsXXX rather than .nfs.  :)

The bug says that the problem exists in 5.10 as well but I have been
unable to reproduce on S10u4 (different network topology) or S10u6
(same network topology) against the same NFS file system.  Is there
something that changed in Nevada that would cause this condition to be
triggered more frequently?

Thank you for tracking this down for me.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/



2008-12-02 Thread Mike Gerdts
On Tue, Dec 2, 2008 at 12:36 PM, Ben Rockwood  wrote:
> Mike Gerdts wrote:
>> Over the last couple of months I have noticed lots of .nfs files being
>> left around while using cvs.
>>
>> A typical command:
>>
>> cvs -d $cvsroot co -d dotnfs-`uname -r` -r $release jass
>>
>> On snv_99:
>>
>> $ find dotnfs-5.11 -name .nfs\* | wc -l
>>  822
>>
>> That number does not diminish over time, unless I use rm to get rid of
>> the .nfs files.
>>
>> However, Solaris 10 looks lots better:
>>
>> $ find dotnfs-5.10 -name .nfs\* | wc -l
>>0
>>
>> I saw similar things on snv_93.  I have confirmed that only NFSv3 is
>> in use on each system using "nfsstat -c".
>>
>> Any clues?
>>
>
> .nfs files are created when a file is deleted that is still open.  You
> should see a cronjob in the root crontab:
>
> /usr/lib/fs/nfs/nfsfind

The NFS server is a NetApp, so a different solution is likely needed.
It does seem as though the NetApp automatically cleans them up on the
first read.  Unfortunately, if that first read is part of a software
release process, the released code may contain .nfs files.

> That cleans up the .nfs* files each week.
>
> So far the party line has been "fix your app".  I've suffered a lot of
> .nfs* pain, but I'm not aware of any related bugs atm.

$ type cvs
cvs is hashed (/usr/bin/cvs)

$ cvs --version

Concurrent Versions System (CVS) 1.12.13 (client/server)
...

Based upon the pstamp for SUNWcvs my guess is that the "go fix your
app" is directed at the SFW consolidation.

$ pkgparam SUNWcvs VERSION PSTAMP
11.11.0,REV=2008.09.17.14.32
sfwnv20080917143303

FWIW the problem exists with CSWcvs  1.11.22,REV=2006.12.11 as well.
Unless there is some long-latent bug in CVS, this looks to be a
regression in the NFSv3 client.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/



2008-12-02 Thread Ben Rockwood
Mike Gerdts wrote:
> Over the last couple of months I have noticed lots of .nfs files being
> left around while using cvs.
>
> A typical command:
>
> cvs -d $cvsroot co -d dotnfs-`uname -r` -r $release jass
>
> On snv_99:
>
> $ find dotnfs-5.11 -name .nfs\* | wc -l
>  822
>
> That number does not diminish over time, unless I use rm to get rid of
> the .nfs files.
>
> However, Solaris 10 looks lots better:
>
> $ find dotnfs-5.10 -name .nfs\* | wc -l
>0
>
> I saw similar things on snv_93.  I have confirmed that only NFSv3 is
> in use on each system using "nfsstat -c".
>
> Any clues?
>   

.nfs files are created when a file is deleted that is still open.  You
should see a cronjob in the root crontab:

/usr/lib/fs/nfs/nfsfind

That cleans up the .nfs* files each week.
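Where nfsfind is not available, a similar periodic cleanup can be approximated with find. This is a sketch under assumptions, not the contents of /usr/lib/fs/nfs/nfsfind: clean_stale_nfs_files is a hypothetical name, it would run on a host that sees the whole exported tree, and it uses file age as a proxy for "no longer open", so a .nfs file still held open by a long-running process on some client could be removed out from under it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not nfsfind itself): delete regular files named
# .nfs* under $1 whose modification time is more than $2 days old, as
# counted by find's -mtime +N (default 7, i.e. a weekly job).
clean_stale_nfs_files() {
    local dir=$1 days=${2:-7}
    find "$dir" -type f -name '.nfs*' -mtime +"$days" -exec rm -f {} +
}
```

Ordinary files and recently modified .nfs files are left alone; only the age test decides, which is why it cannot tell an abandoned .nfsXXX file from one that is still in use.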

So far the party line has been "fix your app".  I've suffered a lot of
.nfs* pain, but I'm not aware of any related bugs atm.

benr.


2008-12-02 Thread Mike Gerdts
Over the last couple of months I have noticed lots of .nfs files being
left around while using cvs.

A typical command:

cvs -d $cvsroot co -d dotnfs-`uname -r` -r $release jass

On snv_99:

$ find dotnfs-5.11 -name .nfs\* | wc -l
 822

That number does not diminish over time, unless I use rm to get rid of
the .nfs files.

However, Solaris 10 looks lots better:

$ find dotnfs-5.10 -name .nfs\* | wc -l
   0

I saw similar things on snv_93.  I have confirmed that only NFSv3 is
in use on each system using "nfsstat -c".

Any clues?

-- 
Mike Gerdts
http://mgerdts.blogspot.com/