Bug#1017720: nfs-common: No such file or directory

2024-04-09 Thread Vincent Lefevre
On 2024-04-09 14:09:43 +0200, Vincent Lefevre wrote:
> Some additional information: created only once, but data may be
> appended (on the creator's side, the file is created for writing,
> and data are written occasionally, and at some point, the file is
> closed). The error with "open" may occur even several hours after
> the last time data were written to the file.

This is actually reproducible with a read-only directory.
I've attached a Perl script to reproduce the issue, just
based on "stat".

The conditions seem to be:
  * The directory and the files need to be recent enough: I can't
reproduce the issue with an old directory, even if I add many
new files into it.
  * Concurrent "stat": with the attached script, the issue is
reproducible with 2 threads or more, but not with a single
thread.

Example of errors:

./dir-stat: can't stat . (x 2)
./dir-stat: can't stat 775 (x 148)
./dir-stat: can't stat 772 (x 1)
./dir-stat: can't stat 415 (x 1)
./dir-stat: can't stat 716 (x 1)
./dir-stat: can't stat 453 (x 1)
./dir-stat: can't stat 9 (x 1)
./dir-stat: can't stat 201 (x 1)
./dir-stat: can't stat 981 (x 1)
./dir-stat: can't stat 660 (x 1)
./dir-stat: can't stat 120 (x 1)
./dir-stat: can't stat 127 (x 1)
./dir-stat: can't stat 663 (x 1)

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
#!/usr/bin/env perl

# Create a directory with several hundreds of files, for instance with
#   mkdir test && cd test && touch `seq 999`
# then run this Perl script with the directory name as its argument.

use strict;
use warnings;
use threads;
use Time::HiRes qw(sleep);  # the core sleep() would truncate 0.25 to 0

my $maxthreads = 2;
my $nthreads = 0;

@ARGV == 1 or die "Usage: $0 <directory>\n";
my $dir = $ARGV[0];
-d $dir or die "$0: $dir is not a directory\n";

# Retry stat() on one directory entry until it succeeds,
# then report how many attempts failed.
sub thr ($) {
  my $file = $_[0];
  my $err = 0;
  until (stat "$dir/$file")
    {
      $err++;
      sleep 0.25;
    }
  warn "$0: can't stat $file (x $err)\n" if $err;
}

# Wait until at least one thread has finished, then join all finished ones.
sub join_threads () {
  my @thr;
  sleep 0.25 until @thr = threads->list(threads::joinable);
  foreach my $thr (@thr)
    { $thr->join(); }
  $nthreads -= @thr;
}

opendir DIR, $dir or die "$0: opendir failed ($!)\n";
while (my $file = readdir DIR)
  {
    $nthreads < $maxthreads or join_threads;
    $nthreads++ < $maxthreads or die "$0: internal error\n";
    threads->create(\&thr, $file);
  }
closedir DIR or die "$0: closedir failed ($!)\n";
join_threads while $nthreads;


Bug#1017720: nfs-common: No such file or directory

2024-04-09 Thread Vincent Lefevre
On 2024-04-04 14:56:47 +0200, Vincent Lefevre wrote:
> On 2023-11-29 16:19:02 +0100, Vincent Lefevre wrote:
> > I have the same kind of issue at my lab with one of my programs:
> > a readdir lists the file, but then a stat sometimes gives a
> > "No such file or directory" error. Some clients are more affected
> > than others.
> 
> And sometimes, the "stat" succeeds as expected, but the "open" that
> follows gives a "No such file or directory" error.
> 
> Also note that in my case, the file under this filename is unique:
> it is created only once (never deleted then recreated).

Some additional information: created only once, but data may be
appended (on the creator's side, the file is created for writing,
and data are written occasionally, and at some point, the file is
closed). The error with "open" may occur even several hours after
the last time data were written to the file.




Bug#1017720: nfs-common: No such file or directory

2024-04-04 Thread Vincent Lefevre
On 2023-11-29 16:19:02 +0100, Vincent Lefevre wrote:
> I have the same kind of issue at my lab with one of my programs:
> a readdir lists the file, but then a stat sometimes gives a
> "No such file or directory" error. Some clients are more affected
> than others.

And sometimes, the "stat" succeeds as expected, but the "open" that
follows gives a "No such file or directory" error.

Also note that in my case, the file under this filename is unique:
it is created only once (never deleted then recreated).
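The access pattern described above (readdir() lists a name, then stat() or a later open() on that same name fails with ENOENT) can be checked with a small single-pass C program. This is a minimal sketch, not the program Vincent uses: the helper name check_dir and the struct enoent_counts type are hypothetical, and names are resolved with fstatat()/openat() relative to the open directory stream rather than by building full paths. On a healthy local filesystem both ENOENT counters should stay at zero; on an affected NFS client they would presumably become non-zero from time to time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical tally type: what one pass over a directory observed. */
struct enoent_counts {
    int listed;      /* names returned by readdir() */
    int stat_enoent; /* listed, but stat() failed with ENOENT */
    int open_enoent; /* stat() succeeded, but open() failed with ENOENT */
};

/* Read DIRPATH once; for every name, stat() it and, if it is a regular
 * file, open() it read-only.  Lookups go through fstatat()/openat()
 * relative to the open directory stream.  Returns 0 after a full pass,
 * or -1 if the directory itself cannot be opened. */
int check_dir(const char *dirpath, struct enoent_counts *c)
{
    assert(dirpath != NULL && c != NULL);
    DIR *d = opendir(dirpath);
    if (d == NULL)
        return -1;
    c->listed = c->stat_enoent = c->open_enoent = 0;

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        struct stat st;
        c->listed++;
        if (fstatat(dirfd(d), e->d_name, &st, 0) != 0) {
            if (errno == ENOENT)
                c->stat_enoent++;
            continue;
        }
        if (!S_ISREG(st.st_mode))
            continue;                    /* ".", "..", subdirectories, ... */
        int fd = openat(dirfd(d), e->d_name, O_RDONLY);
        if (fd >= 0)
            close(fd);
        else if (errno == ENOENT)
            c->open_enoent++;
    }
    closedir(d);
    return 0;
}
```

Calling check_dir() on the affected directory in a loop and printing the counters whenever they are non-zero would give a C counterpart to the ls -l /mnt/dir/someOtherDir/* | grep '?' test from the original report.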

> The clients are Debian 11.8 machines (also nfs-common 1:1.3.4-6;
> 5.10.0-26-amd64 kernel).

Still the same problem with a Debian 12.5 machine (nfs-common 1:2.6.2-4;
6.1.0-18-amd64 (6.1.76-1) kernel) on the client side.




Bug#1017720: nfs-common: No such file or directory

2024-03-15 Thread Kenneth C. Schalk

I've been looking for an explanation for a similar kind of failure.  I
wonder if the core problem is the same as reported here:

https://bugzilla.opensuse.org/show_bug.cgi?id=1209457

Supposedly fixed by the patch posted to linux-nfs here (also attached
after extraction from the SuSE kernel build source RPM):

https://www.spinics.net/lists/linux-nfs/msg86343.html

And merged into the 5.15.3 stable kernel (mainline in v5.16-rc1), see the ChangeLog here
(specifically under commit "69e0be0efe53fb012f5db32bc328590745cf8f71"):

https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.15.3

I would be interested to hear whether it makes any difference for the
issue reported in this Debian bug.

--Ken

From 255fc6efacf25d954a986ff058fd9899f322e7d1 Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Tue, 28 Sep 2021 11:15:53 -0400
Subject: [PATCH] NFS: Don't set NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA
Git-commit: 488796ec1e39fb9194cc8175f770823d40fbf0ed
Patch-mainline: v5.16-rc1
References: stable-5.14.19

[ Upstream commit 488796ec1e39fb9194cc8175f770823d40fbf0ed ]

NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA should be considered
mutually exclusive.

Fixes: 1c341b777501 ("NFS: Add deferred cache invalidation for close-to-open consistency violations")
Signed-off-by: Trond Myklebust 
Tested-by: Benjamin Coddington 
Reviewed-by: Benjamin Coddington 
Signed-off-by: Sasha Levin 
Acked-by: Takashi Iwai 

---
 fs/nfs/inode.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 6ea1bde33cb6..f9d3ad3acf11 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -210,10 +210,15 @@ void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
 		flags &= ~NFS_INO_INVALID_XATTR;
 	if (flags & NFS_INO_INVALID_DATA)
 		nfs_fscache_invalidate(inode);
-	if (inode->i_mapping->nrpages == 0)
-		flags &= ~(NFS_INO_INVALID_DATA|NFS_INO_DATA_INVAL_DEFER);
 	flags &= ~(NFS_INO_REVAL_PAGECACHE | NFS_INO_REVAL_FORCED);
+
 	nfsi->cache_validity |= flags;
+
+	if (inode->i_mapping->nrpages == 0)
+		nfsi->cache_validity &= ~(NFS_INO_INVALID_DATA |
+	  NFS_INO_DATA_INVAL_DEFER);
+	else if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
+		nfsi->cache_validity &= ~NFS_INO_DATA_INVAL_DEFER;
 }
 EXPORT_SYMBOL_GPL(nfs_set_cache_invalid);
 
-- 
2.26.2
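The commit message says the two flags "should be considered mutually exclusive". To make the ordering change in the hunk visible, here is a minimal userspace model of just the touched lines; it is not kernel code. The bit values, the function names set_invalid_old/set_invalid_new, and passing nrpages as a plain argument are all assumptions for illustration, and the rest of nfs_set_cache_invalid() (for example the NFS_INO_INVALID_XATTR and NFS_INO_REVAL_PAGECACHE handling) is omitted.

```c
#include <assert.h>

/* Illustrative bit values only: the real NFS_INO_* constants are defined
 * in the kernel's NFS headers and have different values. */
#define INVALID_DATA     (1UL << 0)
#define DATA_INVAL_DEFER (1UL << 1)

/* Pre-patch ordering: the nrpages check filters only the incoming flags,
 * so bits already recorded in the cache state survive untouched. */
unsigned long set_invalid_old(unsigned long validity, unsigned long flags,
                              unsigned long nrpages)
{
    if (nrpages == 0)
        flags &= ~(INVALID_DATA | DATA_INVAL_DEFER);
    return validity | flags;
}

/* Post-patch ordering: merge first, then apply the rules to the combined
 * state, so the two bits can never end up set together. */
unsigned long set_invalid_new(unsigned long validity, unsigned long flags,
                              unsigned long nrpages)
{
    validity |= flags;
    if (nrpages == 0)
        validity &= ~(INVALID_DATA | DATA_INVAL_DEFER);
    else if (validity & INVALID_DATA)
        validity &= ~DATA_INVAL_DEFER;
    /* The invariant stated in the commit message. */
    assert(!((validity & INVALID_DATA) && (validity & DATA_INVAL_DEFER)));
    return validity;
}
```

With DATA_INVAL_DEFER already recorded and pages still cached (nrpages > 0), a new INVALID_DATA request leaves both bits set in the old model but only INVALID_DATA in the new one. With no cached pages, the old model cannot clear a previously recorded DEFER bit, because it only filters the incoming flags.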



Bug#1017720: nfs-common: No such file or directory

2023-11-29 Thread Vincent Lefevre
On 2022-08-19 13:16:47 +, Jason Breitman wrote:
> Package: nfs-common
> Version: 1:1.3.4-6
> Severity: important
> 
> Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux
> 
> -- Description
> After updating and/or creating new files on our file server via rsync, we 
> see many files report the error message below from NFSv4 clients since 
> upgrading from Debian 10.8 to Debian 11.4.
> Clearing the dentry cache resolves the issue right away.
> I am not sure that nfs-common is the package to blame, but listed it 
> based on the bug submission recommendations. 
> 
> -- Test
> ls -l /mnt/dir/someOtherDir/* | grep '?'
> 
> -- Error message
> ls: cannot access 'filename': No such file or directory
> -? ? ???? filename

I have the same kind of issue at my lab with one of my programs:
a readdir lists the file, but then a stat sometimes gives a
"No such file or directory" error. Some clients are more affected
than others.

The clients are Debian 11.8 machines (also nfs-common 1:1.3.4-6;
5.10.0-26-amd64 kernel).




Bug#1017720: nfs-common: No such file or directory

2022-09-22 Thread Jason Breitman
The issue also occurs when using the lookupcache=none option along with the 
5.10.X kernel.
I was hoping this option would work so that I could then investigate its
performance impact, but it is no longer viable.
I believe that I am out of options to try with the 5.10.X kernel.
Please let me know where we stand.


Bug#1017720: nfs-common: No such file or directory

2022-09-21 Thread Jason Breitman
I now know that this behavior does exist in Debian Buster 10.8, more
specifically with the 4.19.X kernel, after running stricter testing on more
servers.
The difference is that with the 4.19.X kernel the problem resolves itself
immediately after the "No such file or directory" error, whereas the 5.X
kernels require me to clear the inode and dentry caches by running
echo 2 > /proc/sys/vm/drop_caches.
What further information is required to resolve this issue?


Bug#1017720: nfs-common: No such file or directory

2022-09-13 Thread Jason Breitman
I downgraded the nfs-common package, which required downgrading the libevent
packages, and am using the 4.19.X kernel.
I see the issue when running the initial test, but it is gone when running the
test a second time.

libevent-2.1-6:amd64           2.1.8-stable-4       amd64  Asynchronous event notification library
libevent-core-2.1-6:amd64      2.1.8-stable-4       amd64  Asynchronous event notification library (core)
libevent-pthreads-2.1-6:amd64  2.1.8-stable-4       amd64  Asynchronous event notification library (pthreads)
linux-image-4.19.0-21-amd64    4.19.249-2           amd64  Linux 4.19 for 64-bit PCs (signed)
nfs-common                     1:1.3.4-2.5+deb10u1  amd64  NFS support files common to client and server

What other packages do I need to downgrade in order to get Debian 11.4 to
behave like Debian 10.8?
What additional questions can I answer so that we can move forward?


Bug#1017720: nfs-common: No such file or directory

2022-09-06 Thread Jason Breitman
I also see the failure with the kernels below, but the 4.19.X kernel resolves
the issue without dropping caches.
linux-image-4.19.0-14-amd64  4.19.171-2  amd64  Linux 4.19 for 64-bit PCs (signed)
linux-image-4.19.0-21-amd64  4.19.249-2  amd64  Linux 4.19 for 64-bit PCs (signed)

I see the issue when running the initial test, but it is gone when running the
test a second time.
I ran several tests to verify the behavior differences between the 4.19.X and
5.X kernels.

-- Test
ls -l /mnt/dir/someOtherDir/* | grep '?'

-- Error message (the files reported are ones that were deleted via rsync --delete)
ls: cannot access 'filename': No such file or directory
-? ? ???? filename


Bug#1017720: nfs-common: No such file or directory

2022-09-02 Thread Jason Breitman
I have tested with the following kernels and see this issue in each case.

linux-image-5.10.0-16-amd64         5.10.127-1         amd64  Linux 5.10 for 64-bit PCs (signed)
linux-image-5.15.0-0.bpo.3-amd64    5.15.15-2~bpo11+1  amd64  Linux 5.15 for 64-bit PCs (signed)
linux-image-5.18.0-0.deb11.3-amd64  5.18.14-1~bpo11+1  amd64  Linux 5.18 for 64-bit PCs (signed)

An interesting note is that when using the 5.18 kernel, I had to run echo 3 > 
/proc/sys/vm/drop_caches to resolve the issue.
echo 2 > /proc/sys/vm/drop_caches did not work as it did on the 5.10 and 5.15 
kernels.
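The echo commands above write a single digit to /proc/sys/vm/drop_caches. As a sketch of what that write does, assuming the meanings documented for vm.drop_caches in the kernel's sysctl documentation (1 = page cache, 2 = reclaimable slab objects such as dentries and inodes, 3 = both), here is a small C helper. The function name drop_caches_at is hypothetical, and the path is a parameter only so that the helper can be tried on an ordinary file; writing the real /proc file requires root, and the kernel documentation suggests running sync first so that dirty objects can be freed.

```c
#include <assert.h>
#include <stdio.h>

/* Write LEVEL to a drop_caches-style control file, like
 *   echo LEVEL > PATH
 * Only levels 1, 2 and 3 are accepted.
 * Returns 0 on success, -1 on a bad level or an I/O error. */
int drop_caches_at(const char *path, int level)
{
    assert(path != NULL);
    if (level < 1 || level > 3)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", level) > 0;
    if (fclose(f) != 0)          /* the write is flushed (and may fail) here */
        ok = 0;
    return ok ? 0 : -1;
}
```

drop_caches_at("/proc/sys/vm/drop_caches", 2) corresponds to echo 2 > /proc/sys/vm/drop_caches. It drops clean cached objects for every filesystem on the client, not only the NFS mount, so it is a blunt workaround rather than a fix.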


Bug#1017720: nfs-common: No such file or directory

2022-08-26 Thread Jason Breitman
I was able to identify another workaround today which may help you to identify 
the issue.
The workaround is to touch the directory where the troubled files live on the 
file server.
I believe this tells us that the cache relies on the directory's modification
time attribute.
It should be noted that access time updates are disabled on the file server.
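On the file server, the touch workaround amounts to setting the directory's timestamps to the current time. Below is a minimal C sketch, assuming it runs on the server against the exported directory; the function name touch_dir is hypothetical. It calls utimensat() with a NULL times argument, which sets both atime and mtime to now, the same as touch. Why this helps on the clients is the hypothesis stated above (the cache keys off the directory's modification time), not something the sketch demonstrates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>     /* AT_FDCWD */
#include <sys/stat.h>  /* utimensat(), stat() */
#include <time.h>

/* Like `touch DIR` on an existing directory: set its access and
 * modification times to the current time, then return the new mtime
 * (seconds since the epoch), or (time_t)-1 on failure.
 * A NULL times argument asks utimensat() for "now".  This explicit update
 * also applies on a filesystem mounted with noatime, which only suppresses
 * the implicit atime updates done on reads. */
time_t touch_dir(const char *dir)
{
    struct stat st;
    assert(dir != NULL);
    if (utimensat(AT_FDCWD, dir, NULL, 0) != 0 || stat(dir, &st) != 0)
        return (time_t)-1;
    return st.st_mtime;
}
```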

I also wanted to restate that we use rsync to push out these application 
updates and also use rsync to sync data files.
Our rsync options preserve timestamps, so it is possible that the new files 
have an older timestamp than "now".
It is not the case that the new files have an older timestamp than the prior 
version that is stuck in the cache.

The rsync process that I describe has not changed and has been in use for many 
years.

> -Original Message-
> From: Jason Breitman
> Sent: Thursday, August 25, 2022 11:54 AM
> To: Ben Hutchings ; 1017...@bugs.debian.org
> Subject: RE: Bug#1017720: nfs-common: No such file or directory
> 
> I have the same issue after adding actimeo=30 to /etc/fstab, rebooting and
> testing.
> I also confirmed that those settings applied via /proc/mounts which shows
> the below snippet for each mountpoint.
> nfs4
> rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,acregmin=30,a
> cregmax=30,acdirmax=30,hard,noresvport,proto=tcp,timeo=600,retrans=2,s
> ec=krb5,clientaddr=X.X.X.X,lookupcache=pos,local_lock=none,addr=Y.Y.Y.Y 0
> 0
> 
> > -Original Message-
> > From: Jason Breitman
> > Sent: Tuesday, August 23, 2022 2:42 PM
> > To: Ben Hutchings ; 1017...@bugs.debian.org
> > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> >
> > What additional information can I provide for us to move forward with this
> > process?
> >
> > To summarize and include further details, rsync is used to sync applications
> to
> > a file server which behaves like a repository.
> > We do preserve timestamps from the build server and also use --delete.
> We
> > do not run the applications from the file server.  All servers use NTP.
> >
> > The application has a sub-directory that contain files with version numbers.
> > These are libraries.
> > When a new build is complete, a developer pushes their updates via rsync
> to
> > the file server / repository.
> >
> > I believe that the dentry cache thinks the "old" files exist and generates a
> No
> > such file or directory error showing question marks for that files 
> > attributes.
> > Dropping the dentry cache via echo 2 > /proc/sys/vm/drop_caches
> resolves
> > the issue.
> >
> > This behavior is not observed in Debian 10.8 with that distributions
> associated
> > kernel and packages.
> >
> > > -Original Message-
> > > From: Jason Breitman
> > > Sent: Friday, August 19, 2022 9:52 PM
> > > To: Ben Hutchings ; 1017...@bugs.debian.org
> > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > >
> > > > -Original Message-
> > > > From: Ben Hutchings 
> > > > Sent: Friday, August 19, 2022 7:27 PM
> > > > To: Jason Breitman ;
> > > > 1017...@bugs.debian.org
> > > > Subject: Re: Bug#1017720: nfs-common: No such file or directory
> > > >
> > > > Control: tag -1 moreinfo
> > > >
> > > > On Fri, 2022-08-19 at 13:16 +, Jason Breitman wrote:
> > > > > Package: nfs-common
> > > > > Version: 1:1.3.4-6
> > > > > Severity: important
> > > > >
> > > > > Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64
> > > > > GNU/Linux
> > > > >
> > > > > -- Description
> > > > > After updating and/or creating new files on our file server via
> > > > > rsync, we see many files report the error message below from NFSv4
> > > > > clients since upgrading from Debian 10.8 to Debian 11.4.
> > > > > Clearing the dentry cache resolves the issue right away.
> > > > > I am not sure that nfs-common is the package to blame, but listed
> > > > > it based on the bug submission recommendations.
> > > >
> > > > The NFS implementation is mostly in the kernel, so probably this issue
> > > > belongs there.  But the kernel team is responsible for both packages.
> > > >
> > > > [...]
> > > > > -- Error message
> > > > > ls: cannot access 'filename': No such file or directory
> > > > > -? ? ??  

Bug#1017720: nfs-common: No such file or directory

2022-08-25 Thread Jason Breitman
I still have the same issue after adding actimeo=30 to /etc/fstab, rebooting, and
testing.
I also confirmed via /proc/mounts that those settings were applied; it shows the
snippet below for each mount point:
nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,acregmin=30,acregmax=30,acdirmax=30,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=X.X.X.X,lookupcache=pos,local_lock=none,addr=Y.Y.Y.Y 0 0
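To check the effective options on each client without reading the long line by eye, a small shell helper can split the /proc/mounts entry for one mount point into one option per line. This is a minimal sketch: the function name `mount_opts` and its optional second argument (an alternative mounts file, used only to make it testable) are hypothetical, and it assumes the mount point contains no spaces or backslashes (the kernel escapes a space as `\040` in /proc/mounts). Note that `acdirmin` is absent from the snippet even though actimeo=30 sets it; presumably the kernel only prints attribute-cache values that differ from their defaults, and 30 seconds is already the default for acdirmin according to nfs(5).

```shell
#!/bin/sh
# mount_opts MOUNTPOINT [MOUNTS_FILE]
# Print the mount options recorded for MOUNTPOINT, one per line.
# MOUNTS_FILE defaults to /proc/mounts, where field 2 is the mount point
# and field 4 is the comma-separated option list (see proc(5)).
mount_opts() {
  awk -v mp="$1" '
    $2 == mp {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) print opts[i]
    }' "${2:-/proc/mounts}"
}

# Example: show only the attribute-cache and lookup-cache settings.
# mount_opts /mnt/dir | grep -E '^(ac(reg|dir)(min|max)|lookupcache)='
```

For the /proc/mounts snippet above, the example would list acregmin=30, acregmax=30, acdirmax=30 and lookupcache=pos.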

> -Original Message-
> From: Jason Breitman
> Sent: Tuesday, August 23, 2022 2:42 PM
> To: Ben Hutchings ; 1017...@bugs.debian.org
> Subject: RE: Bug#1017720: nfs-common: No such file or directory
> 
> What additional information can I provide for us to move forward with this
> process?
> 
> To summarize and include further details, rsync is used to sync applications
> to a file server which behaves like a repository.
> We do preserve timestamps from the build server and also use --delete.  We
> do not run the applications from the file server.  All servers use NTP.
> 
> The application has a sub-directory that contains files with version numbers.
> These are libraries.
> When a new build is complete, a developer pushes their updates via rsync to
> the file server / repository.
> 
> I believe that the dentry cache thinks the "old" files exist and generates a
> "No such file or directory" error showing question marks for that file's
> attributes.
> Dropping the dentry cache via echo 2 > /proc/sys/vm/drop_caches resolves
> the issue.
> 
> This behavior is not observed in Debian 10.8 with that distribution's
> associated kernel and packages.
> 
> > -Original Message-
> > From: Jason Breitman
> > Sent: Friday, August 19, 2022 9:52 PM
> > To: Ben Hutchings ; 1017...@bugs.debian.org
> > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> >
> > > -Original Message-
> > > From: Ben Hutchings 
> > > Sent: Friday, August 19, 2022 7:27 PM
> > > To: Jason Breitman ;
> > > 1017...@bugs.debian.org
> > > Subject: Re: Bug#1017720: nfs-common: No such file or directory
> > >
> > > Control: tag -1 moreinfo
> > >
> > > On Fri, 2022-08-19 at 13:16 +, Jason Breitman wrote:
> > > > Package: nfs-common
> > > > Version: 1:1.3.4-6
> > > > Severity: important
> > > >
> > > > Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64
> > > > GNU/Linux
> > > >
> > > > -- Description
> > > > After updating and/or creating new files on our file server via
> > > > rsync, we see many files report the error message below from NFSv4
> > > > clients since upgrading from Debian 10.8 to Debian 11.4.
> > > > Clearing the dentry cache resolves the issue right away.
> > > > I am not sure that nfs-common is the package to blame, but listed
> > > > it based on the bug submission recommendations.
> > >
> > > The NFS implementation is mostly in the kernel, so probably this issue
> > > belongs there.  But the kernel team is responsible for both packages.
> > >
> > > [...]
> > > > -- Error message
> > > > ls: cannot access 'filename': No such file or directory
> > > > -? ? ???? filename
> > > [...]
> > >
> > > So we know the file's there but can't stat it.  I think this means the
> > > client has cached the handle of the old file of that name, which has
> > > been deleted.
> > >
> > > - Are client and server clocks closely synchronised?  If not, that
> > > needs to be fixed.
> > >
> > The clocks are synchronized using NTP.
> >
> > > - Are clients likely to read this directory while rsync is running, or
> > > shortly before?  If so, it may help to reduce the attribute caching
> > > timeout on the client.  See the "Directory entry caching" section in
> > > the nfs(5) manual page.
> > >
> > Clients are not likely to read this directory while rsync is running for the
> > observed cases.  That can happen in our environment, but not in this case.
> > I am using the lookupcache=pos option.  I tried noac, but the performance
> > penalty was too much.  Which option are you referring to and what setting
> > do you recommend testing?
> >
> > > I don't know why you're only seeing this after an upgrade of the
> > > clients, though.  I'm not aware that there has been any big change to
> > > attribute caching.
> > >
> > I appreciate you respond

Bug#1017720: nfs-common: No such file or directory

2022-08-23 Thread Jason Breitman
What additional information can I provide for us to move forward with this 
process?

To summarize and include further details, rsync is used to sync applications to 
a file server which behaves like a repository.
We do preserve timestamps from the build server and also use --delete.  We do 
not run the applications from the file server.  All servers use NTP.

The application has a sub-directory that contains files with version numbers.
These are libraries.
When a new build is complete, a developer pushes their updates via rsync to the
file server / repository.

I believe that the dentry cache thinks the "old" files exist and generates a
"No such file or directory" error showing question marks for that file's
attributes.
Dropping the dentry cache via echo 2 > /proc/sys/vm/drop_caches resolves the
issue.

This behavior is not observed in Debian 10.8 with that distribution's associated
kernel and packages.

> -Original Message-
> From: Jason Breitman
> Sent: Friday, August 19, 2022 9:52 PM
> To: Ben Hutchings ; 1017...@bugs.debian.org
> Subject: RE: Bug#1017720: nfs-common: No such file or directory
> 
> > -Original Message-
> > From: Ben Hutchings 
> > Sent: Friday, August 19, 2022 7:27 PM
> > To: Jason Breitman ;
> > 1017...@bugs.debian.org
> > Subject: Re: Bug#1017720: nfs-common: No such file or directory
> >
> > Control: tag -1 moreinfo
> >
> > On Fri, 2022-08-19 at 13:16 +, Jason Breitman wrote:
> > > Package: nfs-common
> > > Version: 1:1.3.4-6
> > > Severity: important
> > >
> > > Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64
> > > GNU/Linux
> > >
> > > -- Description
> > > After updating and/or creating new files on our file server via
> > > rsync, we see many files report the error message below from NFSv4
> > > clients since upgrading from Debian 10.8 to Debian 11.4.
> > > Clearing the dentry cache resolves the issue right away.
> > > I am not sure that nfs-common is the package to blame, but listed
> > > it based on the bug submission recommendations.
> >
> > The NFS implementation is mostly in the kernel, so probably this issue
> > belongs there.  But the kernel team is responsible for both packages.
> >
> > [...]
> > > -- Error message
> > > ls: cannot access 'filename': No such file or directory
> > > -? ? ???? filename
> > [...]
> >
> > So we know the file's there but can't stat it.  I think this means the
> > client has cached the handle of the old file of that name, which has
> > been deleted.
> >
> > - Are client and server clocks closely synchronised?  If not, that
> > needs to be fixed.
> >
> The clocks are synchronized using NTP.
> 
> > - Are clients likely to read this directory while rsync is running, or
> > shortly before?  If so, it may help to reduce the attribute caching
> > timeout on the client.  See the "Directory entry caching" section in
> > the nfs(5) manual page.
> >
> Clients are not likely to read this directory while rsync is running for the
> observed cases.  That can happen in our environment, but not in this case.
> I am using the lookupcache=pos option.  I tried noac, but the performance
> penalty was too much.  Which option are you referring to and what setting
> do you recommend testing?
> 
> > I don't know why you're only seeing this after an upgrade of the
> > clients, though.  I'm not aware that there has been any big change to
> > attribute caching.
> >
> I appreciate you responding to my report and am happy to answer any
> questions.
> We have multiple monitors and log scrapers to detect "file not found"
> exceptions that would let us know if this was happening before.
> To share more, I have 2 environments mounting from the same file server.
> Each environment has several servers.  The issue is only seen in the
> environment running Debian 11.4.
> I also should have mentioned that the files in question have a version
> number appended.  filename-.  When the file is updated via rsync, it is
> called filename-1112 and the prior file is removed.  The error is about
> filename-.
> I am not sure if this is the proper terminology, but the issue appears to be
> the negative dentry cache.
> 
> > Ben.
> >
> > --
> > Ben Hutchings
> > Beware of bugs in the above code;
> > I have only proved it correct, not tried it. - Donald Knuth
> 
> Jason Breitman
Jason Breitman


Bug#1017720: nfs-common: No such file or directory

2022-08-19 Thread Jason Breitman
> -Original Message-
> From: Ben Hutchings 
> Sent: Friday, August 19, 2022 7:27 PM
> To: Jason Breitman ;
> 1017...@bugs.debian.org
> Subject: Re: Bug#1017720: nfs-common: No such file or directory
> 
> Control: tag -1 moreinfo
> 
> On Fri, 2022-08-19 at 13:16 +, Jason Breitman wrote:
> > Package: nfs-common
> > Version: 1:1.3.4-6
> > Severity: important
> >
> > Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64
> > GNU/Linux
> >
> > -- Description
> > After updating and/or creating new files on our file server via
> > rsync, we see many files report the error message below from NFSv4
> > clients since upgrading from Debian 10.8 to Debian 11.4.
> > Clearing the dentry cache resolves the issue right away.
> > I am not sure that nfs-common is the package to blame, but listed
> > it based on the bug submission recommendations.
> 
> The NFS implementation is mostly in the kernel, so probably this issue
> belongs there.  But the kernel team is responsible for both packages.
> 
> [...]
> > -- Error message
> > ls: cannot access 'filename': No such file or directory
> > -? ? ???? filename
> [...]
> 
> So we know the file's there but can't stat it.  I think this means the
> client has cached the handle of the old file of that name, which has
> been deleted.
> 
> - Are client and server clocks closely synchronised?  If not, that
> needs to be fixed.
> 
The clocks are synchronized using NTP.  

> - Are clients likely to read this directory while rsync is running, or
> shortly before?  If so, it may help to reduce the attribute caching
> timeout on the client.  See the "Directory entry caching" section in
> the nfs(5) manual page.
>
Clients are not likely to read this directory while rsync is running for the 
observed cases.  That can happen in our environment, but not in this case.
I am using the lookupcache=pos option.  I tried noac, but the performance 
penalty was too much.  Which option are you referring to and what setting do 
you recommend testing?

> I don't know why you're only seeing this after an upgrade of the
> clients, though.  I'm not aware that there has been any big change to
> attribute caching.
> 
I appreciate you responding to my report and am happy to answer any questions.
We have multiple monitors and log scrapers to detect "file not found" 
exceptions that would let us know if this was happening before.
To share more, I have 2 environments mounting from the same file server.  Each 
environment has several servers.  The issue is only seen in the environment 
running Debian 11.4.
I also should have mentioned that the files in question have a version number 
appended.  filename-.  When the file is updated via rsync, it is called 
filename-1112 and the prior file is removed.  The error is about filename-.
I am not sure if this is the proper terminology, but the issue appears to be 
the negative dentry cache.

> Ben.
> 
> --
> Ben Hutchings
> Beware of bugs in the above code;
> I have only proved it correct, not tried it. - Donald Knuth

Jason Breitman


Bug#1017720: nfs-common: No such file or directory

2022-08-19 Thread Ben Hutchings
Control: tag -1 moreinfo

On Fri, 2022-08-19 at 13:16 +, Jason Breitman wrote:
> Package: nfs-common
> Version: 1:1.3.4-6
> Severity: important
> 
> Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64
> GNU/Linux
> 
> -- Description
> After updating and/or creating new files on our file server via
> rsync, we see many files report the error message below from NFSv4
> clients since upgrading from Debian 10.8 to Debian 11.4.
> Clearing the dentry cache resolves the issue right away.
> I am not sure that nfs-common is the package to blame, but listed
> it based on the bug submission recommendations. 

The NFS implementation is mostly in the kernel, so probably this issue
belongs there.  But the kernel team is responsible for both packages.

[...]
> -- Error message
> ls: cannot access 'filename': No such file or directory
> -? ? ???? filename
[...]

So we know the file's there but can't stat it.  I think this means the
client has cached the handle of the old file of that name, which has
been deleted.

- Are client and server clocks closely synchronised?  If not, that
needs to be fixed.

- Are clients likely to read this directory while rsync is running, or
shortly before?  If so, it may help to reduce the attribute caching
timeout on the client.  See the "Directory entry caching" section in
the nfs(5) manual page.

I don't know why you're only seeing this after an upgrade of the
clients, though.  I'm not aware that there has been any big change to
attribute caching.

Ben.

-- 
Ben Hutchings
Beware of bugs in the above code;
I have only proved it correct, not tried it. - Donald Knuth




Bug#1017720: nfs-common: No such file or directory

2022-08-19 Thread Jason Breitman
Package: nfs-common
Version: 1:1.3.4-6
Severity: important

Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux

-- Description
After updating and/or creating new files on our file server via rsync, we
see many files report the error message below from NFSv4 clients since 
upgrading from Debian 10.8 to Debian 11.4.
Clearing the dentry cache resolves the issue right away.
I am not sure that nfs-common is the package to blame, but listed it based 
on the bug submission recommendations. 

-- Test
ls -l /mnt/dir/someOtherDir/* | grep '?'

-- Error message
ls: cannot access 'filename': No such file or directory
-? ? ???? filename

-- Workaround
/usr/bin/sync && echo 2 > /proc/sys/vm/drop_caches
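The test and the workaround above could be combined so that caches are only dropped when something is actually wrong. This is a minimal sketch, not a fix: the function name `stale_entries` is hypothetical, it only looks at non-hidden names (as the `ls -l .../*` test does), it reports any lookup failure (including permission errors, not only ENOENT), and the cache drop still needs root. It relies on the distinction Ben describes: the shell glob gets the names from readdir, while `test -e` and `test -L` must stat/lstat each entry, so a name that is listed but passes neither test is the `-? ? ...` case.

```shell
#!/bin/sh
# stale_entries DIR
# Print every name in DIR that readdir returns (via the glob) but that
# cannot be looked up. Returns 0 if none were found, 1 otherwise.
# Note: POSIX sh has no "local", so dir/found/path are global variables.
stale_entries() {
  dir=$1
  found=0
  for path in "$dir"/*; do
    # With no matches, POSIX sh leaves the pattern unexpanded; skip it.
    [ "$path" = "$dir/*" ] && continue
    # -L uses lstat(2) and -e uses stat(2). Requiring both to fail means a
    # dangling symlink is not reported, since ls -l can still show it.
    if [ ! -L "$path" ] && [ ! -e "$path" ]; then
      printf "can't stat %s\n" "$path"
      found=1
    fi
  done
  return "$found"
}

# Example (as root), using the directory from the test above:
# stale_entries /mnt/dir/someOtherDir || { /usr/bin/sync && echo 2 > /proc/sys/vm/drop_caches; }
```

Running the check from cron on an affected client would also record when stale entries appear, which may help correlate them with rsync runs.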

-- /etc/fstab snippet --
nfs-server.domain.com:/dir  /mnt/dir  nfs4  lookupcache=pos,noresvport,sec=krb5,hard,rsize=1048576,wsize=10485760  0

Jason Breitman