Bug#855632: Bug#883217: linux: open on NFSv4 exported file on nfs server: "Resource temporarily unavailable" under reproducible conditions when client has granted read delegation on file

2017-12-15 Thread Salvatore Bonaccorso
Hi Stephen,

On Thu, Dec 14, 2017 at 05:51:12PM -0700, Stephen Dowdy wrote:
> 
> 
> On 12/14/2017 12:51 PM, Salvatore Bonaccorso wrote:
> > Hi Stephen,
> > 
> > On Mon, Dec 04, 2017 at 09:24:55PM +0100, Salvatore Bonaccorso wrote:
> >> Hi
> >>
> >> On Thu, Nov 30, 2017 at 03:35:40PM -0700, Stephen Dowdy wrote:
> >>> On 11/30/2017 01:39 PM, Salvatore Bonaccorso wrote:
> >>>> Is this worth trying to be fixed for the jessie kernel?
> >>>
> >>> Salvatore,
> >>>
> >>> I believe this is likely the reason for my bug report:
> >>>
> >>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=855632
> >>>
> >>> as that system has thrown EAGAIN errors since i installed it in April, 
> >>> 2015.
> >>> It's a 10 NIC NFS server for the department, and often throws the error 
> >>> when i update files that are likely being read/open by client systems.
> >>> (it doesn't have a huge resource consumption load ever and i get that 
> >>> failure)
> >>>
> >>> So, i vote yeah ;)
> >>
> >> Okay.
> > 
> > Did you get a chance to test this as well for your case of #855632?
> > 
> > Regards,
> > Salvatore
> > 
> Salvatore,
> 
> 
> Sorry i didn't respond.  things have been way crazy.  Unfortunately, i 
> probably won't be able to test because:
>- problem is not reproducible easily sometimes
>- this machine services several hundred systems w/o any upcoming scheduled 
> downtime.
> 
> I haven't noticed the problem on any other machines we have, though, so don't 
> have any other candidates for testing.

Many thanks for the reply; it is much appreciated to see where we
stand. Yes, I can perfectly understand the reasoning. The change is now
pending for 3.16.51-4 (or any later iteration via a point release of
jessie), so if you have not yet updated to stretch by then and see
your issue resolved as well, we can close the second bug.

Regards,
Salvatore



Bug#855632: Bug#883217: linux: open on NFSv4 exported file on nfs server: "Resource temporarily unavailable" under reproducible conditions when client has granted read delegation on file

2017-12-14 Thread Stephen Dowdy


On 12/14/2017 12:51 PM, Salvatore Bonaccorso wrote:
> Hi Stephen,
> 
> On Mon, Dec 04, 2017 at 09:24:55PM +0100, Salvatore Bonaccorso wrote:
>> Hi
>>
>> On Thu, Nov 30, 2017 at 03:35:40PM -0700, Stephen Dowdy wrote:
>>> On 11/30/2017 01:39 PM, Salvatore Bonaccorso wrote:
>>>> Is this worth trying to be fixed for the jessie kernel?
>>>
>>> Salvatore,
>>>
>>> I believe this is likely the reason for my bug report:
>>>
>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=855632
>>>
>>> as that system has thrown EAGAIN errors since i installed it in April, 2015.
>>> It's a 10 NIC NFS server for the department, and often throws the error 
>>> when i update files that are likely being read/open by client systems.
>>> (it doesn't have a huge resource consumption load ever and i get that 
>>> failure)
>>>
>>> So, i vote yeah ;)
>>
>> Okay.
> 
> Did you get a chance to test this as well for your case of #855632?
> 
> Regards,
> Salvatore
> 
Salvatore,


Sorry i didn't respond.  things have been way crazy.  Unfortunately, i probably 
won't be able to test because:
   - problem is not reproducible easily sometimes
   - this machine services several hundred systems w/o any upcoming scheduled 
downtime.

I haven't noticed the problem on any other machines we have, though, so don't 
have any other candidates for testing.

I may just take the "upgrade to stretch" solution out of this when i have some 
scheduled downtime.

thanks,
--stephen



Bug#855632: Bug#883217: linux: open on NFSv4 exported file on nfs server: "Resource temporarily unavailable" under reproducible conditions when client has granted read delegation on file

2017-12-14 Thread Salvatore Bonaccorso
Hi Stephen,

On Mon, Dec 04, 2017 at 09:24:55PM +0100, Salvatore Bonaccorso wrote:
> Hi
> 
> On Thu, Nov 30, 2017 at 03:35:40PM -0700, Stephen Dowdy wrote:
> > On 11/30/2017 01:39 PM, Salvatore Bonaccorso wrote:
> > > Is this worth trying to be fixed for the jessie kernel?
> > 
> > Salvatore,
> > 
> > I believe this is likely the reason for my bug report:
> > 
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=855632
> > 
> > as that system has thrown EAGAIN errors since i installed it in April, 2015.
> > It's a 10 NIC NFS server for the department, and often throws the error 
> > when i update files that are likely being read/open by client systems.
> > (it doesn't have a huge resource consumption load ever and i get that 
> > failure)
> > 
> > So, i vote yeah ;)
> 
> Okay.

Did you get a chance to test this as well for your case of #855632?

Regards,
Salvatore



Bug#855632: Bug#883217: linux: open on NFSv4 exported file on nfs server: "Resource temporarily unavailable" under reproducible conditions when client has granted read delegation on file

2017-12-04 Thread Salvatore Bonaccorso
Hi

On Thu, Nov 30, 2017 at 03:35:40PM -0700, Stephen Dowdy wrote:
> On 11/30/2017 01:39 PM, Salvatore Bonaccorso wrote:
> > Is this worth trying to be fixed for the jessie kernel?
> 
> Salvatore,
> 
> I believe this is likely the reason for my bug report:
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=855632
> 
> as that system has thrown EAGAIN errors since i installed it in April, 2015.
> It's a 10 NIC NFS server for the department, and often throws the error when 
> i update files that are likely being read/open by client systems.
> (it doesn't have a huge resource consumption load ever and i get that failure)
> 
> So, i vote yeah ;)

Okay.

I tracked this down further and have attached

0001-locks-remove-i_have_this_lease-check-from-__break_le.patch
0002-locks-__break_lease-cleanup-in-preparation-of-allowi.patch

to be applied on top of the current jessie branch in git.

Attached are the two individual patches:

locks-remove-i_have_this_lease-check-from-__break_le.patch
locks-__break_lease-cleanup-in-preparation-of-allowi.patch

With these two patches applied I have not been able to reproduce the
problem for a while now, whereas previously it could be triggered
relatively quickly.

Can you confirm that this addresses the issue for you as well?
See the kernel-handbook for the simple-patching guideline:
https://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s4.2.2

Still, since the patches were originally part of a bigger rewrite
touching fs/locks.c and fs/nfsd, this backport might need a proper,
deeper review to confirm that it is complete and does not break anything.
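
The behavioural change made by the first patch can be sketched as a small
user-space model. This is only an illustration under stated assumptions,
not kernel code: the names lease_outcome, open_attempt, model_before_patch,
model_after_patch and outcome_to_errno are hypothetical, and the model
covers only the condition that decides between returning -EWOULDBLOCK and
waiting for the lease break.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical user-space model of one decision in __break_lease().
 * It is an illustration only and does not reproduce the kernel code. */
enum lease_outcome {
	LEASE_NO_CONFLICT,	/* no conflicting lease, open proceeds */
	LEASE_WOULD_BLOCK,	/* open fails with -EWOULDBLOCK */
	LEASE_WAIT		/* caller sleeps until the lease is broken */
};

struct open_attempt {
	bool lease_conflict;	 /* a lease conflicts with this open */
	bool opener_holds_lease; /* fl->fl_owner == current->files */
	int mode;		 /* open flags, e.g. O_WRONLY | O_NONBLOCK */
};

/* Before the patch: a caller that owns a conflicting lease is treated
 * as if it had asked for a non-blocking open. */
enum lease_outcome model_before_patch(struct open_attempt a)
{
	if (!a.lease_conflict)
		return LEASE_NO_CONFLICT;
	if (a.opener_holds_lease || (a.mode & O_NONBLOCK))
		return LEASE_WOULD_BLOCK;
	return LEASE_WAIT;
}

/* After the patch: only an explicit O_NONBLOCK makes the open fail
 * immediately; lease ownership no longer matters. */
enum lease_outcome model_after_patch(struct open_attempt a)
{
	if (!a.lease_conflict)
		return LEASE_NO_CONFLICT;
	if (a.mode & O_NONBLOCK)
		return LEASE_WOULD_BLOCK;
	return LEASE_WAIT;
}

/* Map the modelled outcome to the error value seen in the patch. */
int outcome_to_errno(enum lease_outcome o)
{
	return o == LEASE_WOULD_BLOCK ? -EWOULDBLOCK : 0;
}
```

For an attempt with lease_conflict and opener_holds_lease both true and
mode set to O_WRONLY, model_before_patch() yields LEASE_WOULD_BLOCK while
model_after_patch() yields LEASE_WAIT. On Linux EWOULDBLOCK has the same
value as EAGAIN, which is why the failure shows up as "Resource
temporarily unavailable".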

Regards,
Salvatore
From 6997c3a97579e46cb839c334b4b9b6f96c3b573b Mon Sep 17 00:00:00 2001
From: Salvatore Bonaccorso 
Date: Mon, 4 Dec 2017 11:11:28 +0100
Subject: [PATCH 1/2] locks: remove i_have_this_lease check from __break_lease

---
 debian/changelog   |  6 +++
 ...e-i_have_this_lease-check-from-__break_le.patch | 54 ++
 debian/patches/series  |  1 +
 3 files changed, 61 insertions(+)
 create mode 100644 debian/patches/bugfix/all/locks-remove-i_have_this_lease-check-from-__break_le.patch

diff --git a/debian/changelog b/debian/changelog
index 977e1cea3..955b86f56 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+linux (3.16.51-3) UNRELEASED; urgency=medium
+
+  * locks: remove i_have_this_lease check from __break_lease
+
+ -- Salvatore Bonaccorso   Mon, 04 Dec 2017 12:17:53 +0100
+
 linux (3.16.51-2) jessie; urgency=medium
 
   * [mips*] inst: Avoid ABI change in 3.16.51
diff --git a/debian/patches/bugfix/all/locks-remove-i_have_this_lease-check-from-__break_le.patch b/debian/patches/bugfix/all/locks-remove-i_have_this_lease-check-from-__break_le.patch
new file mode 100644
index 000000000..04a778b40
--- /dev/null
+++ b/debian/patches/bugfix/all/locks-remove-i_have_this_lease-check-from-__break_le.patch
@@ -0,0 +1,54 @@
+From: Jeff Layton 
+Date: Mon, 1 Sep 2014 14:27:43 -0400
+Subject: locks: remove i_have_this_lease check from __break_lease
+Origin: https://git.kernel.org/linus/843c6b2f4cef384af8e0de6b7ac7191675030e3a
+
+I think that the intent of this code was to ensure that a process won't
+deadlock if it has one fd open with a lease on it and then breaks that
+lease by opening another fd. In that case it'll treat the __break_lease
+call as if it were non-blocking.
+
+This seems wrong -- the process could (for instance) be multithreaded
+and managing different fds via different threads. I also don't see any
+mention of this limitation in the (somewhat sketchy) documentation.
+
+Remove the check and the non-blocking behavior when i_have_this_lease
+is true.
+
+Signed-off-by: Jeff Layton 
+[carnil: Backport for 3.16:
+ - adjust context
+]
+---
+ fs/locks.c | 6 ++----
+ 1 file changed, 2 insertions(+), 4 deletions(-)
+
+--- a/fs/locks.c
++++ b/fs/locks.c
+@@ -1326,7 +1326,6 @@ int __break_lease(struct inode *inode, u
+ 	struct file_lock *new_fl, *flock;
+ 	struct file_lock *fl;
+ 	unsigned long break_time;
+-	int i_have_this_lease = 0;
+ 	bool lease_conflict = false;
+ 	int want_write = (mode & O_ACCMODE) != O_RDONLY;
+ 
+@@ -1346,8 +1345,7 @@ int __break_lease(struct inode *inode, u
+ 	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
+ 		if (leases_conflict(fl, new_fl)) {
+ 			lease_conflict = true;
+-			if (fl->fl_owner == current->files)
+-				i_have_this_lease = 1;
++			break;
+ 		}
+ 	}
+ 	if (!lease_conflict)
+@@ -1377,7 +1375,7 @@ int __break_lease(struct inode *inode, u
+ 		fl->fl_lmops->lm_break(fl);
+ 	}
+ 
+-	if (i_have_this_lease || (mode & O_NONBLOCK)) {
++	if (mode & O_NONBLOCK) {
+ 		trace_break_lease_noblock(inode, new_fl);
+ 		error = -EWOULDBLOCK;
+ 		goto out;
diff --git a/debian/patches/series b/debian/patches/series
index 4cd4a739c..4ab96adb2 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -251,6 +251,7 @@