[Cluster-devel] [PATCH dlm/next 0/3] fs: dlm: debug and tracepoints

2021-08-20 Thread Alexander Aring
Hi,

again a resend of the tracepoints patches and fixed that the lkb
reference is hold during tracepoint call. We need to do that because we
access the structure during tracing. There is some code duplication to
convert the return error for dlm_lock_end trace now. I didn't wanted to
change the existing code and introduce more overhead if tracing is not
enabled.

- Alex

Alexander Aring (3):
  fs: dlm: debug improvements print nodeid
  fs: dlm: initial support for tracepoints
  fs: dlm: trace socket handling

 fs/dlm/ast.c   |   4 +
 fs/dlm/lock.c  |  10 ++
 fs/dlm/lowcomms.c  |   4 +
 fs/dlm/main.c  |   3 +
 fs/dlm/midcomms.c  |   4 +-
 include/trace/events/dlm.h | 260 +
 6 files changed, 283 insertions(+), 2 deletions(-)
 create mode 100644 include/trace/events/dlm.h

-- 
2.27.0



[Cluster-devel] [PATCH dlm/next 2/3] fs: dlm: initial support for tracepoints

2021-08-20 Thread Alexander Aring
This patch adds initial support for dlm tracepoints. It will introduce
tracepoints to dlm main functionality dlm_lock()/dlm_unlock() and their
complete ast() callback or blocking bast() callback.

The lock/unlock functionality has a start and end tracepoint, this is
because there exists a race in case if would have a tracepoint at the
end position only the complete/blocking callbacks could occur before. To
work with eBPF tracing and using their lookup hash functionality there
could be problems that an entry was not inserted yet. However use the
start functionality for hash insert and check again in end functionality
if there was an dlm internal error so there is no ast callback. In further
it might also that locks with local masters will occur those callbacks
immediately so we must have such functionality.

I did not make everything accessible yet, although it seems eBPF can be
used to access a lot of internal datastructures if it's aware of the
struct definitions of the running kernel instance. We still can change
it, if you do eBPF experiments e.g. time measurements between lock and
callback functionality you can simple use the local lkb_id field as hash
value in combination with the lockspace id if you have multiple
lockspaces. Otherwise you can simple use trace-cmd for some functionality,
e.g. `trace-cmd record -e dlm` and `trace-cmd report` afterwards.

Signed-off-by: Alexander Aring 
---
 fs/dlm/ast.c   |   4 +
 fs/dlm/lock.c  |  10 ++
 fs/dlm/main.c  |   3 +
 include/trace/events/dlm.h | 220 +
 4 files changed, 237 insertions(+)
 create mode 100644 include/trace/events/dlm.h

diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 283c7b94edda..0e7c6f6819ac 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -9,6 +9,8 @@
 ***
 **/
 
+#include 
+
 #include "dlm_internal.h"
 #include "lock.h"
 #include "user.h"
@@ -254,10 +256,12 @@ void dlm_callback_work(struct work_struct *work)
continue;
} else if (callbacks[i].flags & DLM_CB_BAST) {
bastfn(lkb->lkb_astparam, callbacks[i].mode);
+   trace_dlm_bast(ls, lkb, callbacks[i].mode);
} else if (callbacks[i].flags & DLM_CB_CAST) {
lkb->lkb_lksb->sb_status = callbacks[i].sb_status;
lkb->lkb_lksb->sb_flags = callbacks[i].sb_flags;
castfn(lkb->lkb_astparam);
+   trace_dlm_ast(ls, lkb, lkb->lkb_lksb);
}
}
 
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index c502c065d007..feb2e94f5879 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -53,6 +53,8 @@
R: do_()
L: receive__reply() <-  R: send__reply()
 */
+#include 
+
 #include 
 #include 
 #include 
@@ -3437,6 +3439,8 @@ int dlm_lock(dlm_lockspace_t *lockspace,
if (error)
goto out;
 
+   trace_dlm_lock_start(ls, lkb, mode, flags);
+
error = set_lock_args(mode, lksb, flags, namelen, 0, ast,
  astarg, bast, );
if (error)
@@ -3450,6 +3454,8 @@ int dlm_lock(dlm_lockspace_t *lockspace,
if (error == -EINPROGRESS)
error = 0;
  out_put:
+   trace_dlm_lock_end(ls, lkb, mode, flags, error);
+
if (convert || error)
__put_lkb(ls, lkb);
if (error == -EAGAIN || error == -EDEADLK)
@@ -3481,6 +3487,8 @@ int dlm_unlock(dlm_lockspace_t *lockspace,
if (error)
goto out;
 
+   trace_dlm_unlock_start(ls, lkb, flags);
+
error = set_unlock_args(flags, astarg, );
if (error)
goto out_put;
@@ -3495,6 +3503,8 @@ int dlm_unlock(dlm_lockspace_t *lockspace,
if (error == -EBUSY && (flags & (DLM_LKF_CANCEL | DLM_LKF_FORCEUNLOCK)))
error = 0;
  out_put:
+   trace_dlm_unlock_end(ls, lkb, flags, error);
+
dlm_put_lkb(lkb);
  out:
dlm_unlock_recovery(ls);
diff --git a/fs/dlm/main.c b/fs/dlm/main.c
index afc66a1346d3..1c5be4b70ac1 100644
--- a/fs/dlm/main.c
+++ b/fs/dlm/main.c
@@ -19,6 +19,9 @@
 #include "config.h"
 #include "lowcomms.h"
 
+#define CREATE_TRACE_POINTS
+#include 
+
 static int __init init_dlm(void)
 {
int error;
diff --git a/include/trace/events/dlm.h b/include/trace/events/dlm.h
new file mode 100644
index ..cc2dd79f4ec7
--- /dev/null
+++ b/include/trace/events/dlm.h
@@ -0,0 +1,220 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM dlm
+
+#if !defined(_TRACE_DLM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DLM_H
+
+#include 
+#include 
+#include 
+
+#include "../../../fs/dlm/dlm_internal.h"
+
+#define show_lock_flags(flags) __print_flags(flags, "|",   \
+   { 

[Cluster-devel] [PATCH dlm/next 3/3] fs: dlm: trace socket handling

2021-08-20 Thread Alexander Aring
This patch adds tracepoints for dlm socket receive and send
functionality. We can use it to track how much data was send or received
to or from a specific nodeid.

Signed-off-by: Alexander Aring 
---
 fs/dlm/lowcomms.c  |  4 
 include/trace/events/dlm.h | 40 ++
 2 files changed, 44 insertions(+)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 8f715c620e1f..e5438f2e586a 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -53,6 +53,8 @@
 #include 
 #include 
 
+#include 
+
 #include "dlm_internal.h"
 #include "lowcomms.h"
 #include "midcomms.h"
@@ -925,6 +927,7 @@ static int receive_from_sock(struct connection *con)
msg.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
ret = kernel_recvmsg(con->sock, , , 1, iov.iov_len,
 msg.msg_flags);
+   trace_dlm_recv(con->nodeid, ret);
if (ret == -EAGAIN)
break;
else if (ret <= 0)
@@ -1411,6 +1414,7 @@ static void send_to_sock(struct connection *con)
 
ret = kernel_sendpage(con->sock, e->page, offset, len,
  msg_flags);
+   trace_dlm_send(con->nodeid, ret);
if (ret == -EAGAIN || ret == 0) {
if (ret == -EAGAIN &&
test_bit(SOCKWQ_ASYNC_NOSPACE, >sock->flags) &&
diff --git a/include/trace/events/dlm.h b/include/trace/events/dlm.h
index cc2dd79f4ec7..9a2e3d869363 100644
--- a/include/trace/events/dlm.h
+++ b/include/trace/events/dlm.h
@@ -214,6 +214,46 @@ TRACE_EVENT(dlm_unlock_end,
 
 );
 
+TRACE_EVENT(dlm_send,
+
+   TP_PROTO(int nodeid, int ret),
+
+   TP_ARGS(nodeid, ret),
+
+   TP_STRUCT__entry(
+   __field(int, nodeid)
+   __field(int, ret)
+   ),
+
+   TP_fast_assign(
+   __entry->nodeid = nodeid;
+   __entry->ret = ret;
+   ),
+
+   TP_printk("nodeid=%d ret=%d", __entry->nodeid, __entry->ret)
+
+);
+
+TRACE_EVENT(dlm_recv,
+
+   TP_PROTO(int nodeid, int ret),
+
+   TP_ARGS(nodeid, ret),
+
+   TP_STRUCT__entry(
+   __field(int, nodeid)
+   __field(int, ret)
+   ),
+
+   TP_fast_assign(
+   __entry->nodeid = nodeid;
+   __entry->ret = ret;
+   ),
+
+   TP_printk("nodeid=%d ret=%d", __entry->nodeid, __entry->ret)
+
+);
+
 #endif /* if !defined(_TRACE_DLM_H) || defined(TRACE_HEADER_MULTI_READ) */
 
 /* This part must be outside protection */
-- 
2.27.0



[Cluster-devel] [PATCH dlm/next 1/3] fs: dlm: debug improvements print nodeid

2021-08-20 Thread Alexander Aring
This patch improves the debug output for midcomms layer by also printing
out the nodeid where users counter belongs to.

Signed-off-by: Alexander Aring 
---
 fs/dlm/midcomms.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index 7ae39ec8d9b0..008078f06813 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -1231,7 +1231,7 @@ void dlm_midcomms_add_member(int nodeid)
}
 
node->users++;
-   pr_debug("users inc count %d\n", node->users);
+   pr_debug("node %d users inc count %d\n", nodeid, node->users);
spin_unlock(>state_lock);
 
srcu_read_unlock(_srcu, idx);
@@ -1254,7 +1254,7 @@ void dlm_midcomms_remove_member(int nodeid)
 
spin_lock(>state_lock);
node->users--;
-   pr_debug("users dec count %d\n", node->users);
+   pr_debug("node %d users dec count %d\n", nodeid, node->users);
 
/* hitting users count to zero means the
 * other side is running dlm_midcomms_stop()
-- 
2.27.0



Re: [Cluster-devel] [PATCH v3 2/2] fs: remove mandatory file locking support

2021-08-20 Thread Al Viro
On Fri, Aug 20, 2021 at 12:39:19PM -0400, Jeff Layton wrote:

> diff --git a/fs/locks.c b/fs/locks.c

> @@ -2857,8 +2744,7 @@ static void lock_get_status(struct seq_file *f, struct 
> file_lock *fl,
>   seq_puts(f, "POSIX ");
>  
>   seq_printf(f, " %s ",
> -  (inode == NULL) ? "*NOINODE*" :
> -  mandatory_lock(inode) ? "MANDATORY" : "ADVISORY ");
> +  (inode == NULL) ? "*NOINODE*" : "ADVISORY ");

Huh?


if (fl->fl_file != NULL)
inode = locks_inode(fl->fl_file);

So basically that's fl->fl_file ? "ADVISORY" : "*NOINODE*"?

How could that trigger, though?  With locks_mandatory_area() gone, I don't
see how FL_POSIX file_lock with NULL ->fl_file could be ever found...
Confused...

Why is locks_copy_conflock() exported (hell, non-static), BTW?

> diff --git a/fs/namespace.c b/fs/namespace.c


> -#ifdef   CONFIG_MANDATORY_FILE_LOCKING
> -static bool may_mandlock(void)
> +static void warn_mandlock(void)
>  {
> - pr_warn_once("==\n"
> -  "WARNING: the mand mount option is being deprecated and\n"
> -  " will be removed in v5.15!\n"
> -  
> "==\n");
> - return capable(CAP_SYS_ADMIN);
> + pr_warn_once("===\n"
> +  "WARNING: The mand mount option has been deprecated and\n"
> +  " and is ignored by this kernel. Remove the mand\n"
> +  " option from the mount to silence this warning.\n"
> +  
> "===\n");
>  }
> -#else
> -static inline bool may_mandlock(void)
> -{
> - pr_warn("VFS: \"mand\" mount option not supported");
> - return false;
> -}
> -#endif

Is there any point in having the previous patch not folded into this one?



Re: [Cluster-devel] [PATCH v3 0/2] fs: remove support for mandatory locking

2021-08-20 Thread Jeff Layton
On Fri, 2021-08-20 at 12:39 -0400, Jeff Layton wrote:
> v3: slight revision to verbiage, and use pr_warn_once
> 
> The first patch in this series adds a new warning that should pop on
> kernels that have mandatory locking enabled when someone mounts a
> filesystem with -o mand. The second patch removes support for mandatory
> locking altogether.
> 
> What I think we probably want to do is apply the first to v5.14 before
> it ships and allow the new warning to trickle out into stable kernels.
> Then we can merge the second patch in v5.15 to go ahead and remove it.
> 
> Sound like a plan?
> 
> Jeff Layton (2):
>   fs: warn about impending deprecation of mandatory locks
>   fs: remove mandatory file locking support
> 
>  .../filesystems/mandatory-locking.rst | 188 --
>  fs/9p/vfs_file.c  |  12 --
>  fs/Kconfig|  10 -
>  fs/afs/flock.c|   4 -
>  fs/ceph/locks.c   |   3 -
>  fs/gfs2/file.c|   3 -
>  fs/locks.c| 116 +--
>  fs/namei.c|   4 +-
>  fs/namespace.c|  25 +--
>  fs/nfs/file.c |   4 -
>  fs/nfsd/nfs4state.c   |  13 --
>  fs/nfsd/vfs.c |  15 --
>  fs/ocfs2/locks.c  |   4 -
>  fs/open.c |   8 +-
>  fs/read_write.c   |   7 -
>  fs/remap_range.c  |  10 -
>  include/linux/fs.h|  84 
>  mm/mmap.c |   6 -
>  mm/nommu.c|   3 -
>  19 files changed, 14 insertions(+), 505 deletions(-)
>  delete mode 100644 Documentation/filesystems/mandatory-locking.rst
> 

I went ahead and pushed this version into the locks-next branch, so we
can give it some soak time before merging.

-- 
Jeff Layton 



[Cluster-devel] [PATCH v3 2/2] fs: remove mandatory file locking support

2021-08-20 Thread Jeff Layton
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.

I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had a fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.

This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.

Signed-off-by: Jeff Layton 
---
 .../filesystems/mandatory-locking.rst | 188 --
 fs/9p/vfs_file.c  |  12 --
 fs/Kconfig|  10 -
 fs/afs/flock.c|   4 -
 fs/ceph/locks.c   |   3 -
 fs/gfs2/file.c|   3 -
 fs/locks.c| 116 +--
 fs/namei.c|   4 +-
 fs/namespace.c|  29 +--
 fs/nfs/file.c |   4 -
 fs/nfsd/nfs4state.c   |  13 --
 fs/nfsd/vfs.c |  15 --
 fs/ocfs2/locks.c  |   4 -
 fs/open.c |   8 +-
 fs/read_write.c   |   7 -
 fs/remap_range.c  |  10 -
 include/linux/fs.h|  84 
 mm/mmap.c |   6 -
 mm/nommu.c|   3 -
 19 files changed, 14 insertions(+), 509 deletions(-)
 delete mode 100644 Documentation/filesystems/mandatory-locking.rst

diff --git a/Documentation/filesystems/mandatory-locking.rst 
b/Documentation/filesystems/mandatory-locking.rst
deleted file mode 100644
index 9ce73544a8f0..
--- a/Documentation/filesystems/mandatory-locking.rst
+++ /dev/null
@@ -1,188 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-=
-Mandatory File Locking For The Linux Operating System
-=
-
-   Andy Walker 
-
-  15 April 1996
-
-(Updated September 2007)
-
-0. Why you should avoid mandatory locking
--
-
-The Linux implementation is prey to a number of difficult-to-fix race
-conditions which in practice make it not dependable:
-
-   - The write system call checks for a mandatory lock only once
- at its start.  It is therefore possible for a lock request to
- be granted after this check but before the data is modified.
- A process may then see file data change even while a mandatory
- lock was held.
-   - Similarly, an exclusive lock may be granted on a file after
- the kernel has decided to proceed with a read, but before the
- read has actually completed, and the reading process may see
- the file data in a state which should not have been visible
- to it.
-   - Similar races make the claimed mutual exclusion between lock
- and mmap similarly unreliable.
-
-1. What is  mandatory locking?
---
-
-Mandatory locking is kernel enforced file locking, as opposed to the more usual
-cooperative file locking used to guarantee sequential access to files among
-processes. File locks are applied using the flock() and fcntl() system calls
-(and the lockf() library routine which is a wrapper around fcntl().) It is
-normally a process' responsibility to check for locks on a file it wishes to
-update, before applying its own lock, updating the file and unlocking it again.
-The most commonly used example of this (and in the case of sendmail, the most
-troublesome) is access to a user's mailbox. The mail user agent and the mail
-transfer agent must guard against updating the mailbox at the same time, and
-prevent reading the mailbox while it is being updated.
-
-In a perfect world all processes would use and honour a cooperative, or
-"advisory" locking scheme. However, the world isn't perfect, and there's
-a lot of poorly written code out there.
-
-In trying to address this problem, the designers of System V UNIX came up
-with a "mandatory" locking scheme, whereby the operating system kernel would
-block attempts by a process to write to a file that another process holds a
-"read" -or- "shared" lock on, and block attempts to both read and write to a 
-file that a process holds a "write " -or- "exclusive" lock on.
-
-The System V mandatory locking scheme was intended to have as little impact as
-possible on existing user code. The scheme is based on marking 

[Cluster-devel] [PATCH v3 1/2] fs: warn about impending deprecation of mandatory locks

2021-08-20 Thread Jeff Layton
We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
have disabled it. Warn the stragglers that still use "-o mand" that
we'll be dropping support for that mount option.

Cc: sta...@vger.kernel.org
Signed-off-by: Jeff Layton 
---
 fs/namespace.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ab4174a3c802..2279473d0d6f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1716,8 +1716,12 @@ static inline bool may_mount(void)
 }
 
 #ifdef CONFIG_MANDATORY_FILE_LOCKING
-static inline bool may_mandlock(void)
+static bool may_mandlock(void)
 {
+   pr_warn_once("==\n"
+"WARNING: the mand mount option is being deprecated and\n"
+" will be removed in v5.15!\n"
+
"==\n");
return capable(CAP_SYS_ADMIN);
 }
 #else
-- 
2.31.1



[Cluster-devel] [PATCH v3 0/2] fs: remove support for mandatory locking

2021-08-20 Thread Jeff Layton
v3: slight revision to verbiage, and use pr_warn_once

The first patch in this series adds a new warning that should pop on
kernels that have mandatory locking enabled when someone mounts a
filesystem with -o mand. The second patch removes support for mandatory
locking altogether.

What I think we probably want to do is apply the first to v5.14 before
it ships and allow the new warning to trickle out into stable kernels.
Then we can merge the second patch in v5.15 to go ahead and remove it.

Sound like a plan?

Jeff Layton (2):
  fs: warn about impending deprecation of mandatory locks
  fs: remove mandatory file locking support

 .../filesystems/mandatory-locking.rst | 188 --
 fs/9p/vfs_file.c  |  12 --
 fs/Kconfig|  10 -
 fs/afs/flock.c|   4 -
 fs/ceph/locks.c   |   3 -
 fs/gfs2/file.c|   3 -
 fs/locks.c| 116 +--
 fs/namei.c|   4 +-
 fs/namespace.c|  25 +--
 fs/nfs/file.c |   4 -
 fs/nfsd/nfs4state.c   |  13 --
 fs/nfsd/vfs.c |  15 --
 fs/ocfs2/locks.c  |   4 -
 fs/open.c |   8 +-
 fs/read_write.c   |   7 -
 fs/remap_range.c  |  10 -
 include/linux/fs.h|  84 
 mm/mmap.c |   6 -
 mm/nommu.c|   3 -
 19 files changed, 14 insertions(+), 505 deletions(-)
 delete mode 100644 Documentation/filesystems/mandatory-locking.rst

-- 
2.31.1



Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks

2021-08-20 Thread Steven Rostedt
On Fri, 20 Aug 2021 17:52:19 +0200
David Hildenbrand  wrote:

> > +static bool warned_mand;
> >   static inline bool may_mandlock(void)
> >   {
> > +   if (!warned_mand) {
> > +   warned_mand = true;
> > +   
> > pr_warn("==\n");
> > +   pr_warn("WARNING: the mand mount option is being deprecated 
> > and\n");
> > +   pr_warn(" will be removed in v5.15!\n");
> > +   
> > pr_warn("==\n");
> > +   }  
> 
> Is there a reason not to use pr_warn_once() ?

You would need a single call though, otherwise each pr_warn_once()
would have its own state that it warned once.

const char warning[] =
"==\n"
"WARNING: the mand mount option is being deprecated and\n"
" will be removed in v5.15!\n"
"==\n";

pr_warn_once(warning);

-- Steve



Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks

2021-08-20 Thread Jeff Layton
On Fri, 2021-08-20 at 17:52 +0200, David Hildenbrand wrote:
> On 20.08.21 15:57, Jeff Layton wrote:
> > We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
> > have disabled it. Warn the stragglers that still use "-o mand" that
> > we'll be dropping support for that mount option.
> > 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Jeff Layton 
> > ---
> >   fs/namespace.c | 8 
> >   1 file changed, 8 insertions(+)
> > 
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index ab4174a3c802..ffab0bb1e649 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
> >   }
> >   
> >   #ifdefCONFIG_MANDATORY_FILE_LOCKING
> > +static bool warned_mand;
> >   static inline bool may_mandlock(void)
> >   {
> > +   if (!warned_mand) {
> > +   warned_mand = true;
> > +   
> > pr_warn("==\n");
> > +   pr_warn("WARNING: the mand mount option is being deprecated 
> > and\n");
> > +   pr_warn(" will be removed in v5.15!\n");
> > +   
> > pr_warn("==\n");
> > +   }
> 
> Is there a reason not to use pr_warn_once() ?
> 
> 

No reason at all. I'll send out a v3 set in a bit with that change.

Thanks!
-- 
Jeff Layton 



Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks

2021-08-20 Thread David Laight
From: Jeff Layton
> Sent: 20 August 2021 14:57
> 
> We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
> have disabled it. Warn the stragglers that still use "-o mand" that
> we'll be dropping support for that mount option.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Jeff Layton 
> ---
>  fs/namespace.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ab4174a3c802..ffab0bb1e649 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
>  }
> 
>  #ifdef   CONFIG_MANDATORY_FILE_LOCKING
> +static bool warned_mand;
>  static inline bool may_mandlock(void)
>  {
> + if (!warned_mand) {
> + warned_mand = true;
> + 
> pr_warn("==\n");
> + pr_warn("WARNING: the mand mount option is being deprecated 
> and\n");
> + pr_warn(" will be removed in v5.15!\n");
> + 
> pr_warn("==\n");
> + }
>   return capable(CAP_SYS_ADMIN);
>  }

If that is called more than once you don't want the 'inline'.
I doubt it matters is not inlined - hardly a hot path.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks

2021-08-20 Thread David Hildenbrand

On 20.08.21 15:57, Jeff Layton wrote:

We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
have disabled it. Warn the stragglers that still use "-o mand" that
we'll be dropping support for that mount option.

Cc: sta...@vger.kernel.org
Signed-off-by: Jeff Layton 
---
  fs/namespace.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index ab4174a3c802..ffab0bb1e649 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
  }
  
  #ifdef	CONFIG_MANDATORY_FILE_LOCKING

+static bool warned_mand;
  static inline bool may_mandlock(void)
  {
+   if (!warned_mand) {
+   warned_mand = true;
+   
pr_warn("==\n");
+   pr_warn("WARNING: the mand mount option is being deprecated 
and\n");
+   pr_warn(" will be removed in v5.15!\n");
+   
pr_warn("==\n");
+   }


Is there a reason not to use pr_warn_once() ?


--
Thanks,

David / dhildenb



Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks

2021-08-20 Thread Jeff Layton
On Fri, 2021-08-20 at 15:49 +, David Laight wrote:
> From: Jeff Layton
> > Sent: 20 August 2021 14:57
> > 
> > We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
> > have disabled it. Warn the stragglers that still use "-o mand" that
> > we'll be dropping support for that mount option.
> > 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Jeff Layton 
> > ---
> >  fs/namespace.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index ab4174a3c802..ffab0bb1e649 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
> >  }
> > 
> >  #ifdef CONFIG_MANDATORY_FILE_LOCKING
> > +static bool warned_mand;
> >  static inline bool may_mandlock(void)
> >  {
> > +   if (!warned_mand) {
> > +   warned_mand = true;
> > +   
> > pr_warn("==\n");
> > +   pr_warn("WARNING: the mand mount option is being deprecated 
> > and\n");
> > +   pr_warn(" will be removed in v5.15!\n");
> > +   
> > pr_warn("==\n");
> > +   }
> > return capable(CAP_SYS_ADMIN);
> >  }
> 
> If that is called more than once you don't want the 'inline'.
> I doubt it matters is not inlined - hardly a hot path.
> 
>   David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 
> 1PT, UK
> Registration No: 1397386 (Wales)
> 

ACK. Of course. That really needs to not be inline. I'll fix that up in
my tree.

Thanks!
-- 
Jeff Layton 



Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-20 Thread Andreas Gruenbacher
On Fri, Aug 20, 2021 at 3:11 PM Bob Peterson  wrote:
> On 8/20/21 4:35 AM, Steven Whitehouse wrote:
> > Hi,
> >
> > On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote:
> >> From: Bob Peterson 
> >>
> >> This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure
> >> that
> >> will allow glocks to be demoted automatically on locking conflicts.
> >> When a locking request comes in that isn't compatible with the
> >> locking
> >> state of a holder and that holder has the HIF_MAY_DEMOTE flag set,
> >> the
> >> holder will be demoted automatically before the incoming locking
> >> request
> >> is granted.
> >>
> > I'm not sure I understand what is going on here. When there are locking
> > conflicts we generate call backs and those result in glock demotion.
> > There is no need for a flag to indicate that I think, since it is the
> > default behaviour anyway. Or perhaps the explanation is just a bit
> > confusing...
>
> I agree that the whole concept and explanation are confusing. Andreas
> and I went through several heated arguments about the semantics,
> comments, patch descriptions, etc. We played around with many different
> flag name ideas, etc. We did not agree on the best way to describe the
> whole concept. He didn't like my explanation and I didn't like his. So
> yes, it is confusing.
>
> My preferred terminology was "DOD" or "Dequeue On Demand"

... which is useless because it adds no clarity as to whose demand
we're talking about.

> which makes
> the concept more understandable to me. So basically a process can say
> "I need to hold this glock, but for an unknown and possibly lengthy
> period of time, but please feel free to dequeue it if it's in your way."
> And bear in mind that several processes may do the same, simultaneously.
>
> You can almost think of this as a performance enhancement. This concept
> allows a process to hold a glock for much longer periods of time, at a
> lower priority, for example, when gfs2_file_read_iter needs to hold the
> glock for very long-running iterative reads.

Consider a process that allocates a somewhat large buffer and reads
into it in chunks that are not page aligned. The buffer initially
won't be faulted in, so we fault in the first chunk and write into it.
Then, when reading the second chunk, we find that the first page of
the second chunk is already present. We fill it, set the
HIF_MAY_DEMOTE flag, fault in more pages, and clear the HIF_MAY_DEMOTE
flag. If we then still have the glock (which is very likely), we
resume the read. Otherwise, we return a short result.

Thanks,
Andreas



Re: [Cluster-devel] [PATCH 11/30] iomap: add the new iomap_iter model

2021-08-20 Thread Dan Williams
On Thu, Aug 19, 2021 at 9:12 PM Christoph Hellwig  wrote:
>
> On Thu, Aug 19, 2021 at 02:25:52PM -0700, Dan Williams wrote:
> > Given most of the iomap_iter users don't care about srcmap, i.e. are
> > not COW cases, they are leaving srcmap zero initialized. Should the
> > IOMAP types be incremented by one so that there is no IOMAP_HOLE
> > confusion? In other words, fold something like this?
>
> A hole really means nothing to read from the source.  The existing code
> also relies on that.

Ok, I've since found iomap_iter_srcmap(). Sorry for the noise.



Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-20 Thread Andreas Grünbacher
Am Fr., 20. Aug. 2021 um 15:49 Uhr schrieb Steven Whitehouse
:
> We always used to manage to avoid holding fs locks when copying to/from 
> userspace
> to avoid these complications.

I realize the intent, but that goal has never actually been achieved.
Direct I/O has *always* been calling get_user_pages() while holding
the inode glock in deferred mode.

The situation is slightly different for buffered I/O, but we have the
same problem there at least since switching to iomap. (And it's a
fundamental property of iomap that we want to hold the inode glock
across multi-page mappings, not an implementation deficit.)

Here's a slightly outdated version of a test case that makes all
versions of gfs2 prior to this patch queue unhappy:
https://lore.kernel.org/fstests/20210531152604.240462-1-agrue...@redhat.com/

Thanks,
Andreas



Re: [Cluster-devel] [PATCH v2 0/2] fs: remove support for mandatory locking

2021-08-20 Thread J. Bruce Fields
On Fri, Aug 20, 2021 at 09:57:05AM -0400, Jeff Layton wrote:
> The first patch in this series adds a new warning that should pop on
> kernels have mandatory locking enabled when someone mounts a filesystem
> with -o mand. The second patch removes support for mandatory locking
> altogether.
> 
> What I think we probably want to do is apply the first to v5.14 before
> it ships and allow the new warning to trickle out into stable kernels.
> Then we can merge the second patch in v5.15 to go ahead and remove it.
> 
> Sound like a plan?

Sounds good to me.--b.

> 
> Jeff Layton (2):
>   fs: warn about impending deprecation of mandatory locks
>   fs: remove mandatory file locking support
> 
>  .../filesystems/mandatory-locking.rst | 188 --
>  fs/9p/vfs_file.c  |  12 --
>  fs/Kconfig|  10 -
>  fs/afs/flock.c|   4 -
>  fs/ceph/locks.c   |   3 -
>  fs/gfs2/file.c|   3 -
>  fs/locks.c| 116 +--
>  fs/namei.c|   4 +-
>  fs/namespace.c|  31 +--
>  fs/nfs/file.c |   4 -
>  fs/nfsd/nfs4state.c   |  13 --
>  fs/nfsd/vfs.c |  15 --
>  fs/ocfs2/locks.c  |   4 -
>  fs/open.c |   8 +-
>  fs/read_write.c   |   7 -
>  fs/remap_range.c  |  10 -
>  include/linux/fs.h|  84 
>  mm/mmap.c |   6 -
>  mm/nommu.c|   3 -
>  19 files changed, 20 insertions(+), 505 deletions(-)
>  delete mode 100644 Documentation/filesystems/mandatory-locking.rst
> 
> -- 
> 2.31.1



[Cluster-devel] [PATCH v2 2/2] fs: remove mandatory file locking support

2021-08-20 Thread Jeff Layton
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.

I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had a fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.

This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.

Signed-off-by: Jeff Layton 
---
 .../filesystems/mandatory-locking.rst | 188 --
 fs/9p/vfs_file.c  |  12 --
 fs/Kconfig|  10 -
 fs/afs/flock.c|   4 -
 fs/ceph/locks.c   |   3 -
 fs/gfs2/file.c|   3 -
 fs/locks.c| 116 +--
 fs/namei.c|   4 +-
 fs/namespace.c|  37 ++--
 fs/nfs/file.c |   4 -
 fs/nfsd/nfs4state.c   |  13 --
 fs/nfsd/vfs.c |  15 --
 fs/ocfs2/locks.c  |   4 -
 fs/open.c |   8 +-
 fs/read_write.c   |   7 -
 fs/remap_range.c  |  10 -
 include/linux/fs.h|  84 
 mm/mmap.c |   6 -
 mm/nommu.c|   3 -
 19 files changed, 19 insertions(+), 512 deletions(-)
 delete mode 100644 Documentation/filesystems/mandatory-locking.rst

diff --git a/Documentation/filesystems/mandatory-locking.rst 
b/Documentation/filesystems/mandatory-locking.rst
deleted file mode 100644
index 9ce73544a8f0..
--- a/Documentation/filesystems/mandatory-locking.rst
+++ /dev/null
@@ -1,188 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-=
-Mandatory File Locking For The Linux Operating System
-=
-
-   Andy Walker 
-
-  15 April 1996
-
-(Updated September 2007)
-
-0. Why you should avoid mandatory locking
--
-
-The Linux implementation is prey to a number of difficult-to-fix race
-conditions which in practice make it not dependable:
-
-   - The write system call checks for a mandatory lock only once
- at its start.  It is therefore possible for a lock request to
- be granted after this check but before the data is modified.
- A process may then see file data change even while a mandatory
- lock was held.
-   - Similarly, an exclusive lock may be granted on a file after
- the kernel has decided to proceed with a read, but before the
- read has actually completed, and the reading process may see
- the file data in a state which should not have been visible
- to it.
-   - Similar races make the claimed mutual exclusion between lock
- and mmap similarly unreliable.
-
-1. What is  mandatory locking?
---
-
-Mandatory locking is kernel enforced file locking, as opposed to the more usual
-cooperative file locking used to guarantee sequential access to files among
-processes. File locks are applied using the flock() and fcntl() system calls
-(and the lockf() library routine which is a wrapper around fcntl().) It is
-normally a process' responsibility to check for locks on a file it wishes to
-update, before applying its own lock, updating the file and unlocking it again.
-The most commonly used example of this (and in the case of sendmail, the most
-troublesome) is access to a user's mailbox. The mail user agent and the mail
-transfer agent must guard against updating the mailbox at the same time, and
-prevent reading the mailbox while it is being updated.
-
-In a perfect world all processes would use and honour a cooperative, or
-"advisory" locking scheme. However, the world isn't perfect, and there's
-a lot of poorly written code out there.
-
-In trying to address this problem, the designers of System V UNIX came up
-with a "mandatory" locking scheme, whereby the operating system kernel would
-block attempts by a process to write to a file that another process holds a
-"read" -or- "shared" lock on, and block attempts to both read and write to a 
-file that a process holds a "write " -or- "exclusive" lock on.
-
-The System V mandatory locking scheme was intended to have as little impact as
-possible on existing user code. The scheme is based on marking 

[Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks

2021-08-20 Thread Jeff Layton
We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
have disabled it. Warn the stragglers that still use "-o mand" that
we'll be dropping support for that mount option.

Cc: sta...@vger.kernel.org
Signed-off-by: Jeff Layton 
---
 fs/namespace.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index ab4174a3c802..ffab0bb1e649 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
 }
 
 #ifdef CONFIG_MANDATORY_FILE_LOCKING
+static bool warned_mand;
 static inline bool may_mandlock(void)
 {
+   if (!warned_mand) {
+   warned_mand = true;
+   
pr_warn("==\n");
+   pr_warn("WARNING: the mand mount option is being deprecated 
and\n");
+   pr_warn(" will be removed in v5.15!\n");
+   
pr_warn("==\n");
+   }
return capable(CAP_SYS_ADMIN);
 }
 #else
-- 
2.31.1



[Cluster-devel] [PATCH v2 0/2] fs: remove support for mandatory locking

2021-08-20 Thread Jeff Layton
The first patch in this series adds a new warning that should pop on
kernels have mandatory locking enabled when someone mounts a filesystem
with -o mand. The second patch removes support for mandatory locking
altogether.

What I think we probably want to do is apply the first to v5.14 before
it ships and allow the new warning to trickle out into stable kernels.
Then we can merge the second patch in v5.15 to go ahead and remove it.

Sound like a plan?

Jeff Layton (2):
  fs: warn about impending deprecation of mandatory locks
  fs: remove mandatory file locking support

 .../filesystems/mandatory-locking.rst | 188 --
 fs/9p/vfs_file.c  |  12 --
 fs/Kconfig|  10 -
 fs/afs/flock.c|   4 -
 fs/ceph/locks.c   |   3 -
 fs/gfs2/file.c|   3 -
 fs/locks.c| 116 +--
 fs/namei.c|   4 +-
 fs/namespace.c|  31 +--
 fs/nfs/file.c |   4 -
 fs/nfsd/nfs4state.c   |  13 --
 fs/nfsd/vfs.c |  15 --
 fs/ocfs2/locks.c  |   4 -
 fs/open.c |   8 +-
 fs/read_write.c   |   7 -
 fs/remap_range.c  |  10 -
 include/linux/fs.h|  84 
 mm/mmap.c |   6 -
 mm/nommu.c|   3 -
 19 files changed, 20 insertions(+), 505 deletions(-)
 delete mode 100644 Documentation/filesystems/mandatory-locking.rst

-- 
2.31.1



Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-20 Thread Steven Whitehouse
Hi,

On Fri, 2021-08-20 at 15:17 +0200, Andreas Gruenbacher wrote:
> On Fri, Aug 20, 2021 at 11:35 AM Steven Whitehouse <
> swhit...@redhat.com> wrote:
> > On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote:
> > > From: Bob Peterson 
> > > 
> > > This patch introduces a new HIF_MAY_DEMOTE flag and
> > > infrastructure
> > > that will allow glocks to be demoted automatically on locking
> > > conflicts.
> > > When a locking request comes in that isn't compatible with the
> > > locking
> > > state of a holder and that holder has the HIF_MAY_DEMOTE flag
> > > set, the
> > > holder will be demoted automatically before the incoming locking
> > > request
> > > is granted.
> > 
> > I'm not sure I understand what is going on here. When there are
> > locking
> > conflicts we generate call backs and those result in glock
> > demotion.
> > There is no need for a flag to indicate that I think, since it is
> > the
> > default behaviour anyway. Or perhaps the explanation is just a bit
> > confusing...
> 
> When a glock has active holders (with the HIF_HOLDER flag set), the
> glock won't be demoted to a state incompatible with any of those
> holders.
> 
Ok, that is a much clearer explanation of what the patch does. Active
holders have always prevented demotions previously.

> > > Processes that allow a glock holder to be taken away indicate
> > > this by
> > > calling gfs2_holder_allow_demote().  When they need the glock
> > > again,
> > > they call gfs2_holder_disallow_demote() and then they check if
> > > the
> > > holder is still queued: if it is, they're still holding the
> > > glock; if
> > > it isn't, they need to re-acquire the glock.
> > > 
> > > This allows processes to hang on to locks that could become part
> > > of a
> > > cyclic locking dependency.  The locks will be given up when a
> > > (rare)
> > > conflicting locking request occurs, and don't need to be given up
> > > prematurely.
> > 
> > This seems backwards to me. We already have the glock layer cache
> > the
> > locks until they are required by another node. We also have the min
> > hold time to make sure that we don't bounce locks too much. So what
> > is
> > the problem that you are trying to solve here I wonder?
> 
> This solves the problem of faulting in pages during read and write
> operations: on the one hand, we want to hold the inode glock across
> those operations. On the other hand, those operations may fault in
> pages, which may require taking the same or other inode glocks,
> directly or indirectly, which can deadlock.
> 
> So before we fault in pages, we indicate with
> gfs2_holder_allow_demote(gh) that we can cope if the glock is taken
> away from us. After faulting in the pages, we indicate with
> gfs2_holder_disallow_demote(gh) that we now actually need the glock
> again. At that point, we either still have the glock (i.e., the
> holder
> is still queued and it has the HIF_HOLDER flag set), or we don't.
> 
> The different kinds of read and write operations differ in how they
> handle the latter case:
> 
>  * When a buffered read or write loses the inode glock, it returns a
> short result. This
>prevents torn writes and reading things that have never existed on
> disk in that form.
> 
>  * When a direct read or write loses the inode glock, it re-acquires
> it before resuming
>the operation. Direct I/O is not expected to return partial
> results
> and doesn't provide
>any kind of synchronization among processes.
> 
> We could solve this kind of problem in other ways, for example, by
> keeping a glock generation number, dropping the glock before faulting
> in pages, re-acquiring it afterwards, and checking if the generation
> number has changed. This would still be an additional piece of glock
> infrastructure, but more heavyweight than the HIF_MAY_DEMOTE flag
> which uses the existing glock holder infrastructure.

This is working towards the "why" but could probably be summarised a
bit more. We always used to manage to avoid holding fs locks when
copying to/from userspace to avoid these complications. If that is no
longer possible then it would be good to document what the new
expectations are somewhere suitable in Documentation/filesystems/...
just so we make sure it is clear what the new system is, and everyone
will be on the same page,

Steve.





Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-20 Thread Steven Whitehouse
Hi,

On Fri, 2021-08-20 at 08:11 -0500, Bob Peterson wrote:
> On 8/20/21 4:35 AM, Steven Whitehouse wrote:
> > Hi,
> > 
> > On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote:
> > > From: Bob Peterson 
> > > 
> > > This patch introduces a new HIF_MAY_DEMOTE flag and
> > > infrastructure
> > > that
> > > will allow glocks to be demoted automatically on locking
> > > conflicts.
> > > When a locking request comes in that isn't compatible with the
> > > locking
> > > state of a holder and that holder has the HIF_MAY_DEMOTE flag
> > > set,
> > > the
> > > holder will be demoted automatically before the incoming locking
> > > request
> > > is granted.
> > > 
> > I'm not sure I understand what is going on here. When there are
> > locking
> > conflicts we generate call backs and those result in glock
> > demotion.
> > There is no need for a flag to indicate that I think, since it is
> > the
> > default behaviour anyway. Or perhaps the explanation is just a bit
> > confusing...
> 
> I agree that the whole concept and explanation are confusing.
> Andreas 
> and I went through several heated arguments about the symantics, 
> comments, patch descriptions, etc. We played around with many
> different 
> flag name ideas, etc. We did not agree on the best way to describe
> the 
> whole concept. He didn't like my explanation and I didn't like his.
> So 
> yes, it is confusing.
> 
That seems to be a good reason to take a step back and look at this a
bit closer. If we are finding this confusing, then someone else looking
at it at a future date, who may not be steeped in GFS2 knowledge is
likely to find it almost impossible.

So at least the description needs some work here I think, to make it
much clearer what the overall aim is. It would be good to start with a
statement of the problem that it is trying to solve which Andreas has
hinted at in his reply just now,

Steve.




Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-20 Thread Andreas Gruenbacher
On Fri, Aug 20, 2021 at 11:35 AM Steven Whitehouse  wrote:
> On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote:
> > From: Bob Peterson 
> >
> > This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure
> > that will allow glocks to be demoted automatically on locking conflicts.
> > When a locking request comes in that isn't compatible with the locking
> > state of a holder and that holder has the HIF_MAY_DEMOTE flag set, the
> > holder will be demoted automatically before the incoming locking request
> > is granted.
>
> I'm not sure I understand what is going on here. When there are locking
> conflicts we generate call backs and those result in glock demotion.
> There is no need for a flag to indicate that I think, since it is the
> default behaviour anyway. Or perhaps the explanation is just a bit
> confusing...

When a glock has active holders (with the HIF_HOLDER flag set), the
glock won't be demoted to a state incompatible with any of those
holders.

> > Processes that allow a glock holder to be taken away indicate this by
> > calling gfs2_holder_allow_demote().  When they need the glock again,
> > they call gfs2_holder_disallow_demote() and then they check if the
> > holder is still queued: if it is, they're still holding the glock; if
> > it isn't, they need to re-acquire the glock.
> >
> > This allows processes to hang on to locks that could become part of a
> > cyclic locking dependency.  The locks will be given up when a (rare)
> > conflicting locking request occurs, and don't need to be given up
> > prematurely.
>
> This seems backwards to me. We already have the glock layer cache the
> locks until they are required by another node. We also have the min
> hold time to make sure that we don't bounce locks too much. So what is
> the problem that you are trying to solve here I wonder?

This solves the problem of faulting in pages during read and write
operations: on the one hand, we want to hold the inode glock across
those operations. On the other hand, those operations may fault in
pages, which may require taking the same or other inode glocks,
directly or indirectly, which can deadlock.

So before we fault in pages, we indicate with
gfs2_holder_allow_demote(gh) that we can cope if the glock is taken
away from us. After faulting in the pages, we indicate with
gfs2_holder_disallow_demote(gh) that we now actually need the glock
again. At that point, we either still have the glock (i.e., the holder
is still queued and it has the HIF_HOLDER flag set), or we don't.

The different kinds of read and write operations differ in how they
handle the latter case:

 * When a buffered read or write loses the inode glock, it returns a
short result. This
   prevents torn writes and reading things that have never existed on
disk in that form.

 * When a direct read or write loses the inode glock, it re-acquires
it before resuming
   the operation. Direct I/O is not expected to return partial results
and doesn't provide
   any kind of synchronization among processes.

We could solve this kind of problem in other ways, for example, by
keeping a glock generation number, dropping the glock before faulting
in pages, re-acquiring it afterwards, and checking if the generation
number has changed. This would still be an additional piece of glock
infrastructure, but more heavyweight than the HIF_MAY_DEMOTE flag
which uses the existing glock holder infrastructure.

> > Signed-off-by: Bob Peterson 
> > ---
> >  fs/gfs2/glock.c  | 221 +++
> > 
> >  fs/gfs2/glock.h  |  20 +
> >  fs/gfs2/incore.h |   1 +
> >  3 files changed, 206 insertions(+), 36 deletions(-)
> >
> > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> > index f24db2ececfb..d1b06a09ce2f 100644
> > --- a/fs/gfs2/glock.c
> > +++ b/fs/gfs2/glock.c
> > @@ -58,6 +58,7 @@ struct gfs2_glock_iter {
> >  typedef void (*glock_examiner) (struct gfs2_glock * gl);
> >
> >  static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh,
> > unsigned int target);
> > +static void __gfs2_glock_dq(struct gfs2_holder *gh);
> >
> >  static struct dentry *gfs2_root;
> >  static struct workqueue_struct *glock_workqueue;
> > @@ -197,6 +198,12 @@ static int demote_ok(const struct gfs2_glock
> > *gl)
> >
> >   if (gl->gl_state == LM_ST_UNLOCKED)
> >   return 0;
> > + /*
> > +  * Note that demote_ok is used for the lru process of disposing
> > of
> > +  * glocks. For this purpose, we don't care if the glock's
> > holders
> > +  * have the HIF_MAY_DEMOTE flag set or not. If someone is using
> > +  * them, don't demote.
> > +  */
> >   if (!list_empty(>gl_holders))
> >   return 0;
> >   if (glops->go_demote_ok)
> > @@ -379,7 +386,7 @@ static void do_error(struct gfs2_glock *gl, const
> > int ret)
> >   struct gfs2_holder *gh, *tmp;
> >
> >   list_for_each_entry_safe(gh, tmp, >gl_holders, gh_list) {
> > - if (test_bit(HIF_HOLDER, >gh_iflags))

Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-20 Thread Bob Peterson

On 8/20/21 4:35 AM, Steven Whitehouse wrote:

Hi,

On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote:

From: Bob Peterson 

This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure
that
will allow glocks to be demoted automatically on locking conflicts.
When a locking request comes in that isn't compatible with the
locking
state of a holder and that holder has the HIF_MAY_DEMOTE flag set,
the
holder will be demoted automatically before the incoming locking
request
is granted.


I'm not sure I understand what is going on here. When there are locking
conflicts we generate call backs and those result in glock demotion.
There is no need for a flag to indicate that I think, since it is the
default behaviour anyway. Or perhaps the explanation is just a bit
confusing...


I agree that the whole concept and explanation are confusing. Andreas 
and I went through several heated arguments about the symantics, 
comments, patch descriptions, etc. We played around with many different 
flag name ideas, etc. We did not agree on the best way to describe the 
whole concept. He didn't like my explanation and I didn't like his. So 
yes, it is confusing.


My preferred terminology was "DOD" or "Dequeue On Demand" which makes 
the concept more understandable to me. So basically a process can say
"I need to hold this glock, but for an unknown and possibly lengthy 
period of time, but please feel free to dequeue it if it's in your way."

And bear in mind that several processes may do the same, simultaneously.

You can almost think of this as a performance enhancement. This concept 
allows a process to hold a glock for much longer periods of time, at a 
lower priority, for example, when gfs2_file_read_iter needs to hold the 
glock for very long-running iterative reads.


The process requesting a holder with "Demote On Demand" must then 
determine if its holder has been stolen away (dequeued on demand) after 
its lengthy operation, and therefore needs to pick up the pieces of 
where it left off in its process.


Meanwhile, another process may need to hold the glock. If its requested 
mode is compatible, say SH and SH, the lock is simply granted with no 
further delay. If the mode is incompatible, regardless of whether it's 
on the local node or a different node in the cluster, these 
longer-term/lower-priority holders may be dequeued or prempted by 
another request to hold the glock. Note that although these holders are 
dequeued-on-demand, they are never "uninitted" as part of the process. 
Nor must they ever be, since they may be on another process's heap.


This differs from the normal glock demote process in which the demote 
bit is set on ("requesting" the glock be demoted) but still needs to 
block until the holder does its actual dequeue.



Processes that allow a glock holder to be taken away indicate this by
calling gfs2_holder_allow_demote().  When they need the glock again,
they call gfs2_holder_disallow_demote() and then they check if the
holder is still queued: if it is, they're still holding the glock; if
it
isn't, they need to re-acquire the glock.

This allows processes to hang on to locks that could become part of a
cyclic locking dependency.  The locks will be given up when a (rare)
conflicting locking request occurs, and don't need to be given up
prematurely.

This seems backwards to me. We already have the glock layer cache the
locks until they are required by another node. We also have the min
hold time to make sure that we don't bounce locks too much. So what is
the problem that you are trying to solve here I wonder?


Again, this is simply allowing premption of lenghy/low-priority holders 
whereas the normal demote process will only demote when the glock is 
dequeued after this potentially very-long period of time.


The minimum hold time solves a different problem, and Andreas and I 
talked just yesterday about possibly revisiting how that all works. The 
problem with minimum hold time is that in many cases the glock state 
machine does not want to grant new holders if the demote bit is on, so 
it ends up wasting more time than solving the actual problem.

But that's another problem for another day.

Regards,

Bob Peterson



[Cluster-devel] [PATCH] fs: remove mandatory file locking support

2021-08-20 Thread Jeff Layton
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.

I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had a fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.

This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.

Signed-off-by: Jeff Layton 
---
 .../filesystems/mandatory-locking.rst | 188 --
 fs/9p/vfs_file.c  |  12 --
 fs/Kconfig|  10 -
 fs/afs/flock.c|   4 -
 fs/ceph/locks.c   |   3 -
 fs/gfs2/file.c|   3 -
 fs/locks.c| 116 +--
 fs/namei.c|   4 +-
 fs/namespace.c|  31 +--
 fs/nfs/file.c |   4 -
 fs/nfsd/nfs4state.c   |  13 --
 fs/nfsd/vfs.c |  15 --
 fs/ocfs2/locks.c  |   4 -
 fs/open.c |   8 +-
 fs/read_write.c   |   7 -
 fs/remap_range.c  |  10 -
 include/linux/fs.h|  84 
 mm/mmap.c |   6 -
 mm/nommu.c|   3 -
 19 files changed, 20 insertions(+), 505 deletions(-)
 delete mode 100644 Documentation/filesystems/mandatory-locking.rst

Here's a first stab at a wholesale removal patch. The main questions at
this point are whether ignorning "mand" from here on out is the right
thing to do, and how we should pass that info along to the user.

diff --git a/Documentation/filesystems/mandatory-locking.rst 
b/Documentation/filesystems/mandatory-locking.rst
deleted file mode 100644
index 9ce73544a8f0..
--- a/Documentation/filesystems/mandatory-locking.rst
+++ /dev/null
@@ -1,188 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-=
-Mandatory File Locking For The Linux Operating System
-=
-
-   Andy Walker 
-
-  15 April 1996
-
-(Updated September 2007)
-
-0. Why you should avoid mandatory locking
--
-
-The Linux implementation is prey to a number of difficult-to-fix race
-conditions which in practice make it not dependable:
-
-   - The write system call checks for a mandatory lock only once
- at its start.  It is therefore possible for a lock request to
- be granted after this check but before the data is modified.
- A process may then see file data change even while a mandatory
- lock was held.
-   - Similarly, an exclusive lock may be granted on a file after
- the kernel has decided to proceed with a read, but before the
- read has actually completed, and the reading process may see
- the file data in a state which should not have been visible
- to it.
-   - Similar races make the claimed mutual exclusion between lock
- and mmap similarly unreliable.
-
-1. What is  mandatory locking?
---
-
-Mandatory locking is kernel enforced file locking, as opposed to the more usual
-cooperative file locking used to guarantee sequential access to files among
-processes. File locks are applied using the flock() and fcntl() system calls
-(and the lockf() library routine which is a wrapper around fcntl().) It is
-normally a process' responsibility to check for locks on a file it wishes to
-update, before applying its own lock, updating the file and unlocking it again.
-The most commonly used example of this (and in the case of sendmail, the most
-troublesome) is access to a user's mailbox. The mail user agent and the mail
-transfer agent must guard against updating the mailbox at the same time, and
-prevent reading the mailbox while it is being updated.
-
-In a perfect world all processes would use and honour a cooperative, or
-"advisory" locking scheme. However, the world isn't perfect, and there's
-a lot of poorly written code out there.
-
-In trying to address this problem, the designers of System V UNIX came up
-with a "mandatory" locking scheme, whereby the operating system kernel would
-block attempts by a process to write to a file that another process holds a
-"read" -or- "shared" lock on, and block attempts to both read and write to a 

Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion

2021-08-20 Thread Steven Whitehouse
Hi,

On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote:
> From: Bob Peterson 
> 
> This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure
> that
> will allow glocks to be demoted automatically on locking conflicts.
> When a locking request comes in that isn't compatible with the
> locking
> state of a holder and that holder has the HIF_MAY_DEMOTE flag set,
> the
> holder will be demoted automatically before the incoming locking
> request
> is granted.
> 
I'm not sure I understand what is going on here. When there are locking
conflicts we generate call backs and those result in glock demotion.
There is no need for a flag to indicate that I think, since it is the
default behaviour anyway. Or perhaps the explanation is just a bit
confusing...

> Processes that allow a glock holder to be taken away indicate this by
> calling gfs2_holder_allow_demote().  When they need the glock again,
> they call gfs2_holder_disallow_demote() and then they check if the
> holder is still queued: if it is, they're still holding the glock; if
> it
> isn't, they need to re-acquire the glock.
> 
> This allows processes to hang on to locks that could become part of a
> cyclic locking dependency.  The locks will be given up when a (rare)
> conflicting locking request occurs, and don't need to be given up
> prematurely.
This seems backwards to me. We already have the glock layer cache the
locks until they are required by another node. We also have the min
hold time to make sure that we don't bounce locks too much. So what is
the problem that you are trying to solve here I wonder?

> 
> Signed-off-by: Bob Peterson 
> ---
>  fs/gfs2/glock.c  | 221 +++
> 
>  fs/gfs2/glock.h  |  20 +
>  fs/gfs2/incore.h |   1 +
>  3 files changed, 206 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index f24db2ececfb..d1b06a09ce2f 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -58,6 +58,7 @@ struct gfs2_glock_iter {
>  typedef void (*glock_examiner) (struct gfs2_glock * gl);
>  
>  static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh,
> unsigned int target);
> +static void __gfs2_glock_dq(struct gfs2_holder *gh);
>  
>  static struct dentry *gfs2_root;
>  static struct workqueue_struct *glock_workqueue;
> @@ -197,6 +198,12 @@ static int demote_ok(const struct gfs2_glock
> *gl)
>  
>   if (gl->gl_state == LM_ST_UNLOCKED)
>   return 0;
> + /*
> +  * Note that demote_ok is used for the lru process of disposing
> of
> +  * glocks. For this purpose, we don't care if the glock's
> holders
> +  * have the HIF_MAY_DEMOTE flag set or not. If someone is using
> +  * them, don't demote.
> +  */
>   if (!list_empty(>gl_holders))
>   return 0;
>   if (glops->go_demote_ok)
> @@ -379,7 +386,7 @@ static void do_error(struct gfs2_glock *gl, const
> int ret)
>   struct gfs2_holder *gh, *tmp;
>  
>   list_for_each_entry_safe(gh, tmp, >gl_holders, gh_list) {
> - if (test_bit(HIF_HOLDER, >gh_iflags))
> + if (!test_bit(HIF_WAIT, >gh_iflags))
>   continue;
>   if (ret & LM_OUT_ERROR)
>   gh->gh_error = -EIO;
> @@ -393,6 +400,40 @@ static void do_error(struct gfs2_glock *gl,
> const int ret)
>   }
>  }
>  
> +/**
> + * demote_incompat_holders - demote incompatible demoteable holders
> + * @gl: the glock we want to promote
> + * @new_gh: the new holder to be promoted
> + */
> +static void demote_incompat_holders(struct gfs2_glock *gl,
> + struct gfs2_holder *new_gh)
> +{
> + struct gfs2_holder *gh;
> +
> + /*
> +  * Demote incompatible holders before we make ourselves
> eligible.
> +  * (This holder may or may not allow auto-demoting, but we
> don't want
> +  * to demote the new holder before it's even granted.)
> +  */
> + list_for_each_entry(gh, >gl_holders, gh_list) {
> + /*
> +  * Since holders are at the front of the list, we stop
> when we
> +  * find the first non-holder.
> +  */
> + if (!test_bit(HIF_HOLDER, >gh_iflags))
> + return;
> + if (test_bit(HIF_MAY_DEMOTE, >gh_iflags) &&
> + !may_grant(gl, new_gh, gh)) {
> + /*
> +  * We should not recurse into do_promote
> because
> +  * __gfs2_glock_dq only calls handle_callback,
> +  * gfs2_glock_add_to_lru and
> __gfs2_glock_queue_work.
> +  */
> + __gfs2_glock_dq(gh);
> + }
> + }
> +}
> +
>  /**
>   * find_first_holder - find the first "holder" gh
>   * @gl: the glock
> @@ -411,6 +452,26 @@ static inline struct gfs2_holder
> *find_first_holder(const struct gfs2_glock *gl)
>   return NULL;
>  }
>  
> +/**
> + * find_first_strong_holder - find 

Re: [Cluster-devel] [BUG] fs: dlm: possible ABBA deadlock

2021-08-20 Thread Jia-Ju Bai




On 2021/8/19 23:55, David Teigland wrote:

On Thu, Aug 19, 2021 at 04:54:57PM +0800, Jia-Ju Bai wrote:

Hello,

My static analysis tool reports a possible ABBA deadlock in the dlm
filesystem in Linux 5.10:

dlm_recover_waiters_pre()
   mutex_lock(>ls_waiters_mutex); --> line 5130
   recover_convert_waiter()
     _receive_convert_reply()
   lock_rsb()
     mutex_lock(>res_mutex); --> line 69

dlm_recover_waiters_post()
   lock_rsb()
     mutex_lock(>res_mutex); --> line 69
   mutex_lock(>ls_waiters_mutex); --> line 5307

When dlm_recover_waiters_pre() and dlm_recover_waiters_post() are
concurrently executed, the deadlock can occur.

I am not quite sure whether this possible deadlock is real and how to fix it
if it is real.
Any feedback would be appreciated, thanks :)

They won't be concurrent, "pre" runs before recovery, and "post" is after.


Okay, thanks for your reply :)


Best wishes,
Jia-Ju Bai