[Cluster-devel] [PATCH dlm/next 0/3] fs: dlm: debug and tracepoints
Hi,

again a resend of the tracepoints patches, now fixed so that the lkb
reference is held during the tracepoint call. We need to do that because
we access the structure during tracing. There is some code duplication
to convert the return error for the dlm_lock_end trace now; I didn't
want to change the existing code and introduce more overhead when
tracing is not enabled.

- Alex

Alexander Aring (3):
  fs: dlm: debug improvements print nodeid
  fs: dlm: initial support for tracepoints
  fs: dlm: trace socket handling

 fs/dlm/ast.c               |   4 +
 fs/dlm/lock.c              |  10 ++
 fs/dlm/lowcomms.c          |   4 +
 fs/dlm/main.c              |   3 +
 fs/dlm/midcomms.c          |   4 +-
 include/trace/events/dlm.h | 260 +
 6 files changed, 283 insertions(+), 2 deletions(-)
 create mode 100644 include/trace/events/dlm.h

-- 
2.27.0
[Cluster-devel] [PATCH dlm/next 2/3] fs: dlm: initial support for tracepoints
This patch adds initial support for dlm tracepoints. It introduces
tracepoints for the dlm main functionality, dlm_lock()/dlm_unlock(), and
their completion ast() callback or blocking bast() callback.

The lock/unlock functionality has a start and an end tracepoint because
there is a race: with a tracepoint at the end position only, the
completion/blocking callbacks could fire before it. When working with
eBPF tracing and its hash lookup functionality, that means an entry
might not have been inserted yet. Instead, use the start tracepoint for
the hash insert and check again in the end tracepoint whether a dlm
internal error occurred, in which case no ast callback will follow.
Furthermore, locks with local masters may run those callbacks
immediately, so we need the start/end pair for that case as well.

I did not make everything accessible yet, although it seems eBPF can
access a lot of internal data structures if it is aware of the struct
definitions of the running kernel instance. We can still change it. For
eBPF experiments, e.g. time measurements between lock and callback
functionality, you can simply use the local lkb_id field as hash key, in
combination with the lockspace id if you have multiple lockspaces.
Otherwise you can simply use trace-cmd for some functionality, e.g.
`trace-cmd record -e dlm` and `trace-cmd report` afterwards.
Signed-off-by: Alexander Aring
---
 fs/dlm/ast.c               |   4 +
 fs/dlm/lock.c              |  10 ++
 fs/dlm/main.c              |   3 +
 include/trace/events/dlm.h | 220 +
 4 files changed, 237 insertions(+)
 create mode 100644 include/trace/events/dlm.h

diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 283c7b94edda..0e7c6f6819ac 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -9,6 +9,8 @@
 ***
 **/

+#include <trace/events/dlm.h>
+
 #include "dlm_internal.h"
 #include "lock.h"
 #include "user.h"
@@ -254,10 +256,12 @@ void dlm_callback_work(struct work_struct *work)
 			continue;
 		} else if (callbacks[i].flags & DLM_CB_BAST) {
 			bastfn(lkb->lkb_astparam, callbacks[i].mode);
+			trace_dlm_bast(ls, lkb, callbacks[i].mode);
 		} else if (callbacks[i].flags & DLM_CB_CAST) {
 			lkb->lkb_lksb->sb_status = callbacks[i].sb_status;
 			lkb->lkb_lksb->sb_flags = callbacks[i].sb_flags;
 			castfn(lkb->lkb_astparam);
+			trace_dlm_ast(ls, lkb, lkb->lkb_lksb);
 		}
 	}

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index c502c065d007..feb2e94f5879 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -53,6 +53,8 @@
    R: do_()
    L: receive__reply()     <-  R: send__reply()
 */

+#include <trace/events/dlm.h>
+
 #include
 #include
 #include
@@ -3437,6 +3439,8 @@ int dlm_lock(dlm_lockspace_t *lockspace,
 	if (error)
 		goto out;

+	trace_dlm_lock_start(ls, lkb, mode, flags);
+
 	error = set_lock_args(mode, lksb, flags, namelen, 0, ast,
			      astarg, bast, &args);
 	if (error)
@@ -3450,6 +3454,8 @@ int dlm_lock(dlm_lockspace_t *lockspace,
 	if (error == -EINPROGRESS)
 		error = 0;
  out_put:
+	trace_dlm_lock_end(ls, lkb, mode, flags, error);
+
 	if (convert || error)
 		__put_lkb(ls, lkb);
 	if (error == -EAGAIN || error == -EDEADLK)
@@ -3481,6 +3487,8 @@ int dlm_unlock(dlm_lockspace_t *lockspace,
 	if (error)
 		goto out;

+	trace_dlm_unlock_start(ls, lkb, flags);
+
 	error = set_unlock_args(flags, astarg, &args);
 	if (error)
 		goto out_put;
@@ -3495,6 +3503,8 @@ int dlm_unlock(dlm_lockspace_t *lockspace,
 	if (error == -EBUSY && (flags & (DLM_LKF_CANCEL | DLM_LKF_FORCEUNLOCK)))
 		error = 0;
  out_put:
+	trace_dlm_unlock_end(ls, lkb, flags, error);
+
 	dlm_put_lkb(lkb);
  out:
 	dlm_unlock_recovery(ls);

diff --git a/fs/dlm/main.c b/fs/dlm/main.c
index afc66a1346d3..1c5be4b70ac1 100644
--- a/fs/dlm/main.c
+++ b/fs/dlm/main.c
@@ -19,6 +19,9 @@
 #include "config.h"
 #include "lowcomms.h"

+#define CREATE_TRACE_POINTS
+#include <trace/events/dlm.h>
+
 static int __init init_dlm(void)
 {
 	int error;

diff --git a/include/trace/events/dlm.h b/include/trace/events/dlm.h
new file mode 100644
index ..cc2dd79f4ec7
--- /dev/null
+++ b/include/trace/events/dlm.h
@@ -0,0 +1,220 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM dlm
+
+#if !defined(_TRACE_DLM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DLM_H
+
+#include
+#include
+#include
+
+#include "../../../fs/dlm/dlm_internal.h"
+
+#define show_lock_flags(flags) __print_flags(flags, "|", \
+	{
[Cluster-devel] [PATCH dlm/next 3/3] fs: dlm: trace socket handling
This patch adds tracepoints for dlm socket receive and send
functionality. We can use them to track how much data was sent to or
received from a specific nodeid.

Signed-off-by: Alexander Aring
---
 fs/dlm/lowcomms.c          |  4
 include/trace/events/dlm.h | 40 ++
 2 files changed, 44 insertions(+)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 8f715c620e1f..e5438f2e586a 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -53,6 +53,8 @@
 #include
 #include

+#include <trace/events/dlm.h>
+
 #include "dlm_internal.h"
 #include "lowcomms.h"
 #include "midcomms.h"
@@ -925,6 +927,7 @@ static int receive_from_sock(struct connection *con)
 		msg.msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
 		ret = kernel_recvmsg(con->sock, &msg, &iov, 1, iov.iov_len,
				     msg.msg_flags);
+		trace_dlm_recv(con->nodeid, ret);
 		if (ret == -EAGAIN)
 			break;
 		else if (ret <= 0)
@@ -1411,6 +1414,7 @@ static void send_to_sock(struct connection *con)

 			ret = kernel_sendpage(con->sock, e->page, offset, len,
					      msg_flags);
+			trace_dlm_send(con->nodeid, ret);
 			if (ret == -EAGAIN || ret == 0) {
 				if (ret == -EAGAIN &&
				    test_bit(SOCKWQ_ASYNC_NOSPACE, &con->sock->flags) &&

diff --git a/include/trace/events/dlm.h b/include/trace/events/dlm.h
index cc2dd79f4ec7..9a2e3d869363 100644
--- a/include/trace/events/dlm.h
+++ b/include/trace/events/dlm.h
@@ -214,6 +214,46 @@ TRACE_EVENT(dlm_unlock_end,

 );

+TRACE_EVENT(dlm_send,
+
+	TP_PROTO(int nodeid, int ret),
+
+	TP_ARGS(nodeid, ret),
+
+	TP_STRUCT__entry(
+		__field(int, nodeid)
+		__field(int, ret)
+	),
+
+	TP_fast_assign(
+		__entry->nodeid = nodeid;
+		__entry->ret = ret;
+	),
+
+	TP_printk("nodeid=%d ret=%d", __entry->nodeid, __entry->ret)
+
+);
+
+TRACE_EVENT(dlm_recv,
+
+	TP_PROTO(int nodeid, int ret),
+
+	TP_ARGS(nodeid, ret),
+
+	TP_STRUCT__entry(
+		__field(int, nodeid)
+		__field(int, ret)
+	),
+
+	TP_fast_assign(
+		__entry->nodeid = nodeid;
+		__entry->ret = ret;
+	),
+
+	TP_printk("nodeid=%d ret=%d", __entry->nodeid, __entry->ret)
+
+);
+
 #endif /* if !defined(_TRACE_DLM_H) || defined(TRACE_HEADER_MULTI_READ) */

/* This part must be outside protection */
-- 
2.27.0
[Cluster-devel] [PATCH dlm/next 1/3] fs: dlm: debug improvements print nodeid
This patch improves the debug output of the midcomms layer by also
printing the nodeid to which the users counter belongs.

Signed-off-by: Alexander Aring
---
 fs/dlm/midcomms.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index 7ae39ec8d9b0..008078f06813 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -1231,7 +1231,7 @@ void dlm_midcomms_add_member(int nodeid)
 	}

 	node->users++;
-	pr_debug("users inc count %d\n", node->users);
+	pr_debug("node %d users inc count %d\n", nodeid, node->users);
 	spin_unlock(&node->state_lock);
 	srcu_read_unlock(&nodes_srcu, idx);
@@ -1254,7 +1254,7 @@ void dlm_midcomms_remove_member(int nodeid)

 	spin_lock(&node->state_lock);
 	node->users--;
-	pr_debug("users dec count %d\n", node->users);
+	pr_debug("node %d users dec count %d\n", nodeid, node->users);

 	/* hitting users count to zero means the
	 * other side is running dlm_midcomms_stop()
-- 
2.27.0
Re: [Cluster-devel] [PATCH v3 2/2] fs: remove mandatory file locking support
On Fri, Aug 20, 2021 at 12:39:19PM -0400, Jeff Layton wrote:
> diff --git a/fs/locks.c b/fs/locks.c
> @@ -2857,8 +2744,7 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
>  		seq_puts(f, "POSIX ");
>
>  	seq_printf(f, " %s ",
> -		   (inode == NULL) ? "*NOINODE*" :
> -		   mandatory_lock(inode) ? "MANDATORY" : "ADVISORY ");
> +		   (inode == NULL) ? "*NOINODE*" : "ADVISORY ");

Huh?
	if (fl->fl_file != NULL)
		inode = locks_inode(fl->fl_file);
So basically that's fl->fl_file ? "ADVISORY" : "*NOINODE*"?  How could
that trigger, though?  With locks_mandatory_area() gone, I don't see how
FL_POSIX file_lock with NULL ->fl_file could be ever found...
Confused...

Why is locks_copy_conflock() exported (hell, non-static), BTW?

> diff --git a/fs/namespace.c b/fs/namespace.c
> -#ifdef CONFIG_MANDATORY_FILE_LOCKING
> -static bool may_mandlock(void)
> +static void warn_mandlock(void)
>  {
> -	pr_warn_once("==\n"
> -		     "WARNING: the mand mount option is being deprecated and\n"
> -		     "         will be removed in v5.15!\n"
> -		     "==\n");
> -	return capable(CAP_SYS_ADMIN);
> +	pr_warn_once("===\n"
> +		     "WARNING: The mand mount option has been deprecated and\n"
> +		     "         and is ignored by this kernel. Remove the mand\n"
> +		     "         option from the mount to silence this warning.\n"
> +		     "===\n");
 }
> -#else
> -static inline bool may_mandlock(void)
> -{
> -	pr_warn("VFS: \"mand\" mount option not supported");
> -	return false;
> -}
> -#endif

Is there any point in having the previous patch not folded into this one?
Re: [Cluster-devel] [PATCH v3 0/2] fs: remove support for mandatory locking
On Fri, 2021-08-20 at 12:39 -0400, Jeff Layton wrote:
> v3: slight revision to verbiage, and use pr_warn_once
>
> The first patch in this series adds a new warning that should pop on
> kernels that have mandatory locking enabled when someone mounts a
> filesystem with -o mand. The second patch removes support for mandatory
> locking altogether.
>
> What I think we probably want to do is apply the first to v5.14 before
> it ships and allow the new warning to trickle out into stable kernels.
> Then we can merge the second patch in v5.15 to go ahead and remove it.
>
> Sound like a plan?
>
> Jeff Layton (2):
>   fs: warn about impending deprecation of mandatory locks
>   fs: remove mandatory file locking support
>
>  .../filesystems/mandatory-locking.rst | 188 --
>  fs/9p/vfs_file.c                      |  12 --
>  fs/Kconfig                            |  10 -
>  fs/afs/flock.c                        |   4 -
>  fs/ceph/locks.c                       |   3 -
>  fs/gfs2/file.c                        |   3 -
>  fs/locks.c                            | 116 +--
>  fs/namei.c                            |   4 +-
>  fs/namespace.c                        |  25 +--
>  fs/nfs/file.c                         |   4 -
>  fs/nfsd/nfs4state.c                   |  13 --
>  fs/nfsd/vfs.c                         |  15 --
>  fs/ocfs2/locks.c                      |   4 -
>  fs/open.c                             |   8 +-
>  fs/read_write.c                       |   7 -
>  fs/remap_range.c                      |  10 -
>  include/linux/fs.h                    |  84
>  mm/mmap.c                             |   6 -
>  mm/nommu.c                            |   3 -
>  19 files changed, 14 insertions(+), 505 deletions(-)
>  delete mode 100644 Documentation/filesystems/mandatory-locking.rst
>

I went ahead and pushed this version into the locks-next branch, so we
can give it some soak time before merging.
-- 
Jeff Layton
[Cluster-devel] [PATCH v3 2/2] fs: remove mandatory file locking support
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.

I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had an fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.

This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.

Signed-off-by: Jeff Layton
---
 .../filesystems/mandatory-locking.rst | 188 --
 fs/9p/vfs_file.c                      |  12 --
 fs/Kconfig                            |  10 -
 fs/afs/flock.c                        |   4 -
 fs/ceph/locks.c                       |   3 -
 fs/gfs2/file.c                        |   3 -
 fs/locks.c                            | 116 +--
 fs/namei.c                            |   4 +-
 fs/namespace.c                        |  29 +--
 fs/nfs/file.c                         |   4 -
 fs/nfsd/nfs4state.c                   |  13 --
 fs/nfsd/vfs.c                         |  15 --
 fs/ocfs2/locks.c                      |   4 -
 fs/open.c                             |   8 +-
 fs/read_write.c                       |   7 -
 fs/remap_range.c                      |  10 -
 include/linux/fs.h                    |  84
 mm/mmap.c                             |   6 -
 mm/nommu.c                            |   3 -
 19 files changed, 14 insertions(+), 509 deletions(-)
 delete mode 100644 Documentation/filesystems/mandatory-locking.rst

diff --git a/Documentation/filesystems/mandatory-locking.rst b/Documentation/filesystems/mandatory-locking.rst
deleted file mode 100644
index 9ce73544a8f0..
--- a/Documentation/filesystems/mandatory-locking.rst
+++ /dev/null
@@ -1,188 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-=
-Mandatory File Locking For The Linux Operating System
-=
-
-	Andy Walker
-
-	15 April 1996
-
-(Updated September 2007)
-
-0. Why you should avoid mandatory locking
--
-
-The Linux implementation is prey to a number of difficult-to-fix race
-conditions which in practice make it not dependable:
-
-  - The write system call checks for a mandatory lock only once
-    at its start.  It is therefore possible for a lock request to
-    be granted after this check but before the data is modified.
-    A process may then see file data change even while a mandatory
-    lock was held.
-
-  - Similarly, an exclusive lock may be granted on a file after
-    the kernel has decided to proceed with a read, but before the
-    read has actually completed, and the reading process may see
-    the file data in a state which should not have been visible
-    to it.
-
-  - Similar races make the claimed mutual exclusion between lock
-    and mmap similarly unreliable.
-
-1. What is mandatory locking?
---
-
-Mandatory locking is kernel enforced file locking, as opposed to the more usual
-cooperative file locking used to guarantee sequential access to files among
-processes. File locks are applied using the flock() and fcntl() system calls
-(and the lockf() library routine which is a wrapper around fcntl().) It is
-normally a process' responsibility to check for locks on a file it wishes to
-update, before applying its own lock, updating the file and unlocking it again.
-The most commonly used example of this (and in the case of sendmail, the most
-troublesome) is access to a user's mailbox. The mail user agent and the mail
-transfer agent must guard against updating the mailbox at the same time, and
-prevent reading the mailbox while it is being updated.
-
-In a perfect world all processes would use and honour a cooperative, or
-"advisory" locking scheme. However, the world isn't perfect, and there's
-a lot of poorly written code out there.
-
-In trying to address this problem, the designers of System V UNIX came up
-with a "mandatory" locking scheme, whereby the operating system kernel would
-block attempts by a process to write to a file that another process holds a
-"read" -or- "shared" lock on, and block attempts to both read and write to a
-file that a process holds a "write " -or- "exclusive" lock on.
-
-The System V mandatory locking scheme was intended to have as little impact as
-possible on existing user code. The scheme is based on marking
[Cluster-devel] [PATCH v3 1/2] fs: warn about impending deprecation of mandatory locks
We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
have disabled it. Warn the stragglers that still use "-o mand" that
we'll be dropping support for that mount option.

Cc: sta...@vger.kernel.org
Signed-off-by: Jeff Layton
---
 fs/namespace.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ab4174a3c802..2279473d0d6f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1716,8 +1716,12 @@ static inline bool may_mount(void)
 }

 #ifdef CONFIG_MANDATORY_FILE_LOCKING
-static inline bool may_mandlock(void)
+static bool may_mandlock(void)
 {
+	pr_warn_once("==\n"
+		     "WARNING: the mand mount option is being deprecated and\n"
+		     "         will be removed in v5.15!\n"
+		     "==\n");
 	return capable(CAP_SYS_ADMIN);
 }
 #else
-- 
2.31.1
[Cluster-devel] [PATCH v3 0/2] fs: remove support for mandatory locking
v3: slight revision to verbiage, and use pr_warn_once

The first patch in this series adds a new warning that should pop on
kernels that have mandatory locking enabled when someone mounts a
filesystem with -o mand. The second patch removes support for mandatory
locking altogether.

What I think we probably want to do is apply the first to v5.14 before
it ships and allow the new warning to trickle out into stable kernels.
Then we can merge the second patch in v5.15 to go ahead and remove it.

Sound like a plan?

Jeff Layton (2):
  fs: warn about impending deprecation of mandatory locks
  fs: remove mandatory file locking support

 .../filesystems/mandatory-locking.rst | 188 --
 fs/9p/vfs_file.c                      |  12 --
 fs/Kconfig                            |  10 -
 fs/afs/flock.c                        |   4 -
 fs/ceph/locks.c                       |   3 -
 fs/gfs2/file.c                        |   3 -
 fs/locks.c                            | 116 +--
 fs/namei.c                            |   4 +-
 fs/namespace.c                        |  25 +--
 fs/nfs/file.c                         |   4 -
 fs/nfsd/nfs4state.c                   |  13 --
 fs/nfsd/vfs.c                         |  15 --
 fs/ocfs2/locks.c                      |   4 -
 fs/open.c                             |   8 +-
 fs/read_write.c                       |   7 -
 fs/remap_range.c                      |  10 -
 include/linux/fs.h                    |  84
 mm/mmap.c                             |   6 -
 mm/nommu.c                            |   3 -
 19 files changed, 14 insertions(+), 505 deletions(-)
 delete mode 100644 Documentation/filesystems/mandatory-locking.rst

-- 
2.31.1
Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks
On Fri, 20 Aug 2021 17:52:19 +0200
David Hildenbrand wrote:

> > +static bool warned_mand;
> >  static inline bool may_mandlock(void)
> >  {
> > +	if (!warned_mand) {
> > +		warned_mand = true;
> > +		pr_warn("==\n");
> > +		pr_warn("WARNING: the mand mount option is being deprecated and\n");
> > +		pr_warn("         will be removed in v5.15!\n");
> > +		pr_warn("==\n");
> > +	}
>
> Is there a reason not to use pr_warn_once() ?

You would need a single call though, otherwise each pr_warn_once()
would have its own state that it warned once.

	const char warning[] =
		"==\n"
		"WARNING: the mand mount option is being deprecated and\n"
		"         will be removed in v5.15!\n"
		"==\n";

	pr_warn_once(warning);

-- Steve
Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks
On Fri, 2021-08-20 at 17:52 +0200, David Hildenbrand wrote:
> On 20.08.21 15:57, Jeff Layton wrote:
> > We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
> > have disabled it. Warn the stragglers that still use "-o mand" that
> > we'll be dropping support for that mount option.
> >
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Jeff Layton
> > ---
> >  fs/namespace.c | 8
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index ab4174a3c802..ffab0bb1e649 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
> >  }
> >
> >  #ifdef CONFIG_MANDATORY_FILE_LOCKING
> > +static bool warned_mand;
> >  static inline bool may_mandlock(void)
> >  {
> > +	if (!warned_mand) {
> > +		warned_mand = true;
> > +		pr_warn("==\n");
> > +		pr_warn("WARNING: the mand mount option is being deprecated and\n");
> > +		pr_warn("         will be removed in v5.15!\n");
> > +		pr_warn("==\n");
> > +	}
>
> Is there a reason not to use pr_warn_once() ?
>

No reason at all. I'll send out a v3 set in a bit with that change.

Thanks!
-- 
Jeff Layton
Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks
From: Jeff Layton
> Sent: 20 August 2021 14:57
>
> We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
> have disabled it. Warn the stragglers that still use "-o mand" that
> we'll be dropping support for that mount option.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Jeff Layton
> ---
>  fs/namespace.c | 8
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ab4174a3c802..ffab0bb1e649 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
>  }
>
>  #ifdef CONFIG_MANDATORY_FILE_LOCKING
> +static bool warned_mand;
>  static inline bool may_mandlock(void)
>  {
> +	if (!warned_mand) {
> +		warned_mand = true;
> +		pr_warn("==\n");
> +		pr_warn("WARNING: the mand mount option is being deprecated and\n");
> +		pr_warn("         will be removed in v5.15!\n");
> +		pr_warn("==\n");
> +	}
> 	return capable(CAP_SYS_ADMIN);
> }

If that is called more than once you don't want the 'inline'.
I doubt it matters if it's not inlined - hardly a hot path.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks
On 20.08.21 15:57, Jeff Layton wrote:
> We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
> have disabled it. Warn the stragglers that still use "-o mand" that
> we'll be dropping support for that mount option.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Jeff Layton
> ---
>  fs/namespace.c | 8
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ab4174a3c802..ffab0bb1e649 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
>  }
>
>  #ifdef CONFIG_MANDATORY_FILE_LOCKING
> +static bool warned_mand;
>  static inline bool may_mandlock(void)
>  {
> +	if (!warned_mand) {
> +		warned_mand = true;
> +		pr_warn("==\n");
> +		pr_warn("WARNING: the mand mount option is being deprecated and\n");
> +		pr_warn("         will be removed in v5.15!\n");
> +		pr_warn("==\n");
> +	}

Is there a reason not to use pr_warn_once() ?

-- 
Thanks,

David / dhildenb
Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks
On Fri, 2021-08-20 at 15:49 +, David Laight wrote:
> From: Jeff Layton
> > Sent: 20 August 2021 14:57
> >
> > We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
> > have disabled it. Warn the stragglers that still use "-o mand" that
> > we'll be dropping support for that mount option.
> >
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Jeff Layton
> > ---
> >  fs/namespace.c | 8
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index ab4174a3c802..ffab0bb1e649 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
> >  }
> >
> >  #ifdef CONFIG_MANDATORY_FILE_LOCKING
> > +static bool warned_mand;
> >  static inline bool may_mandlock(void)
> >  {
> > +	if (!warned_mand) {
> > +		warned_mand = true;
> > +		pr_warn("==\n");
> > +		pr_warn("WARNING: the mand mount option is being deprecated and\n");
> > +		pr_warn("         will be removed in v5.15!\n");
> > +		pr_warn("==\n");
> > +	}
> > 	return capable(CAP_SYS_ADMIN);
> > }
>
> If that is called more than once you don't want the 'inline'.
> I doubt it matters is not inlined - hardly a hot path.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

ACK. Of course. That really needs to not be inline. I'll fix that up in
my tree.

Thanks!
-- 
Jeff Layton
Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion
On Fri, Aug 20, 2021 at 3:11 PM Bob Peterson wrote:
> On 8/20/21 4:35 AM, Steven Whitehouse wrote:
> > Hi,
> >
> > On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote:
> >> From: Bob Peterson
> >>
> >> This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure
> >> that will allow glocks to be demoted automatically on locking
> >> conflicts. When a locking request comes in that isn't compatible
> >> with the locking state of a holder and that holder has the
> >> HIF_MAY_DEMOTE flag set, the holder will be demoted automatically
> >> before the incoming locking request is granted.
> >>
> > I'm not sure I understand what is going on here. When there are locking
> > conflicts we generate call backs and those result in glock demotion.
> > There is no need for a flag to indicate that I think, since it is the
> > default behaviour anyway. Or perhaps the explanation is just a bit
> > confusing...
>
> I agree that the whole concept and explanation are confusing. Andreas
> and I went through several heated arguments about the semantics,
> comments, patch descriptions, etc. We played around with many different
> flag name ideas, etc. We did not agree on the best way to describe the
> whole concept. He didn't like my explanation and I didn't like his. So
> yes, it is confusing.
>
> My preferred terminology was "DOD" or "Dequeue On Demand"

... which is useless because it adds no clarity as to whose demand
we're talking about.

> which makes the concept more understandable to me. So basically a
> process can say "I need to hold this glock, but for an unknown and
> possibly lengthy period of time, but please feel free to dequeue it if
> it's in your way." And bear in mind that several processes may do the
> same, simultaneously.
>
> You can almost think of this as a performance enhancement. This concept
> allows a process to hold a glock for much longer periods of time, at a
> lower priority, for example, when gfs2_file_read_iter needs to hold the
> glock for very long-running iterative reads.

Consider a process that allocates a somewhat large buffer and reads
into it in chunks that are not page aligned. The buffer initially won't
be faulted in, so we fault in the first chunk and write into it. Then,
when reading the second chunk, we find that the first page of the
second chunk is already present. We fill it, set the HIF_MAY_DEMOTE
flag, fault in more pages, and clear the HIF_MAY_DEMOTE flag. If we
then still have the glock (which is very likely), we resume the read.
Otherwise, we return a short result.

Thanks,
Andreas
Re: [Cluster-devel] [PATCH 11/30] iomap: add the new iomap_iter model
On Thu, Aug 19, 2021 at 9:12 PM Christoph Hellwig wrote:
>
> On Thu, Aug 19, 2021 at 02:25:52PM -0700, Dan Williams wrote:
> > Given most of the iomap_iter users don't care about srcmap, i.e. are
> > not COW cases, they are leaving srcmap zero initialized. Should the
> > IOMAP types be incremented by one so that there is no IOMAP_HOLE
> > confusion? In other words, fold something like this?
>
> A hole really means nothing to read from the source.  The existing code
> also relies on that.

Ok, I've since found iomap_iter_srcmap(). Sorry for the noise.
Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion
On Fri, 20 Aug 2021 at 15:49, Steven Whitehouse wrote:
> We always used to manage to avoid holding fs locks when copying to/from
> userspace to avoid these complications.

I realize the intent, but that goal has never actually been achieved.
Direct I/O has *always* been calling get_user_pages() while holding the
inode glock in deferred mode. The situation is slightly different for
buffered I/O, but we have the same problem there at least since
switching to iomap. (And it's a fundamental property of iomap that we
want to hold the inode glock across multi-page mappings, not an
implementation deficit.)

Here's a slightly outdated version of a test case that makes all
versions of gfs2 prior to this patch queue unhappy:

https://lore.kernel.org/fstests/20210531152604.240462-1-agrue...@redhat.com/

Thanks,
Andreas
Re: [Cluster-devel] [PATCH v2 0/2] fs: remove support for mandatory locking
On Fri, Aug 20, 2021 at 09:57:05AM -0400, Jeff Layton wrote:
> The first patch in this series adds a new warning that should pop on
> kernels that have mandatory locking enabled when someone mounts a
> filesystem with -o mand. The second patch removes support for mandatory
> locking altogether.
>
> What I think we probably want to do is apply the first to v5.14 before
> it ships and allow the new warning to trickle out into stable kernels.
> Then we can merge the second patch in v5.15 to go ahead and remove it.
>
> Sound like a plan?

Sounds good to me.

--b.

> Jeff Layton (2):
>   fs: warn about impending deprecation of mandatory locks
>   fs: remove mandatory file locking support
>
>  .../filesystems/mandatory-locking.rst | 188 --
>  fs/9p/vfs_file.c                      |  12 --
>  fs/Kconfig                            |  10 -
>  fs/afs/flock.c                        |   4 -
>  fs/ceph/locks.c                       |   3 -
>  fs/gfs2/file.c                        |   3 -
>  fs/locks.c                            | 116 +--
>  fs/namei.c                            |   4 +-
>  fs/namespace.c                        |  31 +--
>  fs/nfs/file.c                         |   4 -
>  fs/nfsd/nfs4state.c                   |  13 --
>  fs/nfsd/vfs.c                         |  15 --
>  fs/ocfs2/locks.c                      |   4 -
>  fs/open.c                             |   8 +-
>  fs/read_write.c                       |   7 -
>  fs/remap_range.c                      |  10 -
>  include/linux/fs.h                    |  84
>  mm/mmap.c                             |   6 -
>  mm/nommu.c                            |   3 -
>  19 files changed, 20 insertions(+), 505 deletions(-)
>  delete mode 100644 Documentation/filesystems/mandatory-locking.rst
>
> -- 
> 2.31.1
[Cluster-devel] [PATCH v2 2/2] fs: remove mandatory file locking support
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.

I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had an fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.

This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.

Signed-off-by: Jeff Layton
---
 .../filesystems/mandatory-locking.rst | 188 --
 fs/9p/vfs_file.c                      |  12 --
 fs/Kconfig                            |  10 -
 fs/afs/flock.c                        |   4 -
 fs/ceph/locks.c                       |   3 -
 fs/gfs2/file.c                        |   3 -
 fs/locks.c                            | 116 +--
 fs/namei.c                            |   4 +-
 fs/namespace.c                        |  37 ++-
 fs/nfs/file.c                         |   4 -
 fs/nfsd/nfs4state.c                   |  13 --
 fs/nfsd/vfs.c                         |  15 --
 fs/ocfs2/locks.c                      |   4 -
 fs/open.c                             |   8 +-
 fs/read_write.c                       |   7 -
 fs/remap_range.c                      |  10 -
 include/linux/fs.h                    |  84
 mm/mmap.c                             |   6 -
 mm/nommu.c                            |   3 -
 19 files changed, 19 insertions(+), 512 deletions(-)
 delete mode 100644 Documentation/filesystems/mandatory-locking.rst

diff --git a/Documentation/filesystems/mandatory-locking.rst b/Documentation/filesystems/mandatory-locking.rst
deleted file mode 100644
index 9ce73544a8f0..
--- a/Documentation/filesystems/mandatory-locking.rst
+++ /dev/null
@@ -1,188 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-=
-Mandatory File Locking For The Linux Operating System
-=
-
-	Andy Walker
-
-	15 April 1996
-
-(Updated September 2007)
-
-0. Why you should avoid mandatory locking
--
-
-The Linux implementation is prey to a number of difficult-to-fix race
-conditions which in practice make it not dependable:
-
-  - The write system call checks for a mandatory lock only once
-    at its start.  It is therefore possible for a lock request to
-    be granted after this check but before the data is modified.
-    A process may then see file data change even while a mandatory
-    lock was held.
-
-  - Similarly, an exclusive lock may be granted on a file after
-    the kernel has decided to proceed with a read, but before the
-    read has actually completed, and the reading process may see
-    the file data in a state which should not have been visible
-    to it.
-
-  - Similar races make the claimed mutual exclusion between lock
-    and mmap similarly unreliable.
-
-1. What is mandatory locking?
---
-
-Mandatory locking is kernel enforced file locking, as opposed to the more usual
-cooperative file locking used to guarantee sequential access to files among
-processes. File locks are applied using the flock() and fcntl() system calls
-(and the lockf() library routine which is a wrapper around fcntl().) It is
-normally a process' responsibility to check for locks on a file it wishes to
-update, before applying its own lock, updating the file and unlocking it again.
-The most commonly used example of this (and in the case of sendmail, the most
-troublesome) is access to a user's mailbox. The mail user agent and the mail
-transfer agent must guard against updating the mailbox at the same time, and
-prevent reading the mailbox while it is being updated.
-
-In a perfect world all processes would use and honour a cooperative, or
-"advisory" locking scheme. However, the world isn't perfect, and there's
-a lot of poorly written code out there.
-
-In trying to address this problem, the designers of System V UNIX came up
-with a "mandatory" locking scheme, whereby the operating system kernel would
-block attempts by a process to write to a file that another process holds a
-"read" -or- "shared" lock on, and block attempts to both read and write to a
-file that a process holds a "write " -or- "exclusive" lock on.
-
-The System V mandatory locking scheme was intended to have as little impact as
-possible on existing user code. The scheme is based on marking
[Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks
We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros have disabled it. Warn the stragglers that still use "-o mand" that we'll be dropping support for that mount option.

Cc: sta...@vger.kernel.org
Signed-off-by: Jeff Layton
---
 fs/namespace.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index ab4174a3c802..ffab0bb1e649 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
 }
 
 #ifdef CONFIG_MANDATORY_FILE_LOCKING
+static bool warned_mand;
 static inline bool may_mandlock(void)
 {
+	if (!warned_mand) {
+		warned_mand = true;
+		pr_warn("======================================================\n");
+		pr_warn("WARNING: the mand mount option is being deprecated and\n");
+		pr_warn("         will be removed in v5.15!\n");
+		pr_warn("======================================================\n");
+	}
 	return capable(CAP_SYS_ADMIN);
 }
 #else
-- 
2.31.1
[Cluster-devel] [PATCH v2 0/2] fs: remove support for mandatory locking
The first patch in this series adds a new warning that should pop on kernels that have mandatory locking enabled when someone mounts a filesystem with -o mand. The second patch removes support for mandatory locking altogether. What I think we probably want to do is apply the first to v5.14 before it ships and allow the new warning to trickle out into stable kernels. Then we can merge the second patch in v5.15 to go ahead and remove it. Sound like a plan?

Jeff Layton (2):
  fs: warn about impending deprecation of mandatory locks
  fs: remove mandatory file locking support

 .../filesystems/mandatory-locking.rst | 188 --
 fs/9p/vfs_file.c                      |  12 --
 fs/Kconfig                            |  10 -
 fs/afs/flock.c                        |   4 -
 fs/ceph/locks.c                       |   3 -
 fs/gfs2/file.c                        |   3 -
 fs/locks.c                            | 116 +--
 fs/namei.c                            |   4 +-
 fs/namespace.c                        |  31 +-
 fs/nfs/file.c                         |   4 -
 fs/nfsd/nfs4state.c                   |  13 --
 fs/nfsd/vfs.c                         |  15 --
 fs/ocfs2/locks.c                      |   4 -
 fs/open.c                             |   8 +-
 fs/read_write.c                       |   7 -
 fs/remap_range.c                      |  10 -
 include/linux/fs.h                    |  84 -
 mm/mmap.c                             |   6 -
 mm/nommu.c                            |   3 -
 19 files changed, 20 insertions(+), 505 deletions(-)
 delete mode 100644 Documentation/filesystems/mandatory-locking.rst
-- 
2.31.1
Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion
Hi, On Fri, 2021-08-20 at 15:17 +0200, Andreas Gruenbacher wrote: > On Fri, Aug 20, 2021 at 11:35 AM Steven Whitehouse < > swhit...@redhat.com> wrote: > > On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote: > > > From: Bob Peterson > > > > > > This patch introduces a new HIF_MAY_DEMOTE flag and > > > infrastructure > > > that will allow glocks to be demoted automatically on locking > > > conflicts. > > > When a locking request comes in that isn't compatible with the > > > locking > > > state of a holder and that holder has the HIF_MAY_DEMOTE flag > > > set, the > > > holder will be demoted automatically before the incoming locking > > > request > > > is granted. > > > > I'm not sure I understand what is going on here. When there are > > locking > > conflicts we generate call backs and those result in glock > > demotion. > > There is no need for a flag to indicate that I think, since it is > > the > > default behaviour anyway. Or perhaps the explanation is just a bit > > confusing... > > When a glock has active holders (with the HIF_HOLDER flag set), the > glock won't be demoted to a state incompatible with any of those > holders. > Ok, that is a much clearer explanation of what the patch does. Active holders have always prevented demotions previously. > > > Processes that allow a glock holder to be taken away indicate > > > this by > > > calling gfs2_holder_allow_demote(). When they need the glock > > > again, > > > they call gfs2_holder_disallow_demote() and then they check if > > > the > > > holder is still queued: if it is, they're still holding the > > > glock; if > > > it isn't, they need to re-acquire the glock. > > > > > > This allows processes to hang on to locks that could become part > > > of a > > > cyclic locking dependency. The locks will be given up when a > > > (rare) > > > conflicting locking request occurs, and don't need to be given up > > > prematurely. > > > > This seems backwards to me. 
We already have the glock layer cache > > the > > locks until they are required by another node. We also have the min > > hold time to make sure that we don't bounce locks too much. So what > > is > > the problem that you are trying to solve here I wonder? > > This solves the problem of faulting in pages during read and write > operations: on the one hand, we want to hold the inode glock across > those operations. On the other hand, those operations may fault in > pages, which may require taking the same or other inode glocks, > directly or indirectly, which can deadlock. > > So before we fault in pages, we indicate with > gfs2_holder_allow_demote(gh) that we can cope if the glock is taken > away from us. After faulting in the pages, we indicate with > gfs2_holder_disallow_demote(gh) that we now actually need the glock > again. At that point, we either still have the glock (i.e., the > holder > is still queued and it has the HIF_HOLDER flag set), or we don't. > > The different kinds of read and write operations differ in how they > handle the latter case: > > * When a buffered read or write loses the inode glock, it returns a > short result. This >prevents torn writes and reading things that have never existed on > disk in that form. > > * When a direct read or write loses the inode glock, it re-acquires > it before resuming >the operation. Direct I/O is not expected to return partial > results > and doesn't provide >any kind of synchronization among processes. > > We could solve this kind of problem in other ways, for example, by > keeping a glock generation number, dropping the glock before faulting > in pages, re-acquiring it afterwards, and checking if the generation > number has changed. This would still be an additional piece of glock > infrastructure, but more heavyweight than the HIF_MAY_DEMOTE flag > which uses the existing glock holder infrastructure. This is working towards the "why" but could probably be summarised a bit more. 
We always used to manage to avoid holding fs locks when copying to/from userspace to avoid these complications. If that is no longer possible then it would be good to document what the new expectations are somewhere suitable in Documentation/filesystems/... just so we make sure it is clear what the new system is, and everyone will be on the same page, Steve.
Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion
Hi, On Fri, 2021-08-20 at 08:11 -0500, Bob Peterson wrote: > On 8/20/21 4:35 AM, Steven Whitehouse wrote: > > Hi, > > > > On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote: > > > From: Bob Peterson > > > > > > This patch introduces a new HIF_MAY_DEMOTE flag and > > > infrastructure > > > that > > > will allow glocks to be demoted automatically on locking > > > conflicts. > > > When a locking request comes in that isn't compatible with the > > > locking > > > state of a holder and that holder has the HIF_MAY_DEMOTE flag > > > set, > > > the > > > holder will be demoted automatically before the incoming locking > > > request > > > is granted. > > > > > I'm not sure I understand what is going on here. When there are > > locking > > conflicts we generate call backs and those result in glock > > demotion. > > There is no need for a flag to indicate that I think, since it is > > the > > default behaviour anyway. Or perhaps the explanation is just a bit > > confusing... > > I agree that the whole concept and explanation are confusing. > Andreas > and I went through several heated arguments about the symantics, > comments, patch descriptions, etc. We played around with many > different > flag name ideas, etc. We did not agree on the best way to describe > the > whole concept. He didn't like my explanation and I didn't like his. > So > yes, it is confusing. > That seems to be a good reason to take a step back and look at this a bit closer. If we are finding this confusing, then someone else looking at it at a future date, who may not be steeped in GFS2 knowledge is likely to find it almost impossible. So at least the description needs some work here I think, to make it much clearer what the overall aim is. It would be good to start with a statement of the problem that it is trying to solve which Andreas has hinted at in his reply just now, Steve.
Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion
On Fri, Aug 20, 2021 at 11:35 AM Steven Whitehouse wrote: > On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote: > > From: Bob Peterson > > > > This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure > > that will allow glocks to be demoted automatically on locking conflicts. > > When a locking request comes in that isn't compatible with the locking > > state of a holder and that holder has the HIF_MAY_DEMOTE flag set, the > > holder will be demoted automatically before the incoming locking request > > is granted. > > I'm not sure I understand what is going on here. When there are locking > conflicts we generate call backs and those result in glock demotion. > There is no need for a flag to indicate that I think, since it is the > default behaviour anyway. Or perhaps the explanation is just a bit > confusing... When a glock has active holders (with the HIF_HOLDER flag set), the glock won't be demoted to a state incompatible with any of those holders. > > Processes that allow a glock holder to be taken away indicate this by > > calling gfs2_holder_allow_demote(). When they need the glock again, > > they call gfs2_holder_disallow_demote() and then they check if the > > holder is still queued: if it is, they're still holding the glock; if > > it isn't, they need to re-acquire the glock. > > > > This allows processes to hang on to locks that could become part of a > > cyclic locking dependency. The locks will be given up when a (rare) > > conflicting locking request occurs, and don't need to be given up > > prematurely. > > This seems backwards to me. We already have the glock layer cache the > locks until they are required by another node. We also have the min > hold time to make sure that we don't bounce locks too much. So what is > the problem that you are trying to solve here I wonder? This solves the problem of faulting in pages during read and write operations: on the one hand, we want to hold the inode glock across those operations. 
On the other hand, those operations may fault in pages, which may require taking the same or other inode glocks, directly or indirectly, which can deadlock. So before we fault in pages, we indicate with gfs2_holder_allow_demote(gh) that we can cope if the glock is taken away from us. After faulting in the pages, we indicate with gfs2_holder_disallow_demote(gh) that we now actually need the glock again. At that point, we either still have the glock (i.e., the holder is still queued and it has the HIF_HOLDER flag set), or we don't. The different kinds of read and write operations differ in how they handle the latter case: * When a buffered read or write loses the inode glock, it returns a short result. This prevents torn writes and reading things that have never existed on disk in that form. * When a direct read or write loses the inode glock, it re-acquires it before resuming the operation. Direct I/O is not expected to return partial results and doesn't provide any kind of synchronization among processes. We could solve this kind of problem in other ways, for example, by keeping a glock generation number, dropping the glock before faulting in pages, re-acquiring it afterwards, and checking if the generation number has changed. This would still be an additional piece of glock infrastructure, but more heavyweight than the HIF_MAY_DEMOTE flag which uses the existing glock holder infrastructure. 
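The allow/disallow pattern described above can be condensed into a sketch (kernel-style C, not compilable outside the GFS2 tree; `gfs2_file_read_helper` and `copy_that_may_fault` are hypothetical names for illustration, while the gfs2_holder_* calls are the ones this series introduces):

```c
/* Sketch of the holder pattern for buffered reads, under the
 * assumption that the caller already holds the inode glock via gh. */
ssize_t gfs2_file_read_helper(struct gfs2_holder *gh)
{
	ssize_t ret;

	/* Tell the glock state machine it may demote this holder
	 * while we are faulting pages in. */
	gfs2_holder_allow_demote(gh);

	ret = copy_that_may_fault();	/* may fault, may take other glocks */

	/* We need the glock again: forbid further auto-demotion ... */
	gfs2_holder_disallow_demote(gh);

	/* ... and check whether it was taken away in the meantime. */
	if (!gfs2_holder_queued(gh)) {
		/* buffered I/O: return the short result;
		 * direct I/O would re-acquire the glock and resume */
	}
	return ret;
}
```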
> > Signed-off-by: Bob Peterson > > --- > > fs/gfs2/glock.c | 221 +++ > > > > fs/gfs2/glock.h | 20 + > > fs/gfs2/incore.h | 1 + > > 3 files changed, 206 insertions(+), 36 deletions(-) > > > > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c > > index f24db2ececfb..d1b06a09ce2f 100644 > > --- a/fs/gfs2/glock.c > > +++ b/fs/gfs2/glock.c > > @@ -58,6 +58,7 @@ struct gfs2_glock_iter { > > typedef void (*glock_examiner) (struct gfs2_glock * gl); > > > > static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, > > unsigned int target); > > +static void __gfs2_glock_dq(struct gfs2_holder *gh); > > > > static struct dentry *gfs2_root; > > static struct workqueue_struct *glock_workqueue; > > @@ -197,6 +198,12 @@ static int demote_ok(const struct gfs2_glock > > *gl) > > > > if (gl->gl_state == LM_ST_UNLOCKED) > > return 0; > > + /* > > + * Note that demote_ok is used for the lru process of disposing > > of > > + * glocks. For this purpose, we don't care if the glock's > > holders > > + * have the HIF_MAY_DEMOTE flag set or not. If someone is using > > + * them, don't demote. > > + */ > > if (!list_empty(&gl->gl_holders)) > > return 0; > > if (glops->go_demote_ok) > > @@ -379,7 +386,7 @@ static void do_error(struct gfs2_glock *gl, const > > int ret) > > struct gfs2_holder *gh, *tmp; > > > > list_for_each_entry_safe(gh, tmp, &gl->gl_holders, gh_list) { > > - if (test_bit(HIF_HOLDER, &gh->gh_iflags))
Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion
On 8/20/21 4:35 AM, Steven Whitehouse wrote: Hi, On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote: From: Bob Peterson This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that will allow glocks to be demoted automatically on locking conflicts. When a locking request comes in that isn't compatible with the locking state of a holder and that holder has the HIF_MAY_DEMOTE flag set, the holder will be demoted automatically before the incoming locking request is granted. I'm not sure I understand what is going on here. When there are locking conflicts we generate call backs and those result in glock demotion. There is no need for a flag to indicate that I think, since it is the default behaviour anyway. Or perhaps the explanation is just a bit confusing... I agree that the whole concept and explanation are confusing. Andreas and I went through several heated arguments about the semantics, comments, patch descriptions, etc. We played around with many different flag name ideas, etc. We did not agree on the best way to describe the whole concept. He didn't like my explanation and I didn't like his. So yes, it is confusing. My preferred terminology was "DOD" or "Dequeue On Demand" which makes the concept more understandable to me. So basically a process can say "I need to hold this glock, but for an unknown and possibly lengthy period of time, but please feel free to dequeue it if it's in your way." And bear in mind that several processes may do the same, simultaneously. You can almost think of this as a performance enhancement. This concept allows a process to hold a glock for much longer periods of time, at a lower priority, for example, when gfs2_file_read_iter needs to hold the glock for very long-running iterative reads.
The process requesting a holder with "Demote On Demand" must then determine if its holder has been stolen away (dequeued on demand) after its lengthy operation, and therefore needs to pick up where it left off. Meanwhile, another process may need to hold the glock. If its requested mode is compatible, say SH and SH, the lock is simply granted with no further delay. If the mode is incompatible, regardless of whether it's on the local node or a different node in the cluster, these longer-term/lower-priority holders may be dequeued or preempted by another request to hold the glock. Note that although these holders are dequeued-on-demand, they are never "uninitted" as part of the process. Nor must they ever be, since they may be on another process's heap. This differs from the normal glock demote process, in which the demote bit is set ("requesting" the glock be demoted) but the demotion still needs to block until the holder does its actual dequeue. Processes that allow a glock holder to be taken away indicate this by calling gfs2_holder_allow_demote(). When they need the glock again, they call gfs2_holder_disallow_demote() and then they check if the holder is still queued: if it is, they're still holding the glock; if it isn't, they need to re-acquire the glock. This allows processes to hang on to locks that could become part of a cyclic locking dependency. The locks will be given up when a (rare) conflicting locking request occurs, and don't need to be given up prematurely. This seems backwards to me. We already have the glock layer cache the locks until they are required by another node. We also have the min hold time to make sure that we don't bounce locks too much. So what is the problem that you are trying to solve here I wonder? Again, this is simply allowing preemption of lengthy/low-priority holders, whereas the normal demote process will only demote when the glock is dequeued after this potentially very-long period of time.
The minimum hold time solves a different problem, and Andreas and I talked just yesterday about possibly revisiting how that all works. The problem with minimum hold time is that in many cases the glock state machine does not want to grant new holders if the demote bit is on, so it ends up wasting more time than solving the actual problem. But that's another problem for another day. Regards, Bob Peterson
[Cluster-devel] [PATCH] fs: remove mandatory file locking support
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it off in Fedora and RHEL8. Several other distros have followed suit. I've heard of one problem in all that time: Someone migrated from an older distro that supported "-o mand" to one that didn't, and the host had an fstab entry with "mand" in it which broke on reboot. They didn't actually _use_ mandatory locking so they just removed the mount option and moved on. This patch rips out mandatory locking support wholesale from the kernel, along with the Kconfig option and the Documentation file. It also changes the mount code to ignore the "mand" mount option instead of erroring out, and to throw a big, ugly warning. Signed-off-by: Jeff Layton
---
 .../filesystems/mandatory-locking.rst | 188 --
 fs/9p/vfs_file.c                      |  12 --
 fs/Kconfig                            |  10 -
 fs/afs/flock.c                        |   4 -
 fs/ceph/locks.c                       |   3 -
 fs/gfs2/file.c                        |   3 -
 fs/locks.c                            | 116 +--
 fs/namei.c                            |   4 +-
 fs/namespace.c                        |  31 +-
 fs/nfs/file.c                         |   4 -
 fs/nfsd/nfs4state.c                   |  13 --
 fs/nfsd/vfs.c                         |  15 --
 fs/ocfs2/locks.c                      |   4 -
 fs/open.c                             |   8 +-
 fs/read_write.c                       |   7 -
 fs/remap_range.c                      |  10 -
 include/linux/fs.h                    |  84 -
 mm/mmap.c                             |   6 -
 mm/nommu.c                            |   3 -
 19 files changed, 20 insertions(+), 505 deletions(-)
 delete mode 100644 Documentation/filesystems/mandatory-locking.rst

Here's a first stab at a wholesale removal patch. The main questions at this point are whether ignoring "mand" from here on out is the right thing to do, and how we should pass that info along to the user.

diff --git a/Documentation/filesystems/mandatory-locking.rst b/Documentation/filesystems/mandatory-locking.rst deleted file mode 100644 index 9ce73544a8f0.. --- a/Documentation/filesystems/mandatory-locking.rst +++ /dev/null @@ -1,188 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -= -Mandatory File Locking For The Linux Operating System -= - - Andy Walker - - 15 April 1996 -(Updated September 2007) -0.
Why you should avoid mandatory locking -- - -The Linux implementation is prey to a number of difficult-to-fix race -conditions which in practice make it not dependable: - - - The write system call checks for a mandatory lock only once - at its start. It is therefore possible for a lock request to - be granted after this check but before the data is modified. - A process may then see file data change even while a mandatory - lock was held. - - Similarly, an exclusive lock may be granted on a file after - the kernel has decided to proceed with a read, but before the - read has actually completed, and the reading process may see - the file data in a state which should not have been visible - to it. - - Similar races make the claimed mutual exclusion between lock - and mmap similarly unreliable. - -1. What is mandatory locking? --- - -Mandatory locking is kernel enforced file locking, as opposed to the more usual -cooperative file locking used to guarantee sequential access to files among -processes. File locks are applied using the flock() and fcntl() system calls -(and the lockf() library routine which is a wrapper around fcntl().) It is -normally a process' responsibility to check for locks on a file it wishes to -update, before applying its own lock, updating the file and unlocking it again. -The most commonly used example of this (and in the case of sendmail, the most -troublesome) is access to a user's mailbox. The mail user agent and the mail -transfer agent must guard against updating the mailbox at the same time, and -prevent reading the mailbox while it is being updated. - -In a perfect world all processes would use and honour a cooperative, or -"advisory" locking scheme. However, the world isn't perfect, and there's -a lot of poorly written code out there. 
- -In trying to address this problem, the designers of System V UNIX came up -with a "mandatory" locking scheme, whereby the operating system kernel would -block attempts by a process to write to a file that another process holds a -"read" -or- "shared" lock on, and block attempts to both read and write to a
Re: [Cluster-devel] [PATCH v6 10/19] gfs2: Introduce flag for glock holder auto-demotion
Hi, On Thu, 2021-08-19 at 21:40 +0200, Andreas Gruenbacher wrote: > From: Bob Peterson > > This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure > that > will allow glocks to be demoted automatically on locking conflicts. > When a locking request comes in that isn't compatible with the > locking > state of a holder and that holder has the HIF_MAY_DEMOTE flag set, > the > holder will be demoted automatically before the incoming locking > request > is granted. > I'm not sure I understand what is going on here. When there are locking conflicts we generate call backs and those result in glock demotion. There is no need for a flag to indicate that I think, since it is the default behaviour anyway. Or perhaps the explanation is just a bit confusing... > Processes that allow a glock holder to be taken away indicate this by > calling gfs2_holder_allow_demote(). When they need the glock again, > they call gfs2_holder_disallow_demote() and then they check if the > holder is still queued: if it is, they're still holding the glock; if > it > isn't, they need to re-acquire the glock. > > This allows processes to hang on to locks that could become part of a > cyclic locking dependency. The locks will be given up when a (rare) > conflicting locking request occurs, and don't need to be given up > prematurely. This seems backwards to me. We already have the glock layer cache the locks until they are required by another node. We also have the min hold time to make sure that we don't bounce locks too much. So what is the problem that you are trying to solve here I wonder? 
> > Signed-off-by: Bob Peterson > --- > fs/gfs2/glock.c | 221 +++ > > fs/gfs2/glock.h | 20 + > fs/gfs2/incore.h | 1 + > 3 files changed, 206 insertions(+), 36 deletions(-) > > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c > index f24db2ececfb..d1b06a09ce2f 100644 > --- a/fs/gfs2/glock.c > +++ b/fs/gfs2/glock.c > @@ -58,6 +58,7 @@ struct gfs2_glock_iter { > typedef void (*glock_examiner) (struct gfs2_glock * gl); > > static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, > unsigned int target); > +static void __gfs2_glock_dq(struct gfs2_holder *gh); > > static struct dentry *gfs2_root; > static struct workqueue_struct *glock_workqueue; > @@ -197,6 +198,12 @@ static int demote_ok(const struct gfs2_glock > *gl) > > if (gl->gl_state == LM_ST_UNLOCKED) > return 0; > + /* > + * Note that demote_ok is used for the lru process of disposing > of > + * glocks. For this purpose, we don't care if the glock's > holders > + * have the HIF_MAY_DEMOTE flag set or not. If someone is using > + * them, don't demote. > + */ > if (!list_empty(&gl->gl_holders)) > return 0; > if (glops->go_demote_ok) > @@ -379,7 +386,7 @@ static void do_error(struct gfs2_glock *gl, const > int ret) > struct gfs2_holder *gh, *tmp; > > list_for_each_entry_safe(gh, tmp, &gl->gl_holders, gh_list) { > - if (test_bit(HIF_HOLDER, &gh->gh_iflags)) > + if (!test_bit(HIF_WAIT, &gh->gh_iflags)) > continue; > if (ret & LM_OUT_ERROR) > gh->gh_error = -EIO; > @@ -393,6 +400,40 @@ static void do_error(struct gfs2_glock *gl, > const int ret) > } > } > > +/** > + * demote_incompat_holders - demote incompatible demoteable holders > + * @gl: the glock we want to promote > + * @new_gh: the new holder to be promoted > + */ > +static void demote_incompat_holders(struct gfs2_glock *gl, > + struct gfs2_holder *new_gh) > +{ > + struct gfs2_holder *gh; > + > + /* > + * Demote incompatible holders before we make ourselves > eligible.
> + * (This holder may or may not allow auto-demoting, but we > don't want > + * to demote the new holder before it's even granted.) > + */ > + list_for_each_entry(gh, &gl->gl_holders, gh_list) { > + /* > + * Since holders are at the front of the list, we stop > when we > + * find the first non-holder. > + */ > + if (!test_bit(HIF_HOLDER, &gh->gh_iflags)) > + return; > + if (test_bit(HIF_MAY_DEMOTE, &gh->gh_iflags) && > + !may_grant(gl, new_gh, gh)) { > + /* > + * We should not recurse into do_promote > because > + * __gfs2_glock_dq only calls handle_callback, > + * gfs2_glock_add_to_lru and > __gfs2_glock_queue_work. > + */ > + __gfs2_glock_dq(gh); > + } > + } > +} > + > /** > * find_first_holder - find the first "holder" gh > * @gl: the glock > @@ -411,6 +452,26 @@ static inline struct gfs2_holder > *find_first_holder(const struct gfs2_glock *gl) > return NULL; > } > > +/** > + * find_first_strong_holder - find
Re: [Cluster-devel] [BUG] fs: dlm: possible ABBA deadlock
On 2021/8/19 23:55, David Teigland wrote: On Thu, Aug 19, 2021 at 04:54:57PM +0800, Jia-Ju Bai wrote: Hello, My static analysis tool reports a possible ABBA deadlock in the dlm filesystem in Linux 5.10:

dlm_recover_waiters_pre()
  mutex_lock(&ls->ls_waiters_mutex); --> line 5130
  recover_convert_waiter()
    _receive_convert_reply()
      lock_rsb()
        mutex_lock(&r->res_mutex); --> line 69

dlm_recover_waiters_post()
  lock_rsb()
    mutex_lock(&r->res_mutex); --> line 69
  mutex_lock(&ls->ls_waiters_mutex); --> line 5307

When dlm_recover_waiters_pre() and dlm_recover_waiters_post() are concurrently executed, the deadlock can occur. I am not quite sure whether this possible deadlock is real and how to fix it if it is real. Any feedback would be appreciated, thanks :) They won't be concurrent, "pre" runs before recovery, and "post" is after. Okay, thanks for your reply :) Best wishes, Jia-Ju Bai