[Cluster-devel] [GFS2 PATCH] gfs2: Fix gfs2_ail_empty_gl
Hi, Function gfs2_ail_empty_gl is part of the sync process (go_sync) for a glock. The go_sync glop is called to sync the metadata for a glock so we can, in good conscience, tell dlm to release the lock to another node. Its goal is not only to sync the metadata for the glock, but also to ensure all the glock's revokes are written properly as well. The sync is accomplished by first calling a log_flush. Then it calls function gfs2_ail_empty_gl to make sure all the revokes are written for the glock. Before this patch, function gfs2_ail_empty_gl was not doing that correctly. It was checking for NEW items that needed to be revoked (gl_ail_count) and if there weren't any, it was exiting. But the log flush could have introduced new items on the sdp revokes list, so they're still pending, but still accounted for in gl->gl_revokes. This patch rearranges function gfs2_ail_empty_gl so that it first processes new revokes for its ail list, then follows through with its second log flush (for revokes both new and old) and then calls log_flush_wait to make sure the revokes are written back. Signed-off-by: Bob Peterson --- fs/gfs2/glops.c | 68 + 1 file changed, 25 insertions(+), 43 deletions(-) diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c index 9406e46c31fc..cdd48f52d6b3 100644 --- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -90,56 +90,38 @@ static int gfs2_ail_empty_gl(struct gfs2_glock *gl) struct gfs2_sbd *sdp = gl->gl_name.ln_sbd; struct gfs2_trans tr; int ret; + bool need_flush = false; memset(, 0, sizeof(tr)); INIT_LIST_HEAD(_buf); INIT_LIST_HEAD(_databuf); tr.tr_revokes = atomic_read(>gl_ail_count); - if (!tr.tr_revokes) { - bool have_revokes; - bool log_in_flight; - - /* -* We have nothing on the ail, but there could be revokes on -* the sdp revoke queue, in which case, we still want to flush -* the log and wait for it to finish. -* -* If the sdp revoke list is empty too, we might still have an -* io outstanding for writing revokes, so we should wait for -* it before returning. -* -* If none of these conditions are true, our revokes are all -* flushed and we can return. -*/ - gfs2_log_lock(sdp); - have_revokes = !list_empty(>sd_log_revokes); - log_in_flight = atomic_read(>sd_log_in_flight); - gfs2_log_unlock(sdp); - if (have_revokes) - goto flush; - if (log_in_flight) - log_flush_wait(sdp); - return 0; + if (tr.tr_revokes) { + /* A shortened, inline version of gfs2_trans_begin() +* tr->alloced is not set since the transaction structure is +* on the stack */ + tr.tr_reserved = 1 + gfs2_struct2blk(sdp, tr.tr_revokes, +sizeof(u64)); + tr.tr_ip = _RET_IP_; + ret = gfs2_log_reserve(sdp, tr.tr_reserved); + if (ret < 0) + return ret; + WARN_ON_ONCE(current->journal_info); + current->journal_info = + + __gfs2_ail_flush(gl, 0, tr.tr_revokes); + + gfs2_trans_end(sdp); + need_flush = true; + } else if (atomic_read(>gl_revokes)) { + need_flush = true; + } + if (need_flush) { + gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL | + GFS2_LFC_AIL_EMPTY_GL); + log_flush_wait(sdp); } - - /* A shortened, inline version of gfs2_trans_begin() - * tr->alloced is not set since the transaction structure is - * on the stack */ - tr.tr_reserved = 1 + gfs2_struct2blk(sdp, tr.tr_revokes, sizeof(u64)); - tr.tr_ip = _RET_IP_; - ret = gfs2_log_reserve(sdp, tr.tr_reserved); - if (ret < 0) - return ret; - WARN_ON_ONCE(current->journal_info); - current->journal_info = - - __gfs2_ail_flush(gl, 0, tr.tr_revokes); - - gfs2_trans_end(sdp); -flush: - gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL | - GFS2_LFC_AIL_EMPTY_GL); return 0; }
Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx
On Fri, May 15, 2020 at 04:20:02PM +0100, David Howells wrote: > Christoph Hellwig wrote: > > > > The advantage on using kernel_setsockopt here is that sctp module will > > > only be loaded if dlm actually creates a SCTP socket. With this > > > change, sctp will be loaded on setups that may not be actually using > > > it. It's a quite big module and might expose the system. > > > > True. Not that the intent is to kill kernel space callers of setsockopt, > > as I plan to remove the set_fs address space override used for it. > > For getsockopt, does it make sense to have the core kernel load optval/optlen > into a buffer before calling the protocol driver? Then the driver need not > see the userspace pointer at all. > > Similar could be done for setsockopt - allocate a buffer of the size requested > by the user inside the kernel and pass it into the driver, then copy the data > back afterwards. I did look into that initially. The problem is that tons of sockopts entirely ignore optlen and just use a fixed size. So I fear that there could be tons of breakage if we suddently respect it. Otherwise that would be a pretty nice way to handle the situation.
Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx
Christoph Hellwig wrote: > > The advantage on using kernel_setsockopt here is that sctp module will > > only be loaded if dlm actually creates a SCTP socket. With this > > change, sctp will be loaded on setups that may not be actually using > > it. It's a quite big module and might expose the system. > > True. Not that the intent is to kill kernel space callers of setsockopt, > as I plan to remove the set_fs address space override used for it. For getsockopt, does it make sense to have the core kernel load optval/optlen into a buffer before calling the protocol driver? Then the driver need not see the userspace pointer at all. Similar could be done for setsockopt - allocate a buffer of the size requested by the user inside the kernel and pass it into the driver, then copy the data back afterwards. David
Re: [Cluster-devel] [PATCH 21/33] ipv4: add ip_sock_set_mtu_discover
Christoph Hellwig wrote: > > > + ip_sock_set_mtu_discover(conn->params.local->socket->sk, > > > + IP_PMTUDISC_DONT); > > > > Um... The socket in question could be an AF_INET6 socket, not an AF_INET4 > > socket - I presume it will work in that case. If so: > > Yes, the implementation of that sockopt, including the inet_sock > structure where these options are set is shared between ipv4 and ipv6. Great! Could you note that either in the patch description or in the kerneldoc attached to the function? Thanks, David
Re: [Cluster-devel] [PATCH 29/33] rxrpc_sock_set_min_security_level
Christoph Hellwig wrote: > > Looks good - but you do need to add this to > > Documentation/networking/rxrpc.txt > > also, thanks. > > That file doesn't exist, instead we now have a > cumentation/networking/rxrpc.rst in weird markup. Yeah - that's only in net/next thus far. > Where do you want this to be added, and with what text? Remember I don't > really know what this thing does, I just provide a shortcut. The document itself describes what each rxrpc sockopt does. Just look for RXRPC_MIN_SECURITY_LEVEL in there;-) Anyway, see the attached. This also fixes a couple of errors in the doc that I noticed. David --- diff --git a/Documentation/networking/rxrpc.rst b/Documentation/networking/rxrpc.rst index 5ad35113d0f4..68552b92dc44 100644 --- a/Documentation/networking/rxrpc.rst +++ b/Documentation/networking/rxrpc.rst @@ -477,7 +477,7 @@ AF_RXRPC sockets support a few socket options at the SOL_RXRPC level: Encrypted checksum plus packet padded and first eight bytes of packet encrypted - which includes the actual packet length. - (c) RXRPC_SECURITY_ENCRYPTED + (c) RXRPC_SECURITY_ENCRYPT Encrypted checksum plus entire packet padded and encrypted, including actual packet length. @@ -578,7 +578,7 @@ A client would issue an operation by: This issues a request_key() to get the key representing the security context. The minimum security level can be set:: - unsigned int sec = RXRPC_SECURITY_ENCRYPTED; + unsigned int sec = RXRPC_SECURITY_ENCRYPT; setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL, , sizeof(sec)); @@ -1090,6 +1090,15 @@ The kernel interface functions are as follows: jiffies). In the event of the timeout occurring, the call will be aborted and -ETIME or -ETIMEDOUT will be returned. + (#) Apply the RXRPC_MIN_SECURITY_LEVEL sockopt to a socket from within in the + kernel:: + + int rxrpc_sock_set_min_security_level(struct sock *sk, +unsigned int val); + + This specifies the minimum security level required for calls on this + socket. + Configurable Parameters === diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index 7dfcbd58da85..e313dae01674 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -57,7 +57,7 @@ int afs_open_socket(struct afs_net *net) srx.transport.sin6.sin6_port= htons(AFS_CM_PORT); ret = rxrpc_sock_set_min_security_level(socket->sk, - RXRPC_SECURITY_ENCRYPT); + RXRPC_SECURITY_ENCRYPT); if (ret < 0) goto error_2;
Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt
Looking at __sys_setsockopt() I noticed that the BPF intercept can also cause set_fs(KERNEL_DS) be set in order to pass a modified buffer into the actual setsockopt() code. If that functionality is to be kept then the underlying protocol specific code needs changing to accept a kernel buffer. The 32bit compat code would also be a lot simpler if it could pass an kernel buffer through. At the moment it copies the modified buffer back out onto the user stack. I'm sure there have been suggestions to remove that complete hack. Fixing the compat code would leave a kernel_[sg]et_sockopt() that still worked. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)