[Cluster-devel] [GFS2 PATCH] gfs2: Fix gfs2_ail_empty_gl

2020-05-15 Thread Bob Peterson
Hi,

Function gfs2_ail_empty_gl is part of the sync process (go_sync) for
a glock. The go_sync glop is called to sync the metadata for a glock
so we can, in good conscience, tell dlm to release the lock to another
node. Its goal is not only to sync the metadata for the glock, but
also to ensure all the glock's revokes are written properly as well.

The sync is accomplished by first calling a log_flush. Then it calls
function gfs2_ail_empty_gl to make sure all the revokes are written
for the glock.

Before this patch, function gfs2_ail_empty_gl was not doing that
correctly. It was checking for NEW items that needed to be revoked
(gl_ail_count) and if there weren't any, it was exiting. But the
log flush could have introduced new items on the sdp revokes list,
so they're still pending, but still accounted for in gl->gl_revokes.

This patch rearranges function gfs2_ail_empty_gl so that it first
processes new revokes for its ail list, then follows through with
its second log flush (for revokes both new and old) and then calls
log_flush_wait to make sure the revokes are written back.

Signed-off-by: Bob Peterson 
---
 fs/gfs2/glops.c | 68 +
 1 file changed, 25 insertions(+), 43 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 9406e46c31fc..cdd48f52d6b3 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -90,56 +90,38 @@ static int gfs2_ail_empty_gl(struct gfs2_glock *gl)
struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
struct gfs2_trans tr;
int ret;
+   bool need_flush = false;
 
memset(, 0, sizeof(tr));
INIT_LIST_HEAD(_buf);
INIT_LIST_HEAD(_databuf);
tr.tr_revokes = atomic_read(>gl_ail_count);
 
-   if (!tr.tr_revokes) {
-   bool have_revokes;
-   bool log_in_flight;
-
-   /*
-* We have nothing on the ail, but there could be revokes on
-* the sdp revoke queue, in which case, we still want to flush
-* the log and wait for it to finish.
-*
-* If the sdp revoke list is empty too, we might still have an
-* io outstanding for writing revokes, so we should wait for
-* it before returning.
-*
-* If none of these conditions are true, our revokes are all
-* flushed and we can return.
-*/
-   gfs2_log_lock(sdp);
-   have_revokes = !list_empty(>sd_log_revokes);
-   log_in_flight = atomic_read(>sd_log_in_flight);
-   gfs2_log_unlock(sdp);
-   if (have_revokes)
-   goto flush;
-   if (log_in_flight)
-   log_flush_wait(sdp);
-   return 0;
+   if (tr.tr_revokes) {
+   /* A shortened, inline version of gfs2_trans_begin()
+* tr->alloced is not set since the transaction structure is
+* on the stack */
+   tr.tr_reserved = 1 + gfs2_struct2blk(sdp, tr.tr_revokes,
+sizeof(u64));
+   tr.tr_ip = _RET_IP_;
+   ret = gfs2_log_reserve(sdp, tr.tr_reserved);
+   if (ret < 0)
+   return ret;
+   WARN_ON_ONCE(current->journal_info);
+   current->journal_info = 
+
+   __gfs2_ail_flush(gl, 0, tr.tr_revokes);
+
+   gfs2_trans_end(sdp);
+   need_flush = true;
+   } else if (atomic_read(>gl_revokes)) {
+   need_flush = true;
+   }
+   if (need_flush) {
+   gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
+  GFS2_LFC_AIL_EMPTY_GL);
+   log_flush_wait(sdp);
}
-
-   /* A shortened, inline version of gfs2_trans_begin()
- * tr->alloced is not set since the transaction structure is
- * on the stack */
-   tr.tr_reserved = 1 + gfs2_struct2blk(sdp, tr.tr_revokes, sizeof(u64));
-   tr.tr_ip = _RET_IP_;
-   ret = gfs2_log_reserve(sdp, tr.tr_reserved);
-   if (ret < 0)
-   return ret;
-   WARN_ON_ONCE(current->journal_info);
-   current->journal_info = 
-
-   __gfs2_ail_flush(gl, 0, tr.tr_revokes);
-
-   gfs2_trans_end(sdp);
-flush:
-   gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
-  GFS2_LFC_AIL_EMPTY_GL);
return 0;
 }
 



Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx

2020-05-15 Thread Christoph Hellwig
On Fri, May 15, 2020 at 04:20:02PM +0100, David Howells wrote:
> Christoph Hellwig  wrote:
> 
> > > The advantage on using kernel_setsockopt here is that sctp module will
> > > only be loaded if dlm actually creates a SCTP socket.  With this
> > > change, sctp will be loaded on setups that may not be actually using
> > > it. It's a quite big module and might expose the system.
> > 
> > True.  Not that the intent is to kill kernel space callers of setsockopt,
> > as I plan to remove the set_fs address space override used for it.
> 
> For getsockopt, does it make sense to have the core kernel load optval/optlen
> into a buffer before calling the protocol driver?  Then the driver need not
> see the userspace pointer at all.
> 
> Similar could be done for setsockopt - allocate a buffer of the size requested
> by the user inside the kernel and pass it into the driver, then copy the data
> back afterwards.

I did look into that initially.  The problem is that tons of sockopts
entirely ignore optlen and just use a fixed size.  So I fear that there
could be tons of breakage if we suddently respect it.  Otherwise that
would be a pretty nice way to handle the situation.



Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx

2020-05-15 Thread David Howells
Christoph Hellwig  wrote:

> > The advantage on using kernel_setsockopt here is that sctp module will
> > only be loaded if dlm actually creates a SCTP socket.  With this
> > change, sctp will be loaded on setups that may not be actually using
> > it. It's a quite big module and might expose the system.
> 
> True.  Not that the intent is to kill kernel space callers of setsockopt,
> as I plan to remove the set_fs address space override used for it.

For getsockopt, does it make sense to have the core kernel load optval/optlen
into a buffer before calling the protocol driver?  Then the driver need not
see the userspace pointer at all.

Similar could be done for setsockopt - allocate a buffer of the size requested
by the user inside the kernel and pass it into the driver, then copy the data
back afterwards.

David



Re: [Cluster-devel] [PATCH 21/33] ipv4: add ip_sock_set_mtu_discover

2020-05-15 Thread David Howells
Christoph Hellwig  wrote:

> > > + ip_sock_set_mtu_discover(conn->params.local->socket->sk,
> > > + IP_PMTUDISC_DONT);
> > 
> > Um... The socket in question could be an AF_INET6 socket, not an AF_INET4
> > socket - I presume it will work in that case.  If so:
> 
> Yes, the implementation of that sockopt, including the inet_sock
> structure where these options are set is shared between ipv4 and ipv6.

Great!  Could you note that either in the patch description or in the
kerneldoc attached to the function?

Thanks,
David



Re: [Cluster-devel] [PATCH 29/33] rxrpc_sock_set_min_security_level

2020-05-15 Thread David Howells
Christoph Hellwig  wrote:

> > Looks good - but you do need to add this to 
> > Documentation/networking/rxrpc.txt
> > also, thanks.
> 
> That file doesn't exist, instead we now have a
> cumentation/networking/rxrpc.rst in weird markup.

Yeah - that's only in net/next thus far.

> Where do you want this to be added, and with what text?  Remember I don't
> really know what this thing does, I just provide a shortcut.

The document itself describes what each rxrpc sockopt does.  Just look for
RXRPC_MIN_SECURITY_LEVEL in there;-)

Anyway, see the attached.  This also fixes a couple of errors in the doc that
I noticed.

David
---
diff --git a/Documentation/networking/rxrpc.rst 
b/Documentation/networking/rxrpc.rst
index 5ad35113d0f4..68552b92dc44 100644
--- a/Documentation/networking/rxrpc.rst
+++ b/Documentation/networking/rxrpc.rst
@@ -477,7 +477,7 @@ AF_RXRPC sockets support a few socket options at the 
SOL_RXRPC level:
 Encrypted checksum plus packet padded and first eight bytes of packet
 encrypted - which includes the actual packet length.
 
- (c) RXRPC_SECURITY_ENCRYPTED
+ (c) RXRPC_SECURITY_ENCRYPT
 
 Encrypted checksum plus entire packet padded and encrypted, including
 actual packet length.
@@ -578,7 +578,7 @@ A client would issue an operation by:
  This issues a request_key() to get the key representing the security
  context.  The minimum security level can be set::
 
-   unsigned int sec = RXRPC_SECURITY_ENCRYPTED;
+   unsigned int sec = RXRPC_SECURITY_ENCRYPT;
setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
   , sizeof(sec));
 
@@ -1090,6 +1090,15 @@ The kernel interface functions are as follows:
  jiffies).  In the event of the timeout occurring, the call will be
  aborted and -ETIME or -ETIMEDOUT will be returned.
 
+ (#) Apply the RXRPC_MIN_SECURITY_LEVEL sockopt to a socket from within in the
+ kernel::
+
+   int rxrpc_sock_set_min_security_level(struct sock *sk,
+unsigned int val);
+
+ This specifies the minimum security level required for calls on this
+ socket.
+
 
 Configurable Parameters
 ===
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 7dfcbd58da85..e313dae01674 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -57,7 +57,7 @@ int afs_open_socket(struct afs_net *net)
srx.transport.sin6.sin6_port= htons(AFS_CM_PORT);
 
ret = rxrpc_sock_set_min_security_level(socket->sk,
-   RXRPC_SECURITY_ENCRYPT);
+   RXRPC_SECURITY_ENCRYPT);
if (ret < 0)
goto error_2;
 



Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-15 Thread David Laight
Looking at __sys_setsockopt() I noticed that the BPF intercept
can also cause set_fs(KERNEL_DS) be set in order to pass a
modified buffer into the actual setsockopt() code.

If that functionality is to be kept then the underlying
protocol specific code needs changing to accept a kernel buffer.

The 32bit compat code would also be a lot simpler if it could
pass an kernel buffer through.
At the moment it copies the modified buffer back out onto the
user stack.

I'm sure there have been suggestions to remove that complete hack.
Fixing the compat code would leave a kernel_[sg]et_sockopt() that
still worked.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)