Re: [Cluster-devel] [PATCH 4/6] dlm: use sctp 1-to-1 API
From: Marcelo Ricardo Leitner Sent: 12 August 2015 14:16 Em 12-08-2015 07:23, David Laight escreveu: From: Marcelo Ricardo Leitner Sent: 11 August 2015 23:22 DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not needed but this causes it to use sctp_do_peeloff() to mimic an kernel_accept() and this causes a symbol dependency on sctp module. By switching it to 1-to-1 API we can avoid this dependency and also reduce quite a lot of SCTP-specific code in lowcomms.c. ... You still need to enable sctp notifications (I think the patch deleted that code). Otherwise you don't get any kind of indication if the remote system 'resets' (ie sends an new INIT chunk) on an existing connection. Right, it would miss the restart event and could generate a corrupted tx/rx buffers by glueing parts of old messages with new ones. Except that it is SCTP so you'd expect DATA chunks to contain entire messages and so get unexpected message sequences rather than corrupt messages. The problem is that the recovery is likely to be another reset. (Particularly with M3UA where the source and destination port numbers are likely to be the same and fixed.) It is probably enough to treat the MSG_NOTIFICATION as a fatal error and close the socket. Just so we are on the same page, you mean that after accepting the new association and enabling notifications on it, any further notification on it can be treated as fatal errors, right? Seems reasonable to me. That's what I had to do. The far end will probably see an additional disconnect, but it shouldn't matter. This is probably a bug in the sctp stack - if a connection is reset but the user hasn't requested notifications then it should be converted to a disconnect indication and a new incoming connection. Maybe in such case resets shouldn't be allowed at all? Because unless it happens on a moment of silence it will always lead to application buffer corruption. Checked the RFCs now but couldn't find anything restricting them to some condition. I certainly expected the 'reset' to cause an inwards abortive disconnect on the old socket and a new indication on the listening socket. I think (hope) that is what you get for a TCP SYN that matches an existing connection. In our case I think they were happening when the remote system was power cycled. David
Re: [Cluster-devel] [PATCH 4/6] dlm: use sctp 1-to-1 API
From: Marcelo Ricardo Leitner Sent: 11 August 2015 23:22 DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not needed but this causes it to use sctp_do_peeloff() to mimic an kernel_accept() and this causes a symbol dependency on sctp module. By switching it to 1-to-1 API we can avoid this dependency and also reduce quite a lot of SCTP-specific code in lowcomms.c. ... You still need to enable sctp notifications (I think the patch deleted that code). Otherwise you don't get any kind of indication if the remote system 'resets' (ie sends an new INIT chunk) on an existing connection. It is probably enough to treat the MSG_NOTIFICATION as a fatal error and close the socket. This is probably a bug in the sctp stack - if a connection is reset but the user hasn't requested notifications then it should be converted to a disconnect indication and a new incoming connection. David
Re: [Cluster-devel] [Ocfs2-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx
From: Matthew Wilcox > Sent: 16 May 2020 16:37 ... > > Basically: > > > > This patch sequence (to be written) does the following: > > > > Patch 1: Change __sys_setsockopt() to allocate a kernel buffer, > > copy the data into it then call set_fs(KERNEL_DS). > > An on-stack buffer (say 64 bytes) will be used for > > small transfers. > > > > Patch 2: The same for __sys_getsockopt(). > > > > Patch 3: Compat setsockopt. > > > > Patch 4: Compat getsockopt. > > > > Patch 5: Remove the user copies from the global socket options code. > > > > Patches 6 to n-1; Remove the user copies from the per-protocol code. > > > > Patch n: Remove the set_fs(KERNEL_DS) from the entry points. > > > > This should be bisectable. > > I appreciate your dedication to not publishing the source code to > your kernel module, but Christoph's patch series is actually better. > It's typesafe rather than passing void pointers around. There are plenty on interfaces that pass a 'pointer and length'. Having the compiler do a type check doesn't give any security benefit - just stops silly errors. Oh yes, I've attached the only driver source file that calls into the Linux kernel. You are perfectly free to look at all the thing we have to do to support different and broken kernel releases. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) #ident "@(#) (c) Aculab plc $Header: /home/cvs/repository/ss7/stack/src/driver/linux/ss7osglue.c,v 1.157 2019-08-29 16:09:14 davidla Exp $ $Name: $" #ifndef MODULE #define MODULE #endif #include #if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 28) #error minimum kernel version is 2.6.28 #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 34) #include #else #include #endif #include #include #include #include #include #include #include #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0) #include #endif #include #include #include #include #include #include #include #include #include #include #include /* This is only in the kernel build tree */ #include #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 10, 0) #include #else #include /* netinet/sctp.h ought to be this file */ #endif #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 13, 0) #define wait_queue_head __wait_queue_head #define wait_queue_entry __wait_queue #endif #define SK_PROTOCOL(sock) (sock)->sk->sk_protocol extern void ss7_trace_mem(int, void *, int, const char *, ...); extern void ss7_trace_printf(int, const char *, ...); /* Aculab DACP interfaces - these are in aculab's kern_if.h */ void *dacp_symbol_get(const char *); int dacp_symbol_release(const char *); MODULE_AUTHOR("Aculab"); MODULE_LICENSE("Proprietary"); #include "ss7osglue.h" /* Mutex for driver interface code */ static struct mutex ss7_glue_mutex; static int ss7dev_major; static const void *ss7_dtls_handle; static int ss7_use_count; static int ss7_stop_pid; static struct task_struct *asserted_tasks[16]; static unsigned int asserted_task_count; typedef char ss7_verify_const[ SS7_SOCK_STREAM == SOCK_STREAM && SS7_SOCK_SEQPACKET == SOCK_SEQPACKET ? 1 : -1]; static void ss7_net_ns_unload(void); #define TCP_NODELAY 1 static int ss7_glue_open(struct inode *, struct file *); static int ss7_glue_release(struct inode *, struct file *); static long ss7_glue_unlocked_ioctl(struct file *, unsigned int, unsigned long); static unsigned int ss7_glue_poll(struct file *const, poll_table *); static struct file_operations ss7dev_fop = { open: ss7_glue_open, release:ss7_glue_release, unlocked_ioctl: ss7_glue_unlocked_ioctl, compat_ioctl: ss7_glue_unlocked_ioctl, poll: ss7_glue_poll, owner: THIS_MODULE }; static int ss7_reboot_notify(struct notifier_block *nb, unsigned long action, void *data) { /* System being rebooted. * I added this hoping to use it to get the ss7maint daemon to exit, * but it isn't called until all user processes have died. * Leave it here - might be useful one day. */ return 0; } static struct notifier_block ss7_reboot_notifier_block = { .notifier_call = ss7_reboot_notify, }; static int ss7_init_fail(int rval) { if (ss7dev_major > 0) unregister_chrdev(ss7dev_major, "ss7server"); return rval; } static int ss7_init_mod(void) { const void *(*dtls_register)(const char *, int (*)(struct dtls_get_if *)); int rval; ss7_mutex_init(_glue_mutex); printk(KERN_INFO "%s\n", ss7version); ss7dev_major = register_chrdev(0, "ss7server", _fop); if (ss7dev_major < 0) { printk(KERN_INFO "ss7server: register_chrdev() failed: %d\n", ss7dev_major); return ss7_init_fail(ss7dev_major); } rval = ss7_driver_init(); if (rval != 0) { printk(KERN_INFO "ss7server: ss7_driver_init() failed: %d\n", rval); return ss7_init_fail(-EIO); } dtls_register =
Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx
From: David Howells > Sent: 15 May 2020 16:20 > Christoph Hellwig wrote: > > > > The advantage on using kernel_setsockopt here is that sctp module will > > > only be loaded if dlm actually creates a SCTP socket. With this > > > change, sctp will be loaded on setups that may not be actually using > > > it. It's a quite big module and might expose the system. > > > > True. Not that the intent is to kill kernel space callers of setsockopt, > > as I plan to remove the set_fs address space override used for it. > > For getsockopt, does it make sense to have the core kernel load optval/optlen > into a buffer before calling the protocol driver? Then the driver need not > see the userspace pointer at all. > > Similar could be done for setsockopt - allocate a buffer of the size requested > by the user inside the kernel and pass it into the driver, then copy the data > back afterwards. Yes, it also simplifies all the compat code. And there is a BPF test in setsockopt that also wants to pass on a kernel buffer. I'm willing to sit and write the patch. Quoting from a post I made later on Friday. Basically: This patch sequence (to be written) does the following: Patch 1: Change __sys_setsockopt() to allocate a kernel buffer, copy the data into it then call set_fs(KERNEL_DS). An on-stack buffer (say 64 bytes) will be used for small transfers. Patch 2: The same for __sys_getsockopt(). Patch 3: Compat setsockopt. Patch 4: Compat getsockopt. Patch 5: Remove the user copies from the global socket options code. Patches 6 to n-1; Remove the user copies from the per-protocol code. Patch n: Remove the set_fs(KERNEL_DS) from the entry points. This should be bisectable. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx
From: Christoph Hellwig > Sent: 15 May 2020 16:25 > On Fri, May 15, 2020 at 04:20:02PM +0100, David Howells wrote: > > Christoph Hellwig wrote: > > > > > > The advantage on using kernel_setsockopt here is that sctp module will > > > > only be loaded if dlm actually creates a SCTP socket. With this > > > > change, sctp will be loaded on setups that may not be actually using > > > > it. It's a quite big module and might expose the system. > > > > > > True. Not that the intent is to kill kernel space callers of setsockopt, > > > as I plan to remove the set_fs address space override used for it. > > > > For getsockopt, does it make sense to have the core kernel load > > optval/optlen > > into a buffer before calling the protocol driver? Then the driver need not > > see the userspace pointer at all. > > > > Similar could be done for setsockopt - allocate a buffer of the size > > requested > > by the user inside the kernel and pass it into the driver, then copy the > > data > > back afterwards. > > I did look into that initially. The problem is that tons of sockopts > entirely ignore optlen and just use a fixed size. So I fear that there > could be tons of breakage if we suddently respect it. Otherwise that > would be a pretty nice way to handle the situation. I'd guess that most application use the correct size for setsockopt(). (Well, apart from using 4 instead of 1.) It is certainly possible to always try to read in 64 bytes regardless of the supplied length, but handle the EFAULT case by shortening the buffer. Historically getsockopt() only wrote the length back. Treating 0 and garbage as (say) 4k and letting the protocol code set a shorten the copy to user might work. All short transfers would want to use an on-stack buffer, so slight oversizes could also be allowed for. OTOH if i did a getsockopt() with too short a length I wouldn't want the kernel to trash my program memory. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt
From: Christoph Hellwig > Only for those were we have users, and all those are covered. What do we tell all our users when our kernel SCTP code no longer works? It uses SO_REUSADDR, SCTP_EVENTS, SCTP_NODELAY, SCTP_STATUS, SCTP_INITMSG, IPV6_ONLY, SCTP_SOCKOPT_BINDX_ADD and SO_LINGER. We should probably use the CONNECTX function as well. I doubt we are the one company with out-of-tree drivers that use the kernel_socket interface. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH 32/33] sctp: add sctp_sock_get_primary_addr
From: David Laight > Sent: 14 May 2020 13:30 > Subject: RE: [PATCH 32/33] sctp: add sctp_sock_get_primary_addr > > From: David Laight > > Sent: 14 May 2020 10:51 > > From: Marcelo Ricardo Leitner > > > Sent: 13 May 2020 19:03 > > > > > > On Wed, May 13, 2020 at 08:26:47AM +0200, Christoph Hellwig wrote: > > > > Add a helper to directly get the SCTP_PRIMARY_ADDR sockopt from kernel > > > > space without going through a fake uaccess. > > > > > > Same comment as on the other dlm/sctp patch. > > > > Wouldn't it be best to write sctp_[gs]etsockotp() that > > use a kernel buffer and then implement the user-space > > calls using a wrapper that does the copies to an on-stack > > (or malloced if big) buffer. > > Actually looking at __sys_setsockopt() it calls > BPF_CGROUP_RUN_PROG_SETSOCKOPT() which (by the look of it) > can copy the user buffer into malloc()ed memory and > cause set_fs(KERNEL_DS) be called. > > The only way to get rid of that set_fs() is to always > have the buffer in kernel memory when the underlying > setsockopt() code is called. And having started to try coding __sys_setsockopt() and then found the compat code I suspect that would be a whole lot more sane if the buffer was in kernel and it knew that at least (say) 64 bytes were allocated. The whole compat_alloc_user_space() 'crap' could probably go. Actually it looks like an application can avoid whatever checks BPF_CGROUP_RUN_PROG_SETSOCKOPT() is trying to do by using the 32bit compat ioctls. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH 32/33] sctp: add sctp_sock_get_primary_addr
From: Marcelo Ricardo Leitner > Sent: 13 May 2020 19:03 > > On Wed, May 13, 2020 at 08:26:47AM +0200, Christoph Hellwig wrote: > > Add a helper to directly get the SCTP_PRIMARY_ADDR sockopt from kernel > > space without going through a fake uaccess. > > Same comment as on the other dlm/sctp patch. Wouldn't it be best to write sctp_[gs]etsockotp() that use a kernel buffer and then implement the user-space calls using a wrapper that does the copies to an on-stack (or malloced if big) buffer. That will also simplify the code be removing all the copies and -EFAULT returns. Only the size checks will be needed and the code can assume the buffer is at least the size of the on-stack buffer. Our SCTP code uses SO_REUSADDR, SCTP_EVENTS, SCTP_NODELAY, SCTP_STATUS, SCTP_INITMSG, IPV6_ONLY, SCTP_SOCKOPT_BINDX_ADD and SO_LINGER. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt
From: 'Christoph Hellwig' > Sent: 14 May 2020 11:35 > On Thu, May 14, 2020 at 10:26:41AM +, David Laight wrote: > > From: Christoph Hellwig > > > Only for those were we have users, and all those are covered. > > > > What do we tell all our users when our kernel SCTP code > > no longer works? > > We only care about in-tree modules, just like for every other interface > in the kernel. Even if our management agreed to release the code and the code layout matched the kernel guidelines you still wouldn't want two large drivers that implement telephony functionality for hardware that very few people actually have. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx
From: Marcelo Ricardo Leitner > Sent: 13 May 2020 19:01 > On Wed, May 13, 2020 at 08:26:42AM +0200, Christoph Hellwig wrote: > > And call it directly from dlm instead of going through kernel_setsockopt. > > The advantage on using kernel_setsockopt here is that sctp module will > only be loaded if dlm actually creates a SCTP socket. With this > change, sctp will be loaded on setups that may not be actually using > it. It's a quite big module and might expose the system. > > I'm okay with the SCTP changes, but I'll defer to DLM folks to whether > that's too bad or what for DLM. I didn't see these sneak through. There is a big long list of SCTP socket options that are needed to make anything work. They all need exporting. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt
From: Joe Perches > Sent: 13 May 2020 18:39 > On Wed, 2020-05-13 at 08:26 +0200, Christoph Hellwig wrote: > > this series removes the kernel_setsockopt and kernel_getsockopt > > functions, and instead switches their users to small functions that > > implement setting (or in one case getting) a sockopt directly using > > a normal kernel function call with type safety and all the other > > benefits of not having a function call. > > > > In some cases these functions seem pretty heavy handed as they do > > a lock_sock even for just setting a single variable, but this mirrors > > the real setsockopt implementation - counter to that a few kernel > > drivers just set the fields directly already. > > > > Nevertheless the diffstat looks quite promising: > > > > 42 files changed, 721 insertions(+), 799 deletions(-) I missed this patch going through. Massive NACK. You need to export functions that do most of the socket options for all protocols. As well as REUSADDR and NODELAY SCTP has loads because a lot of stuff that should have been extra system calls got piled into setsockopt. An alternate solution would be to move the copy_to/from_user() into a wrapper function so that the kernel_[sg]etsockopt() functions would bypass them completely. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt
Looking at __sys_setsockopt() I noticed that the BPF intercept can also cause set_fs(KERNEL_DS) be set in order to pass a modified buffer into the actual setsockopt() code. If that functionality is to be kept then the underlying protocol specific code needs changing to accept a kernel buffer. The 32bit compat code would also be a lot simpler if it could pass an kernel buffer through. At the moment it copies the modified buffer back out onto the user stack. I'm sure there have been suggestions to remove that complete hack. Fixing the compat code would leave a kernel_[sg]et_sockopt() that still worked. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH 32/33] sctp: add sctp_sock_get_primary_addr
From: David Laight > Sent: 14 May 2020 10:51 > From: Marcelo Ricardo Leitner > > Sent: 13 May 2020 19:03 > > > > On Wed, May 13, 2020 at 08:26:47AM +0200, Christoph Hellwig wrote: > > > Add a helper to directly get the SCTP_PRIMARY_ADDR sockopt from kernel > > > space without going through a fake uaccess. > > > > Same comment as on the other dlm/sctp patch. > > Wouldn't it be best to write sctp_[gs]etsockotp() that > use a kernel buffer and then implement the user-space > calls using a wrapper that does the copies to an on-stack > (or malloced if big) buffer. Actually looking at __sys_setsockopt() it calls BPF_CGROUP_RUN_PROG_SETSOCKOPT() which (by the look of it) can copy the user buffer into malloc()ed memory and cause set_fs(KERNEL_DS) be called. The only way to get rid of that set_fs() is to always have the buffer in kernel memory when the underlying setsockopt() code is called. The comment above __sys_[sg]etsockopt() about not knowing the length is just wrong. It probably applied to getsockopt() in the dim and distant past before it was made read-update. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH 31/33] sctp: add sctp_sock_set_nodelay
From: Christoph Hellwig > Sent: 21 May 2020 09:35 > On Wed, May 20, 2020 at 08:39:13PM -0300, Marcelo Ricardo Leitner wrote: > > On Wed, May 20, 2020 at 04:23:55PM -0700, David Miller wrote: > > > From: Marcelo Ricardo Leitner > > > Date: Wed, 20 May 2020 20:10:01 -0300 > > > > > > > The duplication with sctp_setsockopt_nodelay() is quite silly/bad. > > > > Also, why have the 'true' hardcoded? It's what dlm uses, yes, but the > > > > API could be a bit more complete than that. > > > > > > The APIs are being designed based upon what in-tree users actually > > > make use of. We can expand things later if necessary. > > > > Sometimes expanding things later can be though, thus why the worry. > > But ok, I get it. Thanks. > > > > The comment still applies, though. (re the duplication) > > Where do you see duplication? The whole thing just doesn't scale. As soon as you get to the slightly more complex requests like SCTP_INITMSG (which should probably be called to set the required number of data streams) you've either got replicated code or nested wrappers. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt v2
From: Christoph Hellwig > Sent: 20 May 2020 20:55 > > this series removes the kernel_setsockopt and kernel_getsockopt > functions, and instead switches their users to small functions that > implement setting (or in one case getting) a sockopt directly using > a normal kernel function call with type safety and all the other > benefits of not having a function call. > > In some cases these functions seem pretty heavy handed as they do > a lock_sock even for just setting a single variable, but this mirrors > the real setsockopt implementation unlike a few drivers that just set > set the fields directly. How much does this increase the kernel code by? You are also replicating a lot of code making it more difficult to maintain. I don't think the performance of an socket option code really matters - it is usually done once when a socket is initialised and the other costs of establishing a connection will dominate. Pulling the user copies outside the [gs]etsocksopt switch statement not only reduces the code size (source and object) and trivially allows kernel_[sg]sockopt() to me added to the list of socket calls. It probably isn't possible to pull the usercopies right out into the syscall wrapper because of some broken requests. I worried about whether getsockopt() should read the entire user buffer first. SCTP needs the some of it often (including a sockaddr_storage in one case), TCP needs it once. However the cost of reading a few words is small, and a big buffer probably needs setting to avoid leaking kernel memory if the structure has holes or fields that don't get set. Reading from userspace solves both issues. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt v2
From: 'Christoph Hellwig' > Sent: 21 May 2020 10:12 ... > > I worried about whether getsockopt() should read the entire > > user buffer first. SCTP needs the some of it often (including a > > sockaddr_storage in one case), TCP needs it once. > > However the cost of reading a few words is small, and a big > > buffer probably needs setting to avoid leaking kernel > > memory if the structure has holes or fields that don't get set. > > Reading from userspace solves both issues. > > As mention in the thread on the last series: That was my first idea, but > we have way to many sockopts, especially in obscure protocols that just > hard code the size. The chance of breaking userspace in a way that can't > be fixed without going back to passing user pointers to get/setsockopt > is way to high to commit to such a change unfortunately. Right the syscall stubs probably can't do it. But the per-protocol ones can for the main protocols. I posted a patch for SCTP yesterday that removes 800 lines of source and 8k of object code. Even that needs a horrid bodge for one request where the length returned has to be less than the data copied! David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH 4/4] net: remove kernel_setsockopt
From: Christoph Hellwig > Sent: 29 May 2020 13:10 > > No users left. There is no point even proposing this until all the changes to remove its use have made it at least as far into 'net-next' and probably 'net'. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH 2/4] sctp: refactor sctp_setsockopt_bindx
From: Marcelo Ricardo Leitner > Sent: 29 May 2020 17:06 > On Fri, May 29, 2020 at 02:09:41PM +0200, Christoph Hellwig wrote: > > Split out a sctp_setsockopt_bindx_kernel that takes a kernel pointer > > to the sockaddr and make sctp_setsockopt_bindx a small wrapper around > > it. This prepares for adding a new bind_add proto op. > > > > Signed-off-by: Christoph Hellwig > > Acked-by: Marcelo Ricardo Leitner > > > --- > > net/sctp/socket.c | 61 ++- > > 1 file changed, 28 insertions(+), 33 deletions(-) > > > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c > > index 827a9903ee288..6e745ac3c4a59 100644 > > --- a/net/sctp/socket.c > > +++ b/net/sctp/socket.c > > @@ -972,23 +972,22 @@ int sctp_asconf_mgmt(struct sctp_sock *sp, struct > > sctp_sockaddr_entry *addrw) > > * it. > > * > > * skThe sk of the socket > > - * addrs The pointer to the addresses in user land > > + * addrs The pointer to the addresses > > * addrssize Size of the addrs buffer > > * opOperation to perform (add or remove, see the flags of > > * sctp_bindx) > > * > > * Returns 0 if ok, <0 errno code on error. > > */ > > -static int sctp_setsockopt_bindx(struct sock *sk, > > -struct sockaddr __user *addrs, > > -int addrs_size, int op) > > +static int sctp_setsockopt_bindx_kernel(struct sock *sk, const > > + struct sockaddr *addrs, int addrs_size, > > + int op) The list of addresses ought to be 'const'. IIRC that requires the test for 'port == 0' be moved down a few layers. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks
From: Jeff Layton > Sent: 20 August 2021 14:57 > > We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros > have disabled it. Warn the stragglers that still use "-o mand" that > we'll be dropping support for that mount option. > > Cc: sta...@vger.kernel.org > Signed-off-by: Jeff Layton > --- > fs/namespace.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/fs/namespace.c b/fs/namespace.c > index ab4174a3c802..ffab0bb1e649 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -1716,8 +1716,16 @@ static inline bool may_mount(void) > } > > #ifdef CONFIG_MANDATORY_FILE_LOCKING > +static bool warned_mand; > static inline bool may_mandlock(void) > { > + if (!warned_mand) { > + warned_mand = true; > + > pr_warn("==\n"); > + pr_warn("WARNING: the mand mount option is being deprecated > and\n"); > + pr_warn(" will be removed in v5.15!\n"); > + > pr_warn("==\n"); > + } > return capable(CAP_SYS_ADMIN); > } If that is called more than once you don't want the 'inline'. I doubt it matters is not inlined - hardly a hot path. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [Cluster-devel] [PATCH v4 1/8] iov_iter: Introduce iov_iter_fault_in_writeable helper
From: Linus Torvalds > Sent: 24 July 2021 20:53 > > On Sat, Jul 24, 2021 at 12:35 PM Andreas Gruenbacher > wrote: > > > > +int iov_iter_fault_in_writeable(const struct iov_iter *i, size_t bytes) > > +{ > ... > > + if (fault_in_user_pages(start, len, true) != len) > > + return -EFAULT; > > Looking at this once more, I think this is likely wrong. > > Why? > > Because any user can/should only care about at least *part* of the > area being writable. > > Imagine that you're doing a large read. If the *first* page is > writable, you should still return the partial read, not -EFAULT. My 2c... Is it actually worth doing any more than ensuring the first byte of the buffer is paged in before entering the block that has to disable page faults? Most of the all the pages are present so the IO completes. The pages can always get unmapped (due to page pressure or another application thread unmapping them) so there needs to be a retry loop. Given the cost of actually faulting in a page going around the outer loop may not matter. Indeed, if an application has just mmap()ed in a very large file and is then doing a write() from it then it is quite likely that the pages got unmapped! Clearly there needs to be extra code to ensure progress is made. This might actually require the use of 'bounce buffers' for really problematic user requests. I also wonder what actually happens for pipes and fifos. IIRC reads and write of up to PIPE_MAX (typically 4096) are expected to be atomic. This should be true even if there are page faults part way through the copy_to/from_user(). It has to be said I can't see any reference to PIPE_MAX in the linux man pages, but I'm sure it is in the POSIX/TOG spec. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)