Re: [Cluster-devel] [PATCH 4/6] dlm: use sctp 1-to-1 API

2015-08-12 Thread David Laight
From: Marcelo Ricardo Leitner
 Sent: 12 August 2015 14:16
 Em 12-08-2015 07:23, David Laight escreveu:
  From: Marcelo Ricardo Leitner
  Sent: 11 August 2015 23:22
  DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not
  needed but this causes it to use sctp_do_peeloff() to mimic an
  kernel_accept() and this causes a symbol dependency on sctp module.
 
  By switching it to 1-to-1 API we can avoid this dependency and also
  reduce quite a lot of SCTP-specific code in lowcomms.c.
  ...
 
  You still need to enable sctp notifications (I think the patch deleted
  that code).
  Otherwise you don't get any kind of indication if the remote system
  'resets' (ie sends an new INIT chunk) on an existing connection.
 
 Right, it would miss the restart event and could generate a corrupted
 tx/rx buffers by glueing parts of old messages with new ones.

Except that it is SCTP so you'd expect DATA chunks to contain entire
messages and so get unexpected message sequences rather than corrupt
messages.
The problem is that the recovery is likely to be another reset.
(Particularly with M3UA where the source and destination port numbers
are likely to be the same and fixed.)

  It is probably enough to treat the MSG_NOTIFICATION as a fatal error
  and close the socket.
 
 Just so we are on the same page, you mean that after accepting the new
 association and enabling notifications on it, any further notification
 on it can be treated as fatal errors, right? Seems reasonable to me.

That's what I had to do.
The far end will probably see an additional disconnect, but it shouldn't
matter.

  This is probably a bug in the sctp stack - if a connection is reset
  but the user hasn't requested notifications then it should be
  converted to a disconnect indication and a new incoming connection.
 
 Maybe in such case resets shouldn't be allowed at all? Because unless it
 happens on a moment of silence it will always lead to application buffer
 corruption. Checked the RFCs now but couldn't find anything restricting
 them to some condition.

I certainly expected the 'reset' to cause an inwards abortive disconnect
on the old socket and a new indication on the listening socket.
I think (hope) that is what you get for a TCP SYN that matches an existing
connection.

In our case I think they were happening when the remote system was power
cycled.

David




Re: [Cluster-devel] [PATCH 4/6] dlm: use sctp 1-to-1 API

2015-08-12 Thread David Laight
From: Marcelo Ricardo Leitner
 Sent: 11 August 2015 23:22
 DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not
 needed but this causes it to use sctp_do_peeloff() to mimic an
 kernel_accept() and this causes a symbol dependency on sctp module.
 
 By switching it to 1-to-1 API we can avoid this dependency and also
 reduce quite a lot of SCTP-specific code in lowcomms.c.
...

You still need to enable sctp notifications (I think the patch deleted
that code).
Otherwise you don't get any kind of indication if the remote system
'resets' (ie sends an new INIT chunk) on an existing connection.

It is probably enough to treat the MSG_NOTIFICATION as a fatal error
and close the socket.

This is probably a bug in the sctp stack - if a connection is reset
but the user hasn't requested notifications then it should be
converted to a disconnect indication and a new incoming connection.

David




Re: [Cluster-devel] [Ocfs2-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx

2020-05-17 Thread David Laight
From: Matthew Wilcox
> Sent: 16 May 2020 16:37
...
> > Basically:
> >
> > This patch sequence (to be written) does the following:
> >
> > Patch 1: Change __sys_setsockopt() to allocate a kernel buffer,
> >  copy the data into it then call set_fs(KERNEL_DS).
> >  An on-stack buffer (say 64 bytes) will be used for
> >  small transfers.
> >
> > Patch 2: The same for __sys_getsockopt().
> >
> > Patch 3: Compat setsockopt.
> >
> > Patch 4: Compat getsockopt.
> >
> > Patch 5: Remove the user copies from the global socket options code.
> >
> > Patches 6 to n-1; Remove the user copies from the per-protocol code.
> >
> > Patch n: Remove the set_fs(KERNEL_DS) from the entry points.
> >
> > This should be bisectable.
> 
> I appreciate your dedication to not publishing the source code to
> your kernel module, but Christoph's patch series is actually better.
> It's typesafe rather than passing void pointers around.

There are plenty on interfaces that pass a 'pointer and length'.
Having the compiler do a type check doesn't give any security
benefit - just stops silly errors.

Oh yes, I've attached the only driver source file that calls
into the Linux kernel.
You are perfectly free to look at all the thing we have to do
to support different and broken kernel releases.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)
#ident "@(#) (c) Aculab plc $Header: 
/home/cvs/repository/ss7/stack/src/driver/linux/ss7osglue.c,v 1.157 2019-08-29 
16:09:14 davidla Exp $ $Name:  $"
#ifndef MODULE
#define MODULE
#endif

#include 

#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 28)
#error minimum kernel version is 2.6.28
#endif

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 34)
#include 
#else
#include 
#endif

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
#include 
#endif
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 

/* This is only in the kernel build tree */
#include 

#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 10, 0)
#include 
#else
#include /* netinet/sctp.h ought to be this file */
#endif

#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 13, 0)
#define wait_queue_head __wait_queue_head
#define wait_queue_entry __wait_queue
#endif

#define SK_PROTOCOL(sock) (sock)->sk->sk_protocol

extern void ss7_trace_mem(int, void *, int, const char *, ...);
extern void ss7_trace_printf(int, const char *, ...);

/* Aculab DACP interfaces - these are in aculab's kern_if.h */
void *dacp_symbol_get(const char *);
int dacp_symbol_release(const char *);

MODULE_AUTHOR("Aculab");
MODULE_LICENSE("Proprietary");

#include "ss7osglue.h"

/* Mutex for driver interface code */
static struct mutex ss7_glue_mutex;

static int ss7dev_major;
static const void *ss7_dtls_handle;
static int ss7_use_count;
static int ss7_stop_pid;

static struct task_struct *asserted_tasks[16];
static unsigned int asserted_task_count;

typedef char ss7_verify_const[ SS7_SOCK_STREAM == SOCK_STREAM && 
SS7_SOCK_SEQPACKET == SOCK_SEQPACKET ? 1 : -1];

static void ss7_net_ns_unload(void);

#define TCP_NODELAY 1

static int ss7_glue_open(struct inode *, struct file *);
static int ss7_glue_release(struct inode *, struct file *);
static long ss7_glue_unlocked_ioctl(struct file *, unsigned int, unsigned long);
static unsigned int ss7_glue_poll(struct file *const, poll_table *);

static struct file_operations ss7dev_fop =
{
open:   ss7_glue_open,
release:ss7_glue_release,
unlocked_ioctl: ss7_glue_unlocked_ioctl,
compat_ioctl:   ss7_glue_unlocked_ioctl,
poll:   ss7_glue_poll,
owner:  THIS_MODULE
};

static int ss7_reboot_notify(struct notifier_block *nb, unsigned long action,
void *data)
{
/* System being rebooted.
 * I added this hoping to use it to get the ss7maint daemon to exit,
 * but it isn't called until all user processes have died.
 * Leave it here - might be useful one day. */
return 0;
}

static struct notifier_block ss7_reboot_notifier_block = {
.notifier_call = ss7_reboot_notify,
};

static int
ss7_init_fail(int rval)
{
if (ss7dev_major > 0)
unregister_chrdev(ss7dev_major, "ss7server");
return rval;
}

static int
ss7_init_mod(void)
{
const void *(*dtls_register)(const char *, int (*)(struct dtls_get_if *));
int rval;

ss7_mutex_init(_glue_mutex);

printk(KERN_INFO "%s\n", ss7version);

ss7dev_major = register_chrdev(0, "ss7server", _fop);

if (ss7dev_major < 0) {
printk(KERN_INFO "ss7server: register_chrdev() failed: %d\n",
ss7dev_major);
return ss7_init_fail(ss7dev_major);
}

rval = ss7_driver_init();
if (rval != 0) {
printk(KERN_INFO "ss7server: ss7_driver_init() failed: %d\n", rval);
return ss7_init_fail(-EIO);
}

dtls_register = 

Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx

2020-05-16 Thread David Laight
From: David Howells
> Sent: 15 May 2020 16:20
> Christoph Hellwig  wrote:
> 
> > > The advantage on using kernel_setsockopt here is that sctp module will
> > > only be loaded if dlm actually creates a SCTP socket.  With this
> > > change, sctp will be loaded on setups that may not be actually using
> > > it. It's a quite big module and might expose the system.
> >
> > True.  Not that the intent is to kill kernel space callers of setsockopt,
> > as I plan to remove the set_fs address space override used for it.
> 
> For getsockopt, does it make sense to have the core kernel load optval/optlen
> into a buffer before calling the protocol driver?  Then the driver need not
> see the userspace pointer at all.
> 
> Similar could be done for setsockopt - allocate a buffer of the size requested
> by the user inside the kernel and pass it into the driver, then copy the data
> back afterwards.

Yes, it also simplifies all the compat code.
And there is a BPF test in setsockopt that also wants to
pass on a kernel buffer.

I'm willing to sit and write the patch.
Quoting from a post I made later on Friday.

Basically:

This patch sequence (to be written) does the following:

Patch 1: Change __sys_setsockopt() to allocate a kernel buffer,
 copy the data into it then call set_fs(KERNEL_DS).
 An on-stack buffer (say 64 bytes) will be used for
 small transfers.

Patch 2: The same for __sys_getsockopt().

Patch 3: Compat setsockopt.

Patch 4: Compat getsockopt.

Patch 5: Remove the user copies from the global socket options code.

Patches 6 to n-1; Remove the user copies from the per-protocol code.

Patch n: Remove the set_fs(KERNEL_DS) from the entry points.

This should be bisectable.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx

2020-05-16 Thread David Laight
From: Christoph Hellwig
> Sent: 15 May 2020 16:25
> On Fri, May 15, 2020 at 04:20:02PM +0100, David Howells wrote:
> > Christoph Hellwig  wrote:
> >
> > > > The advantage on using kernel_setsockopt here is that sctp module will
> > > > only be loaded if dlm actually creates a SCTP socket.  With this
> > > > change, sctp will be loaded on setups that may not be actually using
> > > > it. It's a quite big module and might expose the system.
> > >
> > > True.  Not that the intent is to kill kernel space callers of setsockopt,
> > > as I plan to remove the set_fs address space override used for it.
> >
> > For getsockopt, does it make sense to have the core kernel load 
> > optval/optlen
> > into a buffer before calling the protocol driver?  Then the driver need not
> > see the userspace pointer at all.
> >
> > Similar could be done for setsockopt - allocate a buffer of the size 
> > requested
> > by the user inside the kernel and pass it into the driver, then copy the 
> > data
> > back afterwards.
> 
> I did look into that initially.  The problem is that tons of sockopts
> entirely ignore optlen and just use a fixed size.  So I fear that there
> could be tons of breakage if we suddently respect it.  Otherwise that
> would be a pretty nice way to handle the situation.

I'd guess that most application use the correct size for setsockopt().
(Well, apart from using 4 instead of 1.)

It is certainly possible to always try to read in 64 bytes
regardless of the supplied length, but handle the EFAULT case
by shortening the buffer.

Historically getsockopt() only wrote the length back.
Treating 0 and garbage as (say) 4k and letting the protocol
code set a shorten the copy to user might work.
All short transfers would want to use an on-stack buffer,
so slight oversizes could also be allowed for.

OTOH if i did a getsockopt() with too short a length I wouldn't
want the kernel to trash my program memory.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread David Laight
From: Christoph Hellwig
> Only for those were we have users, and all those are covered.

What do we tell all our users when our kernel SCTP code
no longer works?

It uses SO_REUSADDR, SCTP_EVENTS, SCTP_NODELAY,
SCTP_STATUS, SCTP_INITMSG, IPV6_ONLY, SCTP_SOCKOPT_BINDX_ADD
and SO_LINGER.
We should probably use the CONNECTX function as well.

I doubt we are the one company with out-of-tree drivers
that use the kernel_socket interface.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH 32/33] sctp: add sctp_sock_get_primary_addr

2020-05-14 Thread David Laight
From: David Laight
> Sent: 14 May 2020 13:30
> Subject: RE: [PATCH 32/33] sctp: add sctp_sock_get_primary_addr
> 
> From: David Laight
> > Sent: 14 May 2020 10:51
> > From: Marcelo Ricardo Leitner
> > > Sent: 13 May 2020 19:03
> > >
> > > On Wed, May 13, 2020 at 08:26:47AM +0200, Christoph Hellwig wrote:
> > > > Add a helper to directly get the SCTP_PRIMARY_ADDR sockopt from kernel
> > > > space without going through a fake uaccess.
> > >
> > > Same comment as on the other dlm/sctp patch.
> >
> > Wouldn't it be best to write sctp_[gs]etsockotp() that
> > use a kernel buffer and then implement the user-space
> > calls using a wrapper that does the copies to an on-stack
> > (or malloced if big) buffer.
> 
> Actually looking at __sys_setsockopt() it calls
> BPF_CGROUP_RUN_PROG_SETSOCKOPT() which (by the look of it)
> can copy the user buffer into malloc()ed memory and
> cause set_fs(KERNEL_DS) be called.
> 
> The only way to get rid of that set_fs() is to always
> have the buffer in kernel memory when the underlying
> setsockopt() code is called.

And having started to try coding __sys_setsockopt()
and then found the compat code I suspect that would
be a whole lot more sane if the buffer was in kernel
and it knew that at least (say) 64 bytes were allocated.

The whole compat_alloc_user_space() 'crap' could probably go.

Actually it looks like an application can avoid whatever
checks BPF_CGROUP_RUN_PROG_SETSOCKOPT() is trying to do
by using the 32bit compat ioctls.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH 32/33] sctp: add sctp_sock_get_primary_addr

2020-05-14 Thread David Laight
From: Marcelo Ricardo Leitner
> Sent: 13 May 2020 19:03
> 
> On Wed, May 13, 2020 at 08:26:47AM +0200, Christoph Hellwig wrote:
> > Add a helper to directly get the SCTP_PRIMARY_ADDR sockopt from kernel
> > space without going through a fake uaccess.
> 
> Same comment as on the other dlm/sctp patch.

Wouldn't it be best to write sctp_[gs]etsockotp() that
use a kernel buffer and then implement the user-space
calls using a wrapper that does the copies to an on-stack
(or malloced if big) buffer.

That will also simplify the code be removing all the copies
and -EFAULT returns.
Only the size checks will be needed and the code can assume
the buffer is at least the size of the on-stack buffer.

Our SCTP code uses SO_REUSADDR, SCTP_EVENTS, SCTP_NODELAY,
SCTP_STATUS, SCTP_INITMSG, IPV6_ONLY, SCTP_SOCKOPT_BINDX_ADD
and SO_LINGER.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread David Laight
From: 'Christoph Hellwig'
> Sent: 14 May 2020 11:35
> On Thu, May 14, 2020 at 10:26:41AM +, David Laight wrote:
> > From: Christoph Hellwig
> > > Only for those were we have users, and all those are covered.
> >
> > What do we tell all our users when our kernel SCTP code
> > no longer works?
> 
> We only care about in-tree modules, just like for every other interface
> in the kernel.

Even if our management agreed to release the code and the code
layout matched the kernel guidelines you still wouldn't want
two large drivers that implement telephony functionality
for hardware that very few people actually have.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx

2020-05-14 Thread David Laight
From: Marcelo Ricardo Leitner
> Sent: 13 May 2020 19:01
> On Wed, May 13, 2020 at 08:26:42AM +0200, Christoph Hellwig wrote:
> > And call it directly from dlm instead of going through kernel_setsockopt.
> 
> The advantage on using kernel_setsockopt here is that sctp module will
> only be loaded if dlm actually creates a SCTP socket.  With this
> change, sctp will be loaded on setups that may not be actually using
> it. It's a quite big module and might expose the system.
> 
> I'm okay with the SCTP changes, but I'll defer to DLM folks to whether
> that's too bad or what for DLM.

I didn't see these sneak through.

There is a big long list of SCTP socket options that are
needed to make anything work.

They all need exporting.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread David Laight
From: Joe Perches
> Sent: 13 May 2020 18:39
> On Wed, 2020-05-13 at 08:26 +0200, Christoph Hellwig wrote:
> > this series removes the kernel_setsockopt and kernel_getsockopt
> > functions, and instead switches their users to small functions that
> > implement setting (or in one case getting) a sockopt directly using
> > a normal kernel function call with type safety and all the other
> > benefits of not having a function call.
> >
> > In some cases these functions seem pretty heavy handed as they do
> > a lock_sock even for just setting a single variable, but this mirrors
> > the real setsockopt implementation - counter to that a few kernel
> > drivers just set the fields directly already.
> >
> > Nevertheless the diffstat looks quite promising:
> >
> >  42 files changed, 721 insertions(+), 799 deletions(-)

I missed this patch going through.
Massive NACK.

You need to export functions that do most of the socket options
for all protocols.
As well as REUSADDR and NODELAY SCTP has loads because a lot
of stuff that should have been extra system calls got piled
into setsockopt.

An alternate solution would be to move the copy_to/from_user()
into a wrapper function so that the kernel_[sg]etsockopt()
functions would bypass them completely.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-15 Thread David Laight
Looking at __sys_setsockopt() I noticed that the BPF intercept
can also cause set_fs(KERNEL_DS) be set in order to pass a
modified buffer into the actual setsockopt() code.

If that functionality is to be kept then the underlying
protocol specific code needs changing to accept a kernel buffer.

The 32bit compat code would also be a lot simpler if it could
pass an kernel buffer through.
At the moment it copies the modified buffer back out onto the
user stack.

I'm sure there have been suggestions to remove that complete hack.
Fixing the compat code would leave a kernel_[sg]et_sockopt() that
still worked.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH 32/33] sctp: add sctp_sock_get_primary_addr

2020-05-14 Thread David Laight
From: David Laight
> Sent: 14 May 2020 10:51
> From: Marcelo Ricardo Leitner
> > Sent: 13 May 2020 19:03
> >
> > On Wed, May 13, 2020 at 08:26:47AM +0200, Christoph Hellwig wrote:
> > > Add a helper to directly get the SCTP_PRIMARY_ADDR sockopt from kernel
> > > space without going through a fake uaccess.
> >
> > Same comment as on the other dlm/sctp patch.
> 
> Wouldn't it be best to write sctp_[gs]etsockotp() that
> use a kernel buffer and then implement the user-space
> calls using a wrapper that does the copies to an on-stack
> (or malloced if big) buffer.

Actually looking at __sys_setsockopt() it calls
BPF_CGROUP_RUN_PROG_SETSOCKOPT() which (by the look of it)
can copy the user buffer into malloc()ed memory and
cause set_fs(KERNEL_DS) be called.

The only way to get rid of that set_fs() is to always
have the buffer in kernel memory when the underlying
setsockopt() code is called.

The comment above __sys_[sg]etsockopt() about not knowing
the length is just wrong.
It probably applied to getsockopt() in the dim and distant
past before it was made read-update.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH 31/33] sctp: add sctp_sock_set_nodelay

2020-05-21 Thread David Laight
From: Christoph Hellwig
> Sent: 21 May 2020 09:35
> On Wed, May 20, 2020 at 08:39:13PM -0300, Marcelo Ricardo Leitner wrote:
> > On Wed, May 20, 2020 at 04:23:55PM -0700, David Miller wrote:
> > > From: Marcelo Ricardo Leitner 
> > > Date: Wed, 20 May 2020 20:10:01 -0300
> > >
> > > > The duplication with sctp_setsockopt_nodelay() is quite silly/bad.
> > > > Also, why have the 'true' hardcoded? It's what dlm uses, yes, but the
> > > > API could be a bit more complete than that.
> > >
> > > The APIs are being designed based upon what in-tree users actually
> > > make use of.  We can expand things later if necessary.
> >
> > Sometimes expanding things later can be though, thus why the worry.
> > But ok, I get it. Thanks.
> >
> > The comment still applies, though. (re the duplication)
> 
> Where do you see duplication?

The whole thing just doesn't scale.

As soon as you get to the slightly more complex requests
like SCTP_INITMSG (which should probably be called to
set the required number of data streams) you've either
got replicated code or nested wrappers.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt v2

2020-05-21 Thread David Laight
From: Christoph Hellwig
> Sent: 20 May 2020 20:55
> 
> this series removes the kernel_setsockopt and kernel_getsockopt
> functions, and instead switches their users to small functions that
> implement setting (or in one case getting) a sockopt directly using
> a normal kernel function call with type safety and all the other
> benefits of not having a function call.
> 
> In some cases these functions seem pretty heavy handed as they do
> a lock_sock even for just setting a single variable, but this mirrors
> the real setsockopt implementation unlike a few drivers that just set
> set the fields directly.

How much does this increase the kernel code by?

You are also replicating a lot of code making it more
difficult to maintain.

I don't think the performance of an socket option code
really matters - it is usually done once when a socket
is initialised and the other costs of establishing a
connection will dominate.

Pulling the user copies outside the [gs]etsocksopt switch
statement not only reduces the code size (source and object)
and trivially allows kernel_[sg]sockopt() to me added to
the list of socket calls.

It probably isn't possible to pull the usercopies right
out into the syscall wrapper because of some broken
requests.

I worried about whether getsockopt() should read the entire
user buffer first. SCTP needs the some of it often (including a
sockaddr_storage in one case), TCP needs it once.
However the cost of reading a few words is small, and a big
buffer probably needs setting to avoid leaking kernel
memory if the structure has holes or fields that don't get set.
Reading from userspace solves both issues.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt v2

2020-05-21 Thread David Laight
From: 'Christoph Hellwig'
> Sent: 21 May 2020 10:12
...
> > I worried about whether getsockopt() should read the entire
> > user buffer first. SCTP needs the some of it often (including a
> > sockaddr_storage in one case), TCP needs it once.
> > However the cost of reading a few words is small, and a big
> > buffer probably needs setting to avoid leaking kernel
> > memory if the structure has holes or fields that don't get set.
> > Reading from userspace solves both issues.
> 
> As mention in the thread on the last series:  That was my first idea, but
> we have way to many sockopts, especially in obscure protocols that just
> hard code the size.  The chance of breaking userspace in a way that can't
> be fixed without going back to passing user pointers to get/setsockopt
> is way to high to commit to such a change unfortunately.

Right the syscall stubs probably can't do it.
But the per-protocol ones can for the main protocols.

I posted a patch for SCTP yesterday that removes 800 lines
of source and 8k of object code.
Even that needs a horrid bodge for one request where the
length returned has to be less than the data copied!

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH 4/4] net: remove kernel_setsockopt

2020-05-29 Thread David Laight
From: Christoph Hellwig
> Sent: 29 May 2020 13:10
> 
> No users left.

There is no point even proposing this until all the changes to remove
its use have made it at least as far into 'net-next' and probably 'net'.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH 2/4] sctp: refactor sctp_setsockopt_bindx

2020-06-01 Thread David Laight
From: Marcelo Ricardo Leitner
> Sent: 29 May 2020 17:06
> On Fri, May 29, 2020 at 02:09:41PM +0200, Christoph Hellwig wrote:
> > Split out a sctp_setsockopt_bindx_kernel that takes a kernel pointer
> > to the sockaddr and make sctp_setsockopt_bindx a small wrapper around
> > it.  This prepares for adding a new bind_add proto op.
> >
> > Signed-off-by: Christoph Hellwig 
> 
> Acked-by: Marcelo Ricardo Leitner 
> 
> > ---
> >  net/sctp/socket.c | 61 ++-
> >  1 file changed, 28 insertions(+), 33 deletions(-)
> >
> > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > index 827a9903ee288..6e745ac3c4a59 100644
> > --- a/net/sctp/socket.c
> > +++ b/net/sctp/socket.c
> > @@ -972,23 +972,22 @@ int sctp_asconf_mgmt(struct sctp_sock *sp, struct 
> > sctp_sockaddr_entry *addrw)
> >   * it.
> >   *
> >   * skThe sk of the socket
> > - * addrs The pointer to the addresses in user land
> > + * addrs The pointer to the addresses
> >   * addrssize Size of the addrs buffer
> >   * opOperation to perform (add or remove, see the flags of
> >   *   sctp_bindx)
> >   *
> >   * Returns 0 if ok, <0 errno code on error.
> >   */
> > -static int sctp_setsockopt_bindx(struct sock *sk,
> > -struct sockaddr __user *addrs,
> > -int addrs_size, int op)
> > +static int sctp_setsockopt_bindx_kernel(struct sock *sk,
const
> > +   struct sockaddr *addrs, int addrs_size,
> > +   int op)

The list of addresses ought to be 'const'.

IIRC that requires the test for 'port == 0' be moved down  a few layers.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH v2 1/2] fs: warn about impending deprecation of mandatory locks

2021-08-20 Thread David Laight
From: Jeff Layton
> Sent: 20 August 2021 14:57
> 
> We've had CONFIG_MANDATORY_FILE_LOCKING since 2015 and a lot of distros
> have disabled it. Warn the stragglers that still use "-o mand" that
> we'll be dropping support for that mount option.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Jeff Layton 
> ---
>  fs/namespace.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ab4174a3c802..ffab0bb1e649 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1716,8 +1716,16 @@ static inline bool may_mount(void)
>  }
> 
>  #ifdef   CONFIG_MANDATORY_FILE_LOCKING
> +static bool warned_mand;
>  static inline bool may_mandlock(void)
>  {
> + if (!warned_mand) {
> + warned_mand = true;
> + 
> pr_warn("==\n");
> + pr_warn("WARNING: the mand mount option is being deprecated 
> and\n");
> + pr_warn(" will be removed in v5.15!\n");
> + 
> pr_warn("==\n");
> + }
>   return capable(CAP_SYS_ADMIN);
>  }

If that is called more than once you don't want the 'inline'.
I doubt it matters is not inlined - hardly a hot path.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] [PATCH v4 1/8] iov_iter: Introduce iov_iter_fault_in_writeable helper

2021-07-27 Thread David Laight
From: Linus Torvalds
> Sent: 24 July 2021 20:53
> 
> On Sat, Jul 24, 2021 at 12:35 PM Andreas Gruenbacher
>  wrote:
> >
> > +int iov_iter_fault_in_writeable(const struct iov_iter *i, size_t bytes)
> > +{
> ...
> > +   if (fault_in_user_pages(start, len, true) != len)
> > +   return -EFAULT;
> 
> Looking at this once more, I think this is likely wrong.
> 
> Why?
> 
> Because any user can/should only care about at least *part* of the
> area being writable.
> 
> Imagine that you're doing a large read. If the *first* page is
> writable, you should still return the partial read, not -EFAULT.

My 2c...

Is it actually worth doing any more than ensuring the first byte
of the buffer is paged in before entering the block that has
to disable page faults?

Most of the all the pages are present so the IO completes.

The pages can always get unmapped (due to page pressure or
another application thread unmapping them) so there needs
to be a retry loop.
Given the cost of actually faulting in a page going around
the outer loop may not matter.
Indeed, if an application has just mmap()ed in a very large
file and is then doing a write() from it then it is quite
likely that the pages got unmapped!

Clearly there needs to be extra code to ensure progress is made.
This might actually require the use of 'bounce buffers'
for really problematic user requests.

I also wonder what actually happens for pipes and fifos.
IIRC reads and write of up to PIPE_MAX (typically 4096)
are expected to be atomic.
This should be true even if there are page faults part way
through the copy_to/from_user().

It has to be said I can't see any reference to PIPE_MAX
in the linux man pages, but I'm sure it is in the POSIX/TOG
spec.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)