Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt v2

2020-05-23 Thread Christoph Hellwig
On Wed, May 20, 2020 at 09:54:36PM +0200, Christoph Hellwig wrote:
> Hi Dave,
> 
> this series removes the kernel_setsockopt and kernel_getsockopt
> functions, and instead switches their users to small functions that
> implement setting (or in one case getting) a sockopt directly using
> a normal kernel function call with type safety and all the other
> benefits of not having a function call.
> 
> In some cases these functions seem pretty heavy handed as they do
> a lock_sock even for just setting a single variable, but this mirrors
> the real setsockopt implementation unlike a few drivers that just set
> set the fields directly.

Hi Dave and other maintainers,

can you take a look at and potentially merge patches 1-30 while we
discuss the sctp refactoring?  It would get a nice headstart by removing
kernel_getsockopt and most kernel_setsockopt users, and for the next
follow on I wouldn't need to spam lots of lists with 30+ patches again.



Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt v2

2020-05-21 Thread David Laight
From: 'Christoph Hellwig'
> Sent: 21 May 2020 10:12
...
> > I worried about whether getsockopt() should read the entire
> > user buffer first. SCTP needs the some of it often (including a
> > sockaddr_storage in one case), TCP needs it once.
> > However the cost of reading a few words is small, and a big
> > buffer probably needs setting to avoid leaking kernel
> > memory if the structure has holes or fields that don't get set.
> > Reading from userspace solves both issues.
> 
> As mention in the thread on the last series:  That was my first idea, but
> we have way to many sockopts, especially in obscure protocols that just
> hard code the size.  The chance of breaking userspace in a way that can't
> be fixed without going back to passing user pointers to get/setsockopt
> is way to high to commit to such a change unfortunately.

Right the syscall stubs probably can't do it.
But the per-protocol ones can for the main protocols.

I posted a patch for SCTP yesterday that removes 800 lines
of source and 8k of object code.
Even that needs a horrid bodge for one request where the
length returned has to be less than the data copied!

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt v2

2020-05-21 Thread 'Christoph Hellwig'
On Thu, May 21, 2020 at 08:01:33AM +, David Laight wrote:
> How much does this increase the kernel code by?

 44 files changed, 660 insertions(+), 843 deletions(-)


> You are also replicating a lot of code making it more
> difficult to maintain.

No, I specifically don't.

> I don't think the performance of an socket option code
> really matters - it is usually done once when a socket
> is initialised and the other costs of establishing a
> connection will dominate.
> 
> Pulling the user copies outside the [gs]etsocksopt switch
> statement not only reduces the code size (source and object)
> and trivially allows kernel_[sg]sockopt() to me added to
> the list of socket calls.
> 
> It probably isn't possible to pull the usercopies right
> out into the syscall wrapper because of some broken
> requests.

Please read through the previous discussion of the rationale and the
options.  We've been there before.

> I worried about whether getsockopt() should read the entire
> user buffer first. SCTP needs the some of it often (including a
> sockaddr_storage in one case), TCP needs it once.
> However the cost of reading a few words is small, and a big
> buffer probably needs setting to avoid leaking kernel
> memory if the structure has holes or fields that don't get set.
> Reading from userspace solves both issues.

As mention in the thread on the last series:  That was my first idea, but
we have way to many sockopts, especially in obscure protocols that just
hard code the size.  The chance of breaking userspace in a way that can't
be fixed without going back to passing user pointers to get/setsockopt
is way to high to commit to such a change unfortunately.



Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt v2

2020-05-21 Thread David Laight
From: Christoph Hellwig
> Sent: 20 May 2020 20:55
> 
> this series removes the kernel_setsockopt and kernel_getsockopt
> functions, and instead switches their users to small functions that
> implement setting (or in one case getting) a sockopt directly using
> a normal kernel function call with type safety and all the other
> benefits of not having a function call.
> 
> In some cases these functions seem pretty heavy handed as they do
> a lock_sock even for just setting a single variable, but this mirrors
> the real setsockopt implementation unlike a few drivers that just set
> set the fields directly.

How much does this increase the kernel code by?

You are also replicating a lot of code making it more
difficult to maintain.

I don't think the performance of an socket option code
really matters - it is usually done once when a socket
is initialised and the other costs of establishing a
connection will dominate.

Pulling the user copies outside the [gs]etsocksopt switch
statement not only reduces the code size (source and object)
and trivially allows kernel_[sg]sockopt() to me added to
the list of socket calls.

It probably isn't possible to pull the usercopies right
out into the syscall wrapper because of some broken
requests.

I worried about whether getsockopt() should read the entire
user buffer first. SCTP needs the some of it often (including a
sockaddr_storage in one case), TCP needs it once.
However the cost of reading a few words is small, and a big
buffer probably needs setting to avoid leaking kernel
memory if the structure has holes or fields that don't get set.
Reading from userspace solves both issues.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-15 Thread David Laight
Looking at __sys_setsockopt() I noticed that the BPF intercept
can also cause set_fs(KERNEL_DS) be set in order to pass a
modified buffer into the actual setsockopt() code.

If that functionality is to be kept then the underlying
protocol specific code needs changing to accept a kernel buffer.

The 32bit compat code would also be a lot simpler if it could
pass an kernel buffer through.
At the moment it copies the modified buffer back out onto the
user stack.

I'm sure there have been suggestions to remove that complete hack.
Fixing the compat code would leave a kernel_[sg]et_sockopt() that
still worked.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread David Miller
From: David Laight 
Date: Thu, 14 May 2020 10:26:41 +

> I doubt we are the one company with out-of-tree drivers
> that use the kernel_socket interface.

Not our problem.



Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread David Miller
From: David Laight 
Date: Thu, 14 May 2020 08:29:30 +

> You need to export functions that do most of the socket options
> for all protocols.

If all in-tree users of this stuff are converted, there is no argument
for keeping these routines.

You seemed to be concerned about out of tree stuff.  If so, that's not
of our concern.



Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread David Laight
From: 'Christoph Hellwig'
> Sent: 14 May 2020 11:35
> On Thu, May 14, 2020 at 10:26:41AM +, David Laight wrote:
> > From: Christoph Hellwig
> > > Only for those were we have users, and all those are covered.
> >
> > What do we tell all our users when our kernel SCTP code
> > no longer works?
> 
> We only care about in-tree modules, just like for every other interface
> in the kernel.

Even if our management agreed to release the code and the code
layout matched the kernel guidelines you still wouldn't want
two large drivers that implement telephony functionality
for hardware that very few people actually have.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread 'Christoph Hellwig'
On Thu, May 14, 2020 at 10:26:41AM +, David Laight wrote:
> From: Christoph Hellwig
> > Only for those were we have users, and all those are covered.
> 
> What do we tell all our users when our kernel SCTP code
> no longer works?

We only care about in-tree modules, just like for every other interface
in the kernel.



Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread David Laight
From: Christoph Hellwig
> Only for those were we have users, and all those are covered.

What do we tell all our users when our kernel SCTP code
no longer works?

It uses SO_REUSADDR, SCTP_EVENTS, SCTP_NODELAY,
SCTP_STATUS, SCTP_INITMSG, IPV6_ONLY, SCTP_SOCKOPT_BINDX_ADD
and SO_LINGER.
We should probably use the CONNECTX function as well.

I doubt we are the one company with out-of-tree drivers
that use the kernel_socket interface.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread Christoph Hellwig
On Thu, May 14, 2020 at 08:29:30AM +, David Laight wrote:
> You need to export functions that do most of the socket options
> for all protocols.

Only for those were we have users, and all those are covered.



Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread David Laight
From: Joe Perches
> Sent: 13 May 2020 18:39
> On Wed, 2020-05-13 at 08:26 +0200, Christoph Hellwig wrote:
> > this series removes the kernel_setsockopt and kernel_getsockopt
> > functions, and instead switches their users to small functions that
> > implement setting (or in one case getting) a sockopt directly using
> > a normal kernel function call with type safety and all the other
> > benefits of not having a function call.
> >
> > In some cases these functions seem pretty heavy handed as they do
> > a lock_sock even for just setting a single variable, but this mirrors
> > the real setsockopt implementation - counter to that a few kernel
> > drivers just set the fields directly already.
> >
> > Nevertheless the diffstat looks quite promising:
> >
> >  42 files changed, 721 insertions(+), 799 deletions(-)

I missed this patch going through.
Massive NACK.

You need to export functions that do most of the socket options
for all protocols.
As well as REUSADDR and NODELAY SCTP has loads because a lot
of stuff that should have been extra system calls got piled
into setsockopt.

An alternate solution would be to move the copy_to/from_user()
into a wrapper function so that the kernel_[sg]etsockopt()
functions would bypass them completely.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)




Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-14 Thread Christoph Hellwig
On Wed, May 13, 2020 at 10:38:59AM -0700, Joe Perches wrote:
> It might be useful to show overall object size change.
> 
> More EXPORT_SYMBOL uses increase object size a little.
> 
> And not sure it matters much except it reduces overall object
> size, but these patches remove (unnecessary) logging on error
> and that could be mentioned in the cover letter too.

The intent here is not to reduce code size.  The intent is to kill of
set_fs users so that we can eventually remove set_fs entirely.



Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-13 Thread David Miller
From: Christoph Hellwig 
Date: Wed, 13 May 2020 08:26:15 +0200

> Hi Dave,
> 
> this series removes the kernel_setsockopt and kernel_getsockopt
> functions, and instead switches their users to small functions that
> implement setting (or in one case getting) a sockopt directly using
> a normal kernel function call with type safety and all the other
> benefits of not having a function call.
> 
> In some cases these functions seem pretty heavy handed as they do
> a lock_sock even for just setting a single variable, but this mirrors
> the real setsockopt implementation - counter to that a few kernel
> drivers just set the fields directly already.
> 
> Nevertheless the diffstat looks quite promising:
> 
>  42 files changed, 721 insertions(+), 799 deletions(-)

Overall I'm fine with these changes, but three things need to happen
before I can think about applying this:

1) Address David's feedback about the ip_mtu*() calls that can occur
   on ipv6 sockets too.

2) Handle the feedback about dlm now bringing in sctp even if sctp
   sockets are not even used because of the symbol dependency.

3) Add the rxrpc documentation requested by David.

Thank you.



Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-13 Thread Sagi Grimberg




Hi Dave,

this series removes the kernel_setsockopt and kernel_getsockopt
functions, and instead switches their users to small functions that
implement setting (or in one case getting) a sockopt directly using
a normal kernel function call with type safety and all the other
benefits of not having a function call.

In some cases these functions seem pretty heavy handed as they do
a lock_sock even for just setting a single variable, but this mirrors
the real setsockopt implementation - counter to that a few kernel
drivers just set the fields directly already.

Nevertheless the diffstat looks quite promising:

  42 files changed, 721 insertions(+), 799 deletions(-)


For the nvme-tcp bits,

Acked-by: Sagi Grimberg 



Re: [Cluster-devel] remove kernel_setsockopt and kernel_getsockopt

2020-05-13 Thread Joe Perches
On Wed, 2020-05-13 at 08:26 +0200, Christoph Hellwig wrote:
> this series removes the kernel_setsockopt and kernel_getsockopt
> functions, and instead switches their users to small functions that
> implement setting (or in one case getting) a sockopt directly using
> a normal kernel function call with type safety and all the other
> benefits of not having a function call.
> 
> In some cases these functions seem pretty heavy handed as they do
> a lock_sock even for just setting a single variable, but this mirrors
> the real setsockopt implementation - counter to that a few kernel
> drivers just set the fields directly already.
> 
> Nevertheless the diffstat looks quite promising:
> 
>  42 files changed, 721 insertions(+), 799 deletions(-)

trivia:

It might be useful to show overall object size change.

More EXPORT_SYMBOL uses increase object size a little.

And not sure it matters much except it reduces overall object
size, but these patches remove (unnecessary) logging on error
and that could be mentioned in the cover letter too.

e.g.:

-   ret = kernel_setsockopt(queue->sock, SOL_SOCKET, SO_LINGER,
-   (char *), sizeof(sol));
-   if (ret) {
-   dev_err(nctrl->device,
-   "failed to set SO_LINGER sock opt %d\n", ret);
-   goto err_sock;
-   }
+   sock_set_linger(queue->sock->sk, true, 0);