Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-28 Thread Vlastimil Babka
On 09/28/2016 06:30 PM, David Laight wrote:
> From: Vlastimil Babka
>> Sent: 27 September 2016 12:51
> ...
>> Process name suggests it's part of db2 database. It seems it has to implement
>> its own interface to select() syscall, because glibc itself seems to have a
>> FD_SETSIZE limit of 1024, which is probably why this wasn't an issue for all
>> the years...
> 
> ISTR the canonical way to increase the size being to set FD_SETSIZE
> to a larger value before including any of the headers.
> 
> Or doesn't that work with linux and glibc ??

Doesn't seem so.

> 
>   David
> 



RE: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-28 Thread David Laight
From: Vlastimil Babka
> Sent: 27 September 2016 12:51
...
> Process name suggests it's part of db2 database. It seems it has to implement
> its own interface to select() syscall, because glibc itself seems to have a
> FD_SETSIZE limit of 1024, which is probably why this wasn't an issue for all
> the years...

ISTR the canonical way to increase the size being to set FD_SETSIZE
to a larger value before including any of the headers.

Or doesn't that work with linux and glibc ??

David



Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Vlastimil Babka

On 09/27/2016 01:42 PM, Nicholas Piggin wrote:
> On Tue, 27 Sep 2016 11:37:24 +
> David Laight  wrote:
>
>> From: Nicholas Piggin
>>> Sent: 27 September 2016 12:25
>>> On Tue, 27 Sep 2016 10:44:04 +0200
>>> Vlastimil Babka  wrote:
>>>
>>> What's your customer doing with those selects? If they care at all about
>>> performance, I doubt they want select to attempt order-4 allocations, fail,
>>> then use vmalloc :)
>>
>> If they care about performance they shouldn't be passing select() lists that
>> are anywhere near that large.
>> If the number of actual fd is small - use poll().
>
> Right. Presumably it's some old app they're still using, no?

Process name suggests it's part of db2 database. It seems it has to implement
its own interface to select() syscall, because glibc itself seems to have a
FD_SETSIZE limit of 1024, which is probably why this wasn't an issue for all the
years...





Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Nicholas Piggin
On Tue, 27 Sep 2016 11:37:24 +
David Laight  wrote:

> From: Nicholas Piggin
> > Sent: 27 September 2016 12:25
> > On Tue, 27 Sep 2016 10:44:04 +0200
> > Vlastimil Babka  wrote:
> >   
> > > On 09/23/2016 06:47 PM, Jason Baron wrote:  
> > > > Hi,
> > > >
> > > > On 09/23/2016 03:24 AM, Nicholas Piggin wrote:  
> > > >> On Fri, 23 Sep 2016 14:42:53 +0800
> > > >> "Hillf Danton"  wrote:
> > > >>  
> > > 
> > >  The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size
> > >  grows with the number of fds passed. We had a customer report page
> > >  allocation failures of order-4 for this allocation. This is a costly
> > >  order, so it might easily fail, as the VM expects such allocation to
> > >  have a lower-order fallback.
> > > 
> > >  Such trivial fallback is vmalloc(), as the memory doesn't have to be
> > >  physically contiguous. Also the allocation is temporary for the
> > >  duration of the syscall, so it's unlikely to stress vmalloc too much.
> > > 
> > >  Note that the poll(2) syscall seems to use a linked list of order-0
> > >  pages, so it doesn't need this kind of fallback.
> > > >>
> > > >> How about something like this? (untested)
> > >
> > > This pushes the limit further, but might just delay the problem. Could be
> > > an optimization on top if there's enough interest, though.
> > 
> > What's your customer doing with those selects? If they care at all about
> > performance, I doubt they want select to attempt order-4 allocations, fail,
> > then use vmalloc :)  
> 
> If they care about performance they shouldn't be passing select() lists that
> are anywhere near that large.
> If the number of actual fd is small - use poll().

Right. Presumably it's some old app they're still using, no?


RE: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread David Laight
From: Nicholas Piggin
> Sent: 27 September 2016 12:25
> On Tue, 27 Sep 2016 10:44:04 +0200
> Vlastimil Babka  wrote:
> 
> > On 09/23/2016 06:47 PM, Jason Baron wrote:
> > > Hi,
> > >
> > > On 09/23/2016 03:24 AM, Nicholas Piggin wrote:
> > >> On Fri, 23 Sep 2016 14:42:53 +0800
> > >> "Hillf Danton"  wrote:
> > >>
> > 
> >  The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size
> >  grows with the number of fds passed. We had a customer report page
> >  allocation failures of order-4 for this allocation. This is a costly
> >  order, so it might easily fail, as the VM expects such allocation to
> >  have a lower-order fallback.
> > 
> >  Such trivial fallback is vmalloc(), as the memory doesn't have to be
> >  physically contiguous. Also the allocation is temporary for the
> >  duration of the syscall, so it's unlikely to stress vmalloc too much.
> > 
> >  Note that the poll(2) syscall seems to use a linked list of order-0
> >  pages, so it doesn't need this kind of fallback.
> > >>
> > >> How about something like this? (untested)
> >
> > This pushes the limit further, but might just delay the problem. Could be an
> > optimization on top if there's enough interest, though.
> 
> What's your customer doing with those selects? If they care at all about
> performance, I doubt they want select to attempt order-4 allocations, fail,
> then use vmalloc :)

If they care about performance they shouldn't be passing select() lists that
are anywhere near that large.
If the number of actual fd is small - use poll().

Otherwise you want one of the 'event' mechanisms in order to avoid setting
the markers on every fd after every event (can't remember how you do that
in Linux).

At least this isn't SYSV - poll() was O(n^2) in the number of fd
(because the fd were on a linked list).

David
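
To make the two suggestions above concrete: with poll() the cost follows the
number of entries passed rather than the highest fd value, and on Linux the
'event' mechanism being referred to is presumably epoll. The following is a
minimal sketch only. The function names are invented for illustration, and a
real event loop would create the epoll instance once and reuse it rather
than per call as done here.

```c
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* poll(): one pollfd entry per descriptor of interest, whatever its value. */
static int readable_with_poll(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    return n == 1 && (pfd.revents & POLLIN) != 0;
}

/* epoll: interest is registered once; waits do not re-submit the fd list. */
static int readable_with_epoll(int fd, int timeout_ms)
{
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };
    struct epoll_event out = { 0 };
    int n, epfd = epoll_create1(0);

    if (epfd < 0)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0) {
        close(epfd);
        return -1;
    }
    n = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);
    if (n < 0)
        return -1;
    return n == 1 && out.data.fd == fd;
}

/* Returns 1 if both mechanisms report a pipe with pending data as readable. */
static int demo_poll_epoll(void)
{
    int p[2], a = -1, b = -1;

    if (pipe(p) != 0)
        return -1;
    if (write(p[1], "x", 1) == 1) {
        a = readable_with_poll(p[0], 0);
        b = readable_with_epoll(p[0], 0);
    }
    close(p[0]);
    close(p[1]);
    return a == 1 && b == 1;
}
```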



Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Nicholas Piggin
On Tue, 27 Sep 2016 10:44:04 +0200
Vlastimil Babka  wrote:

> On 09/23/2016 06:47 PM, Jason Baron wrote:
> > Hi,
> >
> > On 09/23/2016 03:24 AM, Nicholas Piggin wrote:  
> >> On Fri, 23 Sep 2016 14:42:53 +0800
> >> "Hillf Danton"  wrote:
> >>  
> 
>  The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size
>  grows with the number of fds passed. We had a customer report page
>  allocation failures of order-4 for this allocation. This is a costly
>  order, so it might easily fail, as the VM expects such allocation to
>  have a lower-order fallback.
> 
>  Such trivial fallback is vmalloc(), as the memory doesn't have to be
>  physically contiguous. Also the allocation is temporary for the duration
>  of the syscall, so it's unlikely to stress vmalloc too much.
> 
>  Note that the poll(2) syscall seems to use a linked list of order-0
>  pages, so it doesn't need this kind of fallback.
> >>
> >> How about something like this? (untested)
> 
> This pushes the limit further, but might just delay the problem. Could be an
> optimization on top if there's enough interest, though.

What's your customer doing with those selects? If they care at all about
performance, I doubt they want select to attempt order-4 allocations, fail,
then use vmalloc :)
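
For a rough sense of scale, the arithmetic behind "order-4" can be sketched
as below. This is a userspace illustration, not kernel code: it assumes
4 KiB pages (SKETCH_PAGE_SIZE is an assumption), mirrors the
round-up-to-whole-longs sizing that FDS_BYTES() performs, and treats the
order as the smallest power-of-two number of pages that covers the request.
The function names are made up.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* One bitmap for nfds descriptors, rounded up to whole longs. */
static size_t sketch_fds_bytes(size_t nfds)
{
    size_t bits_per_long = 8 * sizeof(long);

    return (nfds + bits_per_long - 1) / bits_per_long * sizeof(long);
}

/* select() needs six bitmaps: in/out/ex plus the three result sets. */
static size_t select_alloc_bytes(size_t nfds)
{
    return 6 * sketch_fds_bytes(nfds);
}

/* Smallest order such that (page size << order) covers the request. */
static unsigned int alloc_order(size_t bytes)
{
    unsigned int order = 0;

    while ((SKETCH_PAGE_SIZE << order) < bytes)
        order++;
    return order;
}
```

Under these assumptions, 1024 descriptors need 768 bytes (order 0), while
65536 descriptors need 49152 bytes, which is an order-4 request. The
order-4 range starts a bit above 43,000 descriptors.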



Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Vlastimil Babka

On 09/23/2016 06:47 PM, Jason Baron wrote:
> Hi,
>
> On 09/23/2016 03:24 AM, Nicholas Piggin wrote:
>> On Fri, 23 Sep 2016 14:42:53 +0800
>> "Hillf Danton"  wrote:
>>
>>>> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
>>>> with the number of fds passed. We had a customer report page allocation
>>>> failures of order-4 for this allocation. This is a costly order, so it might
>>>> easily fail, as the VM expects such allocation to have a lower-order fallback.
>>>>
>>>> Such trivial fallback is vmalloc(), as the memory doesn't have to be
>>>> physically contiguous. Also the allocation is temporary for the duration of the
>>>> syscall, so it's unlikely to stress vmalloc too much.
>>>>
>>>> Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
>>>> it doesn't need this kind of fallback.
>>
>> How about something like this? (untested)

This pushes the limit further, but might just delay the problem. Could be an
optimization on top if there's enough interest, though.

[...]

>> +
>> +   if (!(fds.in && fds.out && fds.ex &&
>> +   fds.res_in && fds.res_out && fds.res_ex))
>> +   goto out;
>> +   } else {
>> +   if (nr_bytes > sizeof(stack_fds)) {
>> +   /* Not enough space in on-stack array */
>> +   if (nr_bytes > PAGE_SIZE * 2)
>
> The 'if' looks extraneous?
>
> Also, I wonder if we can just avoid some allocations altogether by
> checking by if the user fd_set pointers are NULL? That can avoid failures :)

That would be a more major rewrite, as the core algorithm doesn't expect NULLs.

> Thanks,
>
> -Jason





Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-23 Thread Jason Baron

Hi,

On 09/23/2016 03:24 AM, Nicholas Piggin wrote:
> On Fri, 23 Sep 2016 14:42:53 +0800
> "Hillf Danton"  wrote:
>
> > > The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
> > > with the number of fds passed. We had a customer report page allocation
> > > failures of order-4 for this allocation. This is a costly order, so it might
> > > easily fail, as the VM expects such allocation to have a lower-order fallback.
> > >
> > > Such trivial fallback is vmalloc(), as the memory doesn't have to be
> > > physically contiguous. Also the allocation is temporary for the duration of the
> > > syscall, so it's unlikely to stress vmalloc too much.
> > >
> > > Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
> > > it doesn't need this kind of fallback.
>
> How about something like this? (untested)
>
> Eric isn't wrong about vmalloc sucking :)
>
> Thanks,
> Nick
>
>
> ---
>   fs/select.c | 57 +++--
>   1 file changed, 43 insertions(+), 14 deletions(-)
>
> diff --git a/fs/select.c b/fs/select.c
> index 8ed9da5..3b4834c 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -555,6 +555,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
> void *bits;
> int ret, max_fds;
> unsigned int size;
> +   size_t nr_bytes;
> struct fdtable *fdt;
> /* Allocate small arguments on the stack to save memory and be faster */
> long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
> @@ -576,21 +577,39 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>  * since we used fdset we need to allocate memory in units of
>  * long-words.
>  */
> -   size = FDS_BYTES(n);
> +   ret = -ENOMEM;
> bits = stack_fds;
> -   if (size > sizeof(stack_fds) / 6) {
> -   /* Not enough space in on-stack array; must use kmalloc */
> +   size = FDS_BYTES(n);
> +   nr_bytes = 6 * size;
> +
> +   if (unlikely(nr_bytes > PAGE_SIZE)) {
> +   /* Avoid multi-page allocation if possible */
> ret = -ENOMEM;
> -   bits = kmalloc(6 * size, GFP_KERNEL);
> -   if (!bits)
> -   goto out_nofds;
> +   fds.in = kmalloc(size, GFP_KERNEL);
> +   fds.out = kmalloc(size, GFP_KERNEL);
> +   fds.ex = kmalloc(size, GFP_KERNEL);
> +   fds.res_in = kmalloc(size, GFP_KERNEL);
> +   fds.res_out = kmalloc(size, GFP_KERNEL);
> +   fds.res_ex = kmalloc(size, GFP_KERNEL);
> +
> +   if (!(fds.in && fds.out && fds.ex &&
> +   fds.res_in && fds.res_out && fds.res_ex))
> +   goto out;
> +   } else {
> +   if (nr_bytes > sizeof(stack_fds)) {
> +   /* Not enough space in on-stack array */
> +   if (nr_bytes > PAGE_SIZE * 2)


The 'if' looks extraneous?

Also, I wonder if we can just avoid some allocations altogether by
checking whether the user fd_set pointers are NULL? That can avoid failures :)


Thanks,

-Jason
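
As a back-of-the-envelope illustration of the NULL-pointer idea above: if
only the non-NULL sets needed bitmaps, each would cost one input copy plus
one result bitmap. The sketch below just computes that hypothetical saving
in userspace. It is not how core_sys_select() works today, and all names in
it are invented.

```c
#include <stddef.h>

/* One bitmap for nfds descriptors, rounded up to whole longs. */
static size_t one_bitmap_bytes(size_t nfds)
{
    size_t bits_per_long = 8 * sizeof(long);

    return (nfds + bits_per_long - 1) / bits_per_long * sizeof(long);
}

/* Two bitmaps (input + result) per set the caller actually passed. */
static size_t bitmaps_needed(const void *inp, const void *outp,
                             const void *exceptp)
{
    return 2 * (size_t)((inp != NULL) + (outp != NULL) + (exceptp != NULL));
}

/* Hypothetical allocation size if NULL sets were skipped entirely. */
static size_t alloc_bytes_skipping_null(size_t nfds, const void *inp,
                                        const void *outp, const void *exceptp)
{
    return bitmaps_needed(inp, outp, exceptp) * one_bitmap_bytes(nfds);
}

/* A caller interested only in readability of 65536 descriptors. */
static size_t demo_read_only_bytes(void)
{
    int placeholder = 0;   /* stands in for a non-NULL read set */

    return alloc_bytes_skipping_null(65536, &placeholder, NULL, NULL);
}
```

For a read-only caller with 65536 descriptors this comes to 16384 bytes
instead of 49152.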


Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-23 Thread Nicholas Piggin
On Fri, 23 Sep 2016 14:42:53 +0800
"Hillf Danton"  wrote:

> > 
> > The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
> > with the number of fds passed. We had a customer report page allocation
> > failures of order-4 for this allocation. This is a costly order, so it might
> > easily fail, as the VM expects such allocation to have a lower-order
> > fallback.
> > 
> > Such trivial fallback is vmalloc(), as the memory doesn't have to be
> > physically contiguous. Also the allocation is temporary for the duration of
> > the syscall, so it's unlikely to stress vmalloc too much.
> > 
> > Note that the poll(2) syscall seems to use a linked list of order-0 pages,
> > so it doesn't need this kind of fallback.

How about something like this? (untested)

Eric isn't wrong about vmalloc sucking :)

Thanks,
Nick


---
 fs/select.c | 57 +++--
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 8ed9da5..3b4834c 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -555,6 +555,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
void *bits;
int ret, max_fds;
unsigned int size;
+   size_t nr_bytes;
struct fdtable *fdt;
/* Allocate small arguments on the stack to save memory and be faster */
long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
@@ -576,21 +577,39 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 * since we used fdset we need to allocate memory in units of
 * long-words. 
 */
-   size = FDS_BYTES(n);
+   ret = -ENOMEM;
bits = stack_fds;
-   if (size > sizeof(stack_fds) / 6) {
-   /* Not enough space in on-stack array; must use kmalloc */
+   size = FDS_BYTES(n);
+   nr_bytes = 6 * size;
+
+   if (unlikely(nr_bytes > PAGE_SIZE)) {
+   /* Avoid multi-page allocation if possible */
ret = -ENOMEM;
-   bits = kmalloc(6 * size, GFP_KERNEL);
-   if (!bits)
-   goto out_nofds;
+   fds.in = kmalloc(size, GFP_KERNEL);
+   fds.out = kmalloc(size, GFP_KERNEL);
+   fds.ex = kmalloc(size, GFP_KERNEL);
+   fds.res_in = kmalloc(size, GFP_KERNEL);
+   fds.res_out = kmalloc(size, GFP_KERNEL);
+   fds.res_ex = kmalloc(size, GFP_KERNEL);
+
+   if (!(fds.in && fds.out && fds.ex &&
+   fds.res_in && fds.res_out && fds.res_ex))
+   goto out;
+   } else {
+   if (nr_bytes > sizeof(stack_fds)) {
+   /* Not enough space in on-stack array */
+   if (nr_bytes > PAGE_SIZE * 2)
+   bits = kmalloc(nr_bytes, GFP_KERNEL);
+   if (!bits)
+   goto out_nofds;
+   }
+   fds.in  = bits;
+   fds.out = bits +   size;
+   fds.ex  = bits + 2*size;
+   fds.res_in  = bits + 3*size;
+   fds.res_out = bits + 4*size;
+   fds.res_ex  = bits + 5*size;
}
-   fds.in  = bits;
-   fds.out = bits +   size;
-   fds.ex  = bits + 2*size;
-   fds.res_in  = bits + 3*size;
-   fds.res_out = bits + 4*size;
-   fds.res_ex  = bits + 5*size;
 
if ((ret = get_fd_set(n, inp, fds.in)) ||
(ret = get_fd_set(n, outp, fds.out)) ||
@@ -617,8 +636,18 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
ret = -EFAULT;
 
 out:
-   if (bits != stack_fds)
-   kfree(bits);
+   if (unlikely(nr_bytes > PAGE_SIZE)) {
+   kfree(fds.in);
+   kfree(fds.out);
+   kfree(fds.ex);
+   kfree(fds.res_in);
+   kfree(fds.res_out);
+   kfree(fds.res_ex);
+   } else {
+   if (bits != stack_fds)
+   kfree(bits);
+   }
+
 out_nofds:
return ret;
 }
-- 
2.9.3
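
The allocate-six-then-check pattern in the patch above can be tried out in
userspace as follows. This is only a sketch with malloc()/free() standing in
for kmalloc()/kfree(). struct fd_bitmaps and the fd_bitmaps_* functions are
hypothetical names that mirror the fds.in ... fds.res_ex fields. It relies
on free(NULL) being a no-op, just as the patch relies on kfree(NULL).

```c
#include <stdlib.h>
#include <string.h>

struct fd_bitmaps {
    unsigned long *in, *out, *ex;
    unsigned long *res_in, *res_out, *res_ex;
};

/* Safe on a partially filled struct: free(NULL) does nothing. */
static void fd_bitmaps_free(struct fd_bitmaps *b)
{
    free(b->in);
    free(b->out);
    free(b->ex);
    free(b->res_in);
    free(b->res_out);
    free(b->res_ex);
    b->in = b->out = b->ex = NULL;
    b->res_in = b->res_out = b->res_ex = NULL;
}

/*
 * Six allocations of `size` bytes each instead of one of 6 * size.
 * Returns 0 on success, -1 if any allocation failed (nothing is leaked).
 */
static int fd_bitmaps_alloc(struct fd_bitmaps *b, size_t size)
{
    b->in      = malloc(size);
    b->out     = malloc(size);
    b->ex      = malloc(size);
    b->res_in  = malloc(size);
    b->res_out = malloc(size);
    b->res_ex  = malloc(size);

    if (!(b->in && b->out && b->ex &&
          b->res_in && b->res_out && b->res_ex)) {
        fd_bitmaps_free(b);
        return -1;
    }
    return 0;
}

/* Allocate bitmaps for 65536 descriptors (8192 bytes each) and release them. */
static int demo_split_alloc(void)
{
    struct fd_bitmaps b;
    int ret = fd_bitmaps_alloc(&b, 8192);

    if (ret == 0) {
        memset(b.in, 0, 8192);   /* touch one bitmap to show it is usable */
        fd_bitmaps_free(&b);
    }
    return ret;
}
```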



Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-23 Thread Hillf Danton
> 
> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
> with the number of fds passed. We had a customer report page allocation
> failures of order-4 for this allocation. This is a costly order, so it might
> easily fail, as the VM expects such allocation to have a lower-order fallback.
> 
> Such trivial fallback is vmalloc(), as the memory doesn't have to be
> physically contiguous. Also the allocation is temporary for the duration of
> the syscall, so it's unlikely to stress vmalloc too much.
> 
> Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
> it doesn't need this kind of fallback.
> 
> Signed-off-by: Vlastimil Babka 
> ---
>  fs/select.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/select.c b/fs/select.c
> index 8ed9da50896a..8fe5bddbe99b 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
> 
> @@ -558,6 +559,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>   struct fdtable *fdt;
>   /* Allocate small arguments on the stack to save memory and be faster */
>   long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
> + unsigned long alloc_size;
> 
>   ret = -EINVAL;
>   if (n < 0)
> @@ -580,10 +582,15 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>   bits = stack_fds;
>   if (size > sizeof(stack_fds) / 6) {
>   /* Not enough space in on-stack array; must use kmalloc */
> + alloc_size = 6 * size;
>   ret = -ENOMEM;
> - bits = kmalloc(6 * size, GFP_KERNEL);
> - if (!bits)
> - goto out_nofds;
> + bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
> + if (!bits && alloc_size > PAGE_SIZE) {
> + bits = vmalloc(alloc_size);
> +
> + if (!bits)
> + goto out_nofds;
> + }

Looks like we also have to bail out if kmalloc fails with
alloc_size less than or equal to PAGE_SIZE.

thanks
Hillf
>  	}
>  	fds.in  = bits;
>  	fds.out = bits +   size;
> @@ -618,7 +625,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
> 
>  out:
>  	if (bits != stack_fds)
> -		kfree(bits);
> +		kvfree(bits);
>  out_nofds:
>  	return ret;
>  }
> --
> 2.10.0
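
For a sense of scale, the sizes mentioned in the quoted commit message can be
worked out by hand. The sketch below is a userspace approximation, not kernel
code: it assumes 64-bit longs, 4 KiB pages and SELECT_STACK_ALLOC == 256, and
every sketch_* name is made up for illustration. It mirrors the 6 * size
computation from the diff and rounds the result up to a page order.

```c
#include <stddef.h>

/* Assumptions for illustration only: 64-bit longs and 4 KiB pages. */
#define SKETCH_LONG_BYTES  8UL
#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_STACK_ALLOC 256UL	/* assumed value of SELECT_STACK_ALLOC */

/* Bytes for one bitmap covering fds 0..n-1, rounded up to whole longs. */
static unsigned long sketch_fds_bytes(unsigned long n)
{
	unsigned long bits_per_long = 8 * SKETCH_LONG_BYTES;
	unsigned long longs = (n + bits_per_long - 1) / bits_per_long;

	return longs * SKETCH_LONG_BYTES;
}

/* Six bitmaps are needed: in, out, ex and the three result copies. */
static unsigned long sketch_alloc_size(unsigned long n)
{
	return 6 * sketch_fds_bytes(n);
}

/* Mirrors the "size > sizeof(stack_fds) / 6" check from the diff. */
static int sketch_fits_on_stack(unsigned long n)
{
	return sketch_fds_bytes(n) <= SKETCH_STACK_ALLOC / 6;
}

/* Smallest order such that (page size << order) covers the request. */
static unsigned int sketch_page_order(unsigned long bytes)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < bytes)
		order++;
	return order;
}
```

Under those assumptions, n = 1024 needs 768 bytes and stays within a single
page, while n = 65536 needs 49152 bytes, which rounds up to an order-4 request
of the kind named in the failure report.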



Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-22 Thread Vlastimil Babka

On 09/22/2016 06:24 PM, Eric Dumazet wrote:
>> +		bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
>> +		if (!bits && alloc_size > PAGE_SIZE) {
>> +			bits = vmalloc(alloc_size);
>> +
>> +			if (!bits)
>> +				goto out_nofds;
>
> Test should happen if alloc_size <= PAGE_SIZE
>
>> +		}
>
> 		if (!bits && alloc_size > PAGE_SIZE)
> 			bits = vmalloc(alloc_size);
>
> 		if (!bits)
> 			goto out_nofds;



Thanks... stupid last-minute changes.
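
The difference between the two variants quoted above is easiest to see with
the allocators stubbed out. This is a userspace sketch only: sketch_kmalloc(),
sketch_vmalloc() and the failure flags are hypothetical stand-ins for the
kernel functions, and 4 KiB pages are assumed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-ins for kmalloc()/vmalloc(); failure is forced by flags. */
static bool sketch_kmalloc_fails;
static bool sketch_vmalloc_fails;

static void *sketch_kmalloc(size_t size)
{
	return sketch_kmalloc_fails ? NULL : malloc(size);
}

static void *sketch_vmalloc(size_t size)
{
	return sketch_vmalloc_fails ? NULL : malloc(size);
}

/* As posted: the NULL check only runs when the vmalloc fallback is tried. */
static int sketch_alloc_as_posted(size_t alloc_size, void **bits)
{
	*bits = sketch_kmalloc(alloc_size);
	if (!*bits && alloc_size > SKETCH_PAGE_SIZE) {
		*bits = sketch_vmalloc(alloc_size);
		if (!*bits)
			return -ENOMEM;
	}
	return 0;	/* can be reached with *bits == NULL for small sizes */
}

/* As suggested in review: the NULL check covers both allocation paths. */
static int sketch_alloc_as_suggested(size_t alloc_size, void **bits)
{
	*bits = sketch_kmalloc(alloc_size);
	if (!*bits && alloc_size > SKETCH_PAGE_SIZE)
		*bits = sketch_vmalloc(alloc_size);
	if (!*bits)
		return -ENOMEM;
	return 0;
}
```

With sketch_kmalloc_fails set and a 512-byte request, the posted variant
returns 0 with a NULL pointer, whereas the suggested one returns -ENOMEM. For
an 8192-byte request both fall back to the vmalloc stand-in and succeed.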




Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-22 Thread Eric Dumazet
On Thu, 2016-09-22 at 17:28 +0200, Vlastimil Babka wrote:
> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
> with the number of fds passed. We had a customer report page allocation
> failures of order-4 for this allocation. This is a costly order, so it might
> easily fail, as the VM expects such allocation to have a lower-order fallback.
> 
> Such trivial fallback is vmalloc(), as the memory doesn't have to be
> physically contiguous. Also the allocation is temporary for the duration of the
> syscall, so it's unlikely to stress vmalloc too much.
> 
> Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
> it doesn't need this kind of fallback.
> 
> Signed-off-by: Vlastimil Babka 
> ---
>  fs/select.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/select.c b/fs/select.c
> index 8ed9da50896a..8fe5bddbe99b 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -558,6 +559,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>  	struct fdtable *fdt;
>  	/* Allocate small arguments on the stack to save memory and be faster */
>  	long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
> +	unsigned long alloc_size;
>  
>  	ret = -EINVAL;
>  	if (n < 0)
> @@ -580,10 +582,15 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>  	bits = stack_fds;
>  	if (size > sizeof(stack_fds) / 6) {
>  		/* Not enough space in on-stack array; must use kmalloc */
> +		alloc_size = 6 * size;
>  		ret = -ENOMEM;
> -		bits = kmalloc(6 * size, GFP_KERNEL);
> -		if (!bits)
> -			goto out_nofds;
> +		bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
> +		if (!bits && alloc_size > PAGE_SIZE) {
> +			bits = vmalloc(alloc_size);
> +
> +			if (!bits)
> +				goto out_nofds;

Test should happen if alloc_size <= PAGE_SIZE

> +		}

		if (!bits && alloc_size > PAGE_SIZE)
			bits = vmalloc(alloc_size);

		if (!bits)
			goto out_nofds;



>  	}
>  	fds.in  = bits;
>  	fds.out = bits +   size;
> @@ -618,7 +625,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>  
>  out:
>  	if (bits != stack_fds)
> -		kfree(bits);
> +		kvfree(bits);
>  out_nofds:
>  	return ret;
>  }
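
The last hunk switches kfree() to kvfree() because bits may now come from
either allocator. A rough userspace model of that dispatch follows. It assumes
kvfree() picks the release function by checking whether the address lies in
the vmalloc range, and it models that range with a static arena; all sketch_*
names are hypothetical and none of this is kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Model: "vmalloc" memory comes from one fixed arena, so a range check
 * can tell the two kinds of pointer apart. */
#define SKETCH_ARENA_SIZE 65536U
static _Alignas(16) unsigned char sketch_arena[SKETCH_ARENA_SIZE];
static size_t sketch_arena_used;

/* Counters so a caller can observe which release path was taken. */
static int sketch_kfree_calls;
static int sketch_vfree_calls;

static void *sketch_kmalloc(size_t size)
{
	return malloc(size);
}

/* Bump allocator over the arena; sizes are rounded up to 16 bytes. */
static void *sketch_vmalloc(size_t size)
{
	void *p;

	if (size > SKETCH_ARENA_SIZE - sketch_arena_used)
		return NULL;
	size = (size + 15) & ~(size_t)15;
	p = sketch_arena + sketch_arena_used;
	sketch_arena_used += size;
	return p;
}

static bool sketch_is_vmalloc_addr(const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t start = (uintptr_t)sketch_arena;

	return addr >= start && addr < start + SKETCH_ARENA_SIZE;
}

static void sketch_kfree(void *p)
{
	sketch_kfree_calls++;
	free(p);
}

static void sketch_vfree(void *p)
{
	(void)p;	/* bump arena: nothing to hand back in this model */
	sketch_vfree_calls++;
}

/* Release a pointer without the caller knowing which allocator made it. */
static void sketch_kvfree(void *p)
{
	if (sketch_is_vmalloc_addr(p))
		sketch_vfree(p);
	else
		sketch_kfree(p);
}
```

The existing bits != stack_fds guard in the hunk still matters in this
picture: the on-stack buffer came from neither allocator, so it must not be
passed to the release function at all.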



