Re: pselect(2) bug or feature?

2020-04-01 Thread Christos Zoulas
The select(2) family of syscalls was designed to work this way (block when
file descriptors are not available, even when they cannot become available).
For example:

select(0, NULL, NULL, NULL, NULL);

blocks forever.

In fact in the very old days (before usleep, nanosleep, clock_nanosleep),
select(2) was the only way to get sub-second sleeps from the kernel, and
the idiom was similar to the above:

void
usleep(int usec) {
struct timeval tv;

tv.tv_sec = 0;
tv.tv_usec = usec;

select(0, NULL, NULL, NULL, );
}

Best,

christos

> On Apr 1, 2020, at 8:19 AM, Theo de Raadt  wrote:
> 
> The regress test is wrong.
> 
> The first parameter to pselect has to tell it which bits to look at.
> It should be something like getdtablesize(), or fd+1, or FD_SETSIZE.
> 
> And then there's all the handwaving about fd_set overflow, meaning what
> if fd >= FD_SETSIZE.  I've never heard of an exploitable one  though.
> 
> Martin Pieuchot  wrote:
> 
>> While working on moving pselect(2) outside of the KERNEL_LOCK() I found
>> this surprising {p,}select(2) behavior exposed by the libc/sys/t_select.c
>> regression test.  Should it be considered as a bug or feature?
>> 
>> The above mentioned test does the following (simplified):
>> 
>>  fd = open("/dev/null", O_RDONLY);
>>  FD_ZERO();
>>  FD_SET(fd, );
>>  pselect(1, , NULL, NULL, NULL, );
>> 
>> Note that in most environments, open(2) will return a value higher than
>> 1 which means the value written by FD_SET() will never be checked by the
>> current implementation of {p,}select(2).  In this case, even if it isn't
>> waiting for any condition, the current implementation will block.
>> 
>> To put it differently, the code above is an obfuscated way to write:
>> 
>>  sigsuspend();
>> 
>> If we agree that this is not what the user wants to do, I'm suggesting
>> the diff below which makes {p,}select(2) return EINVAL if no bit has
>> been found in the given set up to `nfds'.
>> 
>> Independently from our decision, I'd suggest to rename the variable
>> name `nfds' in the syscall documentation because it can lead to this
>> kind of confusion.
>> 
>> Finally if we accept this change the regression test will have to be
>> adapted because polling "/dev/null" for reading always return true so
>> pselect(2) wont block waiting for a signal.  Any suggestion?
>> 
>> Index: sys/kern/sys_generic.c
>> ===
>> RCS file: /cvs/src/sys/kern/sys_generic.c,v
>> retrieving revision 1.131
>> diff -u -p -r1.131 sys_generic.c
>> --- sys/kern/sys_generic.c   20 Mar 2020 04:11:05 -  1.131
>> +++ sys/kern/sys_generic.c   1 Apr 2020 10:27:51 -
>> @@ -703,7 +703,7 @@ selscan(struct proc *p, fd_set *ibits, f
>> {
>>  caddr_t cibits = (caddr_t)ibits, cobits = (caddr_t)obits;
>>  struct filedesc *fdp = p->p_fd;
>> -int msk, i, j, fd;
>> +int msk, i, j, fd, foundfds = 0;
>>  fd_mask bits;
>>  struct file *fp;
>>  int n = 0;
>> @@ -719,6 +719,7 @@ selscan(struct proc *p, fd_set *ibits, f
>>  bits &= ~(1 << j);
>>  if ((fp = fd_getfile(fdp, fd)) == NULL)
>>  return (EBADF);
>> +foundfds++;
>>  if ((*fp->f_ops->fo_poll)(fp, flag[msk], p)) {
>>  FD_SET(fd, pobits);
>>  n++;
>> @@ -727,6 +728,8 @@ selscan(struct proc *p, fd_set *ibits, f
>>  }
>>  }
>>  }
>> +if (foundfds == 0)
>> +return (EINVAL);
>>  *retval = n;
>>  return (0);
>> }
>> Index: lib/libc/sys/select.2
>> ===
>> RCS file: /cvs/src/lib/libc/sys/select.2,v
>> retrieving revision 1.42
>> diff -u -p -r1.42 select.2
>> --- lib/libc/sys/select.217 Sep 2016 01:01:42 -  1.42
>> +++ lib/libc/sys/select.21 Apr 2020 10:35:29 -
>> @@ -188,7 +188,7 @@ The specified time limit is invalid.
>> One of its components is negative or too large.
>> .It Bq Er EINVAL
>> .Fa nfds
>> -was less than 0.
>> +was smaller than the smallest descriptor specified in the sets.
>> .El
>> .Sh SEE ALSO
>> .Xr accept 2 ,
>> 



signature.asc
Description: Message signed with OpenPGP


Re: pselect(2) bug or feature?

2020-04-01 Thread Todd C . Miller
On Wed, 01 Apr 2020 13:53:53 +0200, Claudio Jeker wrote:

> I think this is actually a feature. The standard does not mention that you
> have to listen on something and it would allow to use select() as a
> sleep() replacement. I would expect that pselect() should behave the same
> way.

I agree that this is a feature.  POSIX also specifies EINVAL if
nfds is larger than FD_SETSIZE but we support arrays of fd_set
larger than FD_SETSIZE so that is not an error in our implementation.

 - todd



Re: pselect(2) bug or feature?

2020-04-01 Thread Theo de Raadt
The regress test is wrong.

The first parameter to pselect has to tell it which bits to look at.
It should be something like getdtablesize(), or fd+1, or FD_SETSIZE.

And then there's all the handwaving about fd_set overflow, meaning what
if fd >= FD_SETSIZE.  I've never heard of an exploitable one  though.

Martin Pieuchot  wrote:

> While working on moving pselect(2) outside of the KERNEL_LOCK() I found
> this surprising {p,}select(2) behavior exposed by the libc/sys/t_select.c
> regression test.  Should it be considered as a bug or feature?
> 
> The above mentioned test does the following (simplified):
> 
>   fd = open("/dev/null", O_RDONLY);
>   FD_ZERO();
>   FD_SET(fd, );
>   pselect(1, , NULL, NULL, NULL, );
> 
> Note that in most environments, open(2) will return a value higher than
> 1 which means the value written by FD_SET() will never be checked by the
> current implementation of {p,}select(2).  In this case, even if it isn't
> waiting for any condition, the current implementation will block.
> 
> To put it differently, the code above is an obfuscated way to write:
> 
>   sigsuspend();
> 
> If we agree that this is not what the user wants to do, I'm suggesting
> the diff below which makes {p,}select(2) return EINVAL if no bit has
> been found in the given set up to `nfds'.
> 
> Independently from our decision, I'd suggest to rename the variable
> name `nfds' in the syscall documentation because it can lead to this
> kind of confusion.
> 
> Finally if we accept this change the regression test will have to be
> adapted because polling "/dev/null" for reading always return true so
> pselect(2) wont block waiting for a signal.  Any suggestion?
> 
> Index: sys/kern/sys_generic.c
> ===
> RCS file: /cvs/src/sys/kern/sys_generic.c,v
> retrieving revision 1.131
> diff -u -p -r1.131 sys_generic.c
> --- sys/kern/sys_generic.c20 Mar 2020 04:11:05 -  1.131
> +++ sys/kern/sys_generic.c1 Apr 2020 10:27:51 -
> @@ -703,7 +703,7 @@ selscan(struct proc *p, fd_set *ibits, f
>  {
>   caddr_t cibits = (caddr_t)ibits, cobits = (caddr_t)obits;
>   struct filedesc *fdp = p->p_fd;
> - int msk, i, j, fd;
> + int msk, i, j, fd, foundfds = 0;
>   fd_mask bits;
>   struct file *fp;
>   int n = 0;
> @@ -719,6 +719,7 @@ selscan(struct proc *p, fd_set *ibits, f
>   bits &= ~(1 << j);
>   if ((fp = fd_getfile(fdp, fd)) == NULL)
>   return (EBADF);
> + foundfds++;
>   if ((*fp->f_ops->fo_poll)(fp, flag[msk], p)) {
>   FD_SET(fd, pobits);
>   n++;
> @@ -727,6 +728,8 @@ selscan(struct proc *p, fd_set *ibits, f
>   }
>   }
>   }
> + if (foundfds == 0)
> + return (EINVAL);
>   *retval = n;
>   return (0);
>  }
> Index: lib/libc/sys/select.2
> ===
> RCS file: /cvs/src/lib/libc/sys/select.2,v
> retrieving revision 1.42
> diff -u -p -r1.42 select.2
> --- lib/libc/sys/select.2 17 Sep 2016 01:01:42 -  1.42
> +++ lib/libc/sys/select.2 1 Apr 2020 10:35:29 -
> @@ -188,7 +188,7 @@ The specified time limit is invalid.
>  One of its components is negative or too large.
>  .It Bq Er EINVAL
>  .Fa nfds
> -was less than 0.
> +was smaller than the smallest descriptor specified in the sets.
>  .El
>  .Sh SEE ALSO
>  .Xr accept 2 ,
> 



Re: pselect(2) bug or feature?

2020-04-01 Thread Claudio Jeker
On Wed, Apr 01, 2020 at 01:08:10PM +0200, Martin Pieuchot wrote:
> While working on moving pselect(2) outside of the KERNEL_LOCK() I found
> this surprising {p,}select(2) behavior exposed by the libc/sys/t_select.c
> regression test.  Should it be considered as a bug or feature?

I think this is actually a feature. The standard does not mention that you
have to listen on something and it would allow to use select() as a
sleep() replacement. I would expect that pselect() should behave the same
way.

poll() does the same thing (you are allowed to pass a 0 poll fds).

> The above mentioned test does the following (simplified):
> 
>   fd = open("/dev/null", O_RDONLY);
>   FD_ZERO();
>   FD_SET(fd, );
>   pselect(1, , NULL, NULL, NULL, );
> 
> Note that in most environments, open(2) will return a value higher than
> 1 which means the value written by FD_SET() will never be checked by the
> current implementation of {p,}select(2).  In this case, even if it isn't
> waiting for any condition, the current implementation will block.
> 
> To put it differently, the code above is an obfuscated way to write:
> 
>   sigsuspend();
> 
> If we agree that this is not what the user wants to do, I'm suggesting
> the diff below which makes {p,}select(2) return EINVAL if no bit has
> been found in the given set up to `nfds'.
> 
> Independently from our decision, I'd suggest to rename the variable
> name `nfds' in the syscall documentation because it can lead to this
> kind of confusion.
> 
> Finally if we accept this change the regression test will have to be
> adapted because polling "/dev/null" for reading always return true so
> pselect(2) wont block waiting for a signal.  Any suggestion?
> 
> Index: sys/kern/sys_generic.c
> ===
> RCS file: /cvs/src/sys/kern/sys_generic.c,v
> retrieving revision 1.131
> diff -u -p -r1.131 sys_generic.c
> --- sys/kern/sys_generic.c20 Mar 2020 04:11:05 -  1.131
> +++ sys/kern/sys_generic.c1 Apr 2020 10:27:51 -
> @@ -703,7 +703,7 @@ selscan(struct proc *p, fd_set *ibits, f
>  {
>   caddr_t cibits = (caddr_t)ibits, cobits = (caddr_t)obits;
>   struct filedesc *fdp = p->p_fd;
> - int msk, i, j, fd;
> + int msk, i, j, fd, foundfds = 0;
>   fd_mask bits;
>   struct file *fp;
>   int n = 0;
> @@ -719,6 +719,7 @@ selscan(struct proc *p, fd_set *ibits, f
>   bits &= ~(1 << j);
>   if ((fp = fd_getfile(fdp, fd)) == NULL)
>   return (EBADF);
> + foundfds++;
>   if ((*fp->f_ops->fo_poll)(fp, flag[msk], p)) {
>   FD_SET(fd, pobits);
>   n++;
> @@ -727,6 +728,8 @@ selscan(struct proc *p, fd_set *ibits, f
>   }
>   }
>   }
> + if (foundfds == 0)
> + return (EINVAL);
>   *retval = n;
>   return (0);
>  }
> Index: lib/libc/sys/select.2
> ===
> RCS file: /cvs/src/lib/libc/sys/select.2,v
> retrieving revision 1.42
> diff -u -p -r1.42 select.2
> --- lib/libc/sys/select.2 17 Sep 2016 01:01:42 -  1.42
> +++ lib/libc/sys/select.2 1 Apr 2020 10:35:29 -
> @@ -188,7 +188,7 @@ The specified time limit is invalid.
>  One of its components is negative or too large.
>  .It Bq Er EINVAL
>  .Fa nfds
> -was less than 0.
> +was smaller than the smallest descriptor specified in the sets.
>  .El
>  .Sh SEE ALSO
>  .Xr accept 2 ,
> 

-- 
:wq Claudio



pselect(2) bug or feature?

2020-04-01 Thread Martin Pieuchot
While working on moving pselect(2) outside of the KERNEL_LOCK() I found
this surprising {p,}select(2) behavior exposed by the libc/sys/t_select.c
regression test.  Should it be considered as a bug or feature?

The above mentioned test does the following (simplified):

fd = open("/dev/null", O_RDONLY);
FD_ZERO();
FD_SET(fd, );
pselect(1, , NULL, NULL, NULL, );

Note that in most environments, open(2) will return a value higher than
1 which means the value written by FD_SET() will never be checked by the
current implementation of {p,}select(2).  In this case, even if it isn't
waiting for any condition, the current implementation will block.

To put it differently, the code above is an obfuscated way to write:

sigsuspend();

If we agree that this is not what the user wants to do, I'm suggesting
the diff below which makes {p,}select(2) return EINVAL if no bit has
been found in the given set up to `nfds'.

Independently from our decision, I'd suggest to rename the variable
name `nfds' in the syscall documentation because it can lead to this
kind of confusion.

Finally if we accept this change the regression test will have to be
adapted because polling "/dev/null" for reading always return true so
pselect(2) wont block waiting for a signal.  Any suggestion?

Index: sys/kern/sys_generic.c
===
RCS file: /cvs/src/sys/kern/sys_generic.c,v
retrieving revision 1.131
diff -u -p -r1.131 sys_generic.c
--- sys/kern/sys_generic.c  20 Mar 2020 04:11:05 -  1.131
+++ sys/kern/sys_generic.c  1 Apr 2020 10:27:51 -
@@ -703,7 +703,7 @@ selscan(struct proc *p, fd_set *ibits, f
 {
caddr_t cibits = (caddr_t)ibits, cobits = (caddr_t)obits;
struct filedesc *fdp = p->p_fd;
-   int msk, i, j, fd;
+   int msk, i, j, fd, foundfds = 0;
fd_mask bits;
struct file *fp;
int n = 0;
@@ -719,6 +719,7 @@ selscan(struct proc *p, fd_set *ibits, f
bits &= ~(1 << j);
if ((fp = fd_getfile(fdp, fd)) == NULL)
return (EBADF);
+   foundfds++;
if ((*fp->f_ops->fo_poll)(fp, flag[msk], p)) {
FD_SET(fd, pobits);
n++;
@@ -727,6 +728,8 @@ selscan(struct proc *p, fd_set *ibits, f
}
}
}
+   if (foundfds == 0)
+   return (EINVAL);
*retval = n;
return (0);
 }
Index: lib/libc/sys/select.2
===
RCS file: /cvs/src/lib/libc/sys/select.2,v
retrieving revision 1.42
diff -u -p -r1.42 select.2
--- lib/libc/sys/select.2   17 Sep 2016 01:01:42 -  1.42
+++ lib/libc/sys/select.2   1 Apr 2020 10:35:29 -
@@ -188,7 +188,7 @@ The specified time limit is invalid.
 One of its components is negative or too large.
 .It Bq Er EINVAL
 .Fa nfds
-was less than 0.
+was smaller than the smallest descriptor specified in the sets.
 .El
 .Sh SEE ALSO
 .Xr accept 2 ,