Re: pselect(2) bug or feature?
The select(2) family of syscalls was designed to work this way (block when file descriptors are not available, even when they cannot become available). For example: select(0, NULL, NULL, NULL, NULL); blocks forever. In fact in the very old days (before usleep, nanosleep, clock_nanosleep), select(2) was the only way to get sub-second sleeps from the kernel, and the idiom was similar to the above: void usleep(int usec) { struct timeval tv; tv.tv_sec = 0; tv.tv_usec = usec; select(0, NULL, NULL, NULL, ); } Best, christos > On Apr 1, 2020, at 8:19 AM, Theo de Raadt wrote: > > The regress test is wrong. > > The first parameter to pselect has to tell it which bits to look at. > It should be something like getdtablesize(), or fd+1, or FD_SETSIZE. > > And then there's all the handwaving about fd_set overflow, meaning what > if fd >= FD_SETSIZE. I've never heard of an exploitable one though. > > Martin Pieuchot wrote: > >> While working on moving pselect(2) outside of the KERNEL_LOCK() I found >> this surprising {p,}select(2) behavior exposed by the libc/sys/t_select.c >> regression test. Should it be considered as a bug or feature? >> >> The above mentioned test does the following (simplified): >> >> fd = open("/dev/null", O_RDONLY); >> FD_ZERO(); >> FD_SET(fd, ); >> pselect(1, , NULL, NULL, NULL, ); >> >> Note that in most environments, open(2) will return a value higher than >> 1 which means the value written by FD_SET() will never be checked by the >> current implementation of {p,}select(2). In this case, even if it isn't >> waiting for any condition, the current implementation will block. >> >> To put it differently, the code above is an obfuscated way to write: >> >> sigsuspend(); >> >> If we agree that this is not what the user wants to do, I'm suggesting >> the diff below which makes {p,}select(2) return EINVAL if no bit has >> been found in the given set up to `nfds'. >> >> Independently from our decision, I'd suggest to rename the variable >> name `nfds' in the syscall documentation because it can lead to this >> kind of confusion. >> >> Finally if we accept this change the regression test will have to be >> adapted because polling "/dev/null" for reading always return true so >> pselect(2) wont block waiting for a signal. Any suggestion? >> >> Index: sys/kern/sys_generic.c >> === >> RCS file: /cvs/src/sys/kern/sys_generic.c,v >> retrieving revision 1.131 >> diff -u -p -r1.131 sys_generic.c >> --- sys/kern/sys_generic.c 20 Mar 2020 04:11:05 - 1.131 >> +++ sys/kern/sys_generic.c 1 Apr 2020 10:27:51 - >> @@ -703,7 +703,7 @@ selscan(struct proc *p, fd_set *ibits, f >> { >> caddr_t cibits = (caddr_t)ibits, cobits = (caddr_t)obits; >> struct filedesc *fdp = p->p_fd; >> -int msk, i, j, fd; >> +int msk, i, j, fd, foundfds = 0; >> fd_mask bits; >> struct file *fp; >> int n = 0; >> @@ -719,6 +719,7 @@ selscan(struct proc *p, fd_set *ibits, f >> bits &= ~(1 << j); >> if ((fp = fd_getfile(fdp, fd)) == NULL) >> return (EBADF); >> +foundfds++; >> if ((*fp->f_ops->fo_poll)(fp, flag[msk], p)) { >> FD_SET(fd, pobits); >> n++; >> @@ -727,6 +728,8 @@ selscan(struct proc *p, fd_set *ibits, f >> } >> } >> } >> +if (foundfds == 0) >> +return (EINVAL); >> *retval = n; >> return (0); >> } >> Index: lib/libc/sys/select.2 >> === >> RCS file: /cvs/src/lib/libc/sys/select.2,v >> retrieving revision 1.42 >> diff -u -p -r1.42 select.2 >> --- lib/libc/sys/select.217 Sep 2016 01:01:42 - 1.42 >> +++ lib/libc/sys/select.21 Apr 2020 10:35:29 - >> @@ -188,7 +188,7 @@ The specified time limit is invalid. >> One of its components is negative or too large. >> .It Bq Er EINVAL >> .Fa nfds >> -was less than 0. >> +was smaller than the smallest descriptor specified in the sets. >> .El >> .Sh SEE ALSO >> .Xr accept 2 , >> signature.asc Description: Message signed with OpenPGP
Re: pselect(2) bug or feature?
On Wed, 01 Apr 2020 13:53:53 +0200, Claudio Jeker wrote: > I think this is actually a feature. The standard does not mention that you > have to listen on something and it would allow to use select() as a > sleep() replacement. I would expect that pselect() should behave the same > way. I agree that this is a feature. POSIX also specifies EINVAL if nfds is larger than FD_SETSIZE but we support arrays of fd_set larger than FD_SETSIZE so that is not an error in our implementation. - todd
Re: pselect(2) bug or feature?
The regress test is wrong. The first parameter to pselect has to tell it which bits to look at. It should be something like getdtablesize(), or fd+1, or FD_SETSIZE. And then there's all the handwaving about fd_set overflow, meaning what if fd >= FD_SETSIZE. I've never heard of an exploitable one though. Martin Pieuchot wrote: > While working on moving pselect(2) outside of the KERNEL_LOCK() I found > this surprising {p,}select(2) behavior exposed by the libc/sys/t_select.c > regression test. Should it be considered as a bug or feature? > > The above mentioned test does the following (simplified): > > fd = open("/dev/null", O_RDONLY); > FD_ZERO(); > FD_SET(fd, ); > pselect(1, , NULL, NULL, NULL, ); > > Note that in most environments, open(2) will return a value higher than > 1 which means the value written by FD_SET() will never be checked by the > current implementation of {p,}select(2). In this case, even if it isn't > waiting for any condition, the current implementation will block. > > To put it differently, the code above is an obfuscated way to write: > > sigsuspend(); > > If we agree that this is not what the user wants to do, I'm suggesting > the diff below which makes {p,}select(2) return EINVAL if no bit has > been found in the given set up to `nfds'. > > Independently from our decision, I'd suggest to rename the variable > name `nfds' in the syscall documentation because it can lead to this > kind of confusion. > > Finally if we accept this change the regression test will have to be > adapted because polling "/dev/null" for reading always return true so > pselect(2) wont block waiting for a signal. Any suggestion? > > Index: sys/kern/sys_generic.c > === > RCS file: /cvs/src/sys/kern/sys_generic.c,v > retrieving revision 1.131 > diff -u -p -r1.131 sys_generic.c > --- sys/kern/sys_generic.c20 Mar 2020 04:11:05 - 1.131 > +++ sys/kern/sys_generic.c1 Apr 2020 10:27:51 - > @@ -703,7 +703,7 @@ selscan(struct proc *p, fd_set *ibits, f > { > caddr_t cibits = (caddr_t)ibits, cobits = (caddr_t)obits; > struct filedesc *fdp = p->p_fd; > - int msk, i, j, fd; > + int msk, i, j, fd, foundfds = 0; > fd_mask bits; > struct file *fp; > int n = 0; > @@ -719,6 +719,7 @@ selscan(struct proc *p, fd_set *ibits, f > bits &= ~(1 << j); > if ((fp = fd_getfile(fdp, fd)) == NULL) > return (EBADF); > + foundfds++; > if ((*fp->f_ops->fo_poll)(fp, flag[msk], p)) { > FD_SET(fd, pobits); > n++; > @@ -727,6 +728,8 @@ selscan(struct proc *p, fd_set *ibits, f > } > } > } > + if (foundfds == 0) > + return (EINVAL); > *retval = n; > return (0); > } > Index: lib/libc/sys/select.2 > === > RCS file: /cvs/src/lib/libc/sys/select.2,v > retrieving revision 1.42 > diff -u -p -r1.42 select.2 > --- lib/libc/sys/select.2 17 Sep 2016 01:01:42 - 1.42 > +++ lib/libc/sys/select.2 1 Apr 2020 10:35:29 - > @@ -188,7 +188,7 @@ The specified time limit is invalid. > One of its components is negative or too large. > .It Bq Er EINVAL > .Fa nfds > -was less than 0. > +was smaller than the smallest descriptor specified in the sets. > .El > .Sh SEE ALSO > .Xr accept 2 , >
Re: pselect(2) bug or feature?
On Wed, Apr 01, 2020 at 01:08:10PM +0200, Martin Pieuchot wrote: > While working on moving pselect(2) outside of the KERNEL_LOCK() I found > this surprising {p,}select(2) behavior exposed by the libc/sys/t_select.c > regression test. Should it be considered as a bug or feature? I think this is actually a feature. The standard does not mention that you have to listen on something and it would allow to use select() as a sleep() replacement. I would expect that pselect() should behave the same way. poll() does the same thing (you are allowed to pass a 0 poll fds). > The above mentioned test does the following (simplified): > > fd = open("/dev/null", O_RDONLY); > FD_ZERO(); > FD_SET(fd, ); > pselect(1, , NULL, NULL, NULL, ); > > Note that in most environments, open(2) will return a value higher than > 1 which means the value written by FD_SET() will never be checked by the > current implementation of {p,}select(2). In this case, even if it isn't > waiting for any condition, the current implementation will block. > > To put it differently, the code above is an obfuscated way to write: > > sigsuspend(); > > If we agree that this is not what the user wants to do, I'm suggesting > the diff below which makes {p,}select(2) return EINVAL if no bit has > been found in the given set up to `nfds'. > > Independently from our decision, I'd suggest to rename the variable > name `nfds' in the syscall documentation because it can lead to this > kind of confusion. > > Finally if we accept this change the regression test will have to be > adapted because polling "/dev/null" for reading always return true so > pselect(2) wont block waiting for a signal. Any suggestion? > > Index: sys/kern/sys_generic.c > === > RCS file: /cvs/src/sys/kern/sys_generic.c,v > retrieving revision 1.131 > diff -u -p -r1.131 sys_generic.c > --- sys/kern/sys_generic.c20 Mar 2020 04:11:05 - 1.131 > +++ sys/kern/sys_generic.c1 Apr 2020 10:27:51 - > @@ -703,7 +703,7 @@ selscan(struct proc *p, fd_set *ibits, f > { > caddr_t cibits = (caddr_t)ibits, cobits = (caddr_t)obits; > struct filedesc *fdp = p->p_fd; > - int msk, i, j, fd; > + int msk, i, j, fd, foundfds = 0; > fd_mask bits; > struct file *fp; > int n = 0; > @@ -719,6 +719,7 @@ selscan(struct proc *p, fd_set *ibits, f > bits &= ~(1 << j); > if ((fp = fd_getfile(fdp, fd)) == NULL) > return (EBADF); > + foundfds++; > if ((*fp->f_ops->fo_poll)(fp, flag[msk], p)) { > FD_SET(fd, pobits); > n++; > @@ -727,6 +728,8 @@ selscan(struct proc *p, fd_set *ibits, f > } > } > } > + if (foundfds == 0) > + return (EINVAL); > *retval = n; > return (0); > } > Index: lib/libc/sys/select.2 > === > RCS file: /cvs/src/lib/libc/sys/select.2,v > retrieving revision 1.42 > diff -u -p -r1.42 select.2 > --- lib/libc/sys/select.2 17 Sep 2016 01:01:42 - 1.42 > +++ lib/libc/sys/select.2 1 Apr 2020 10:35:29 - > @@ -188,7 +188,7 @@ The specified time limit is invalid. > One of its components is negative or too large. > .It Bq Er EINVAL > .Fa nfds > -was less than 0. > +was smaller than the smallest descriptor specified in the sets. > .El > .Sh SEE ALSO > .Xr accept 2 , > -- :wq Claudio