Re: Please test: switch select(2) to kqfilters

Scott Cheloha Sun, 04 Oct 2020 21:39:17 -0700

On Sat, Oct 03, 2020 at 09:09:00AM +0200, Martin Pieuchot wrote:
> On 02/10/20(Fri) 19:09, Scott Cheloha wrote:
> > On Fri, Oct 02, 2020 at 12:19:35PM +0200, Martin Pieuchot wrote:
> > > @@ -635,12 +642,39 @@ sys_kevent(struct proc *p, void *v, regi
> > >           goto done;
> > >   }
> > >  
> > > +
> > >   KQREF(kq);
> > >   FRELE(fp, p);
> > > - error = kqueue_scan(kq, SCARG(uap, nevents), SCARG(uap, eventlist),
> > > -     tsp, kev, p, &n);
> > > + /*
> > > +  * Collect as many events as we can.  The timeout on successive
> > > +  * loops is disabled (kqueue_scan() becomes non-blocking).
> > > +  */
> > > + total = 0;
> > > + error = 0;
> > > + kqueue_scan_setup(&scan, kq);
> > > + while ((n = SCARG(uap, nevents) - total) > 0) {
> > > +         if (n > nitems(kev))
> > > +                 n = nitems(kev);
> > > +         ready = kqueue_scan(&scan, n, kev, tsp, p, &error);
> > > +         if (ready == 0)
> > > +                 break;
> > > +         error = copyout(kev, SCARG(uap, eventlist) + total,
> > > +             sizeof(struct kevent) * ready);
> > > +#ifdef KTRACE
> > > +         if (KTRPOINT(p, KTR_STRUCT))
> > > +                 ktrevent(p, kev, ready);
> > > +#endif
> > > +         total += ready;
> > > +         if (error || ready < n)
> > > +                 break;
> > > +         tsp = &ts;              /* successive loops non-blocking */
> > > +         timespecclear(tsp);
> > 
> > Here, this.  Why do we force a non-blocking loop the second time?
> 
> If there's a second time that implies the first time already reported
> some events so there's already something to return to userland.  In that
> case we just want to gather the events that were not collected the first
> time and not sleep again.


Okay, now I see it, thank you for the explanation.

> > > + }
> > > + kqueue_scan_finish(&scan);
> > >   KQRELE(kq);
> > > - *retval = n;
> > > + if (error == EWOULDBLOCK)
> > > +         error = 0;
> > > + *retval = total;
> > >   return (error);
> > >  
> > >   done:
> > > @@ -894,24 +928,22 @@ kqueue_sleep(struct kqueue *kq, struct t
> > >   return (error);
> > >  }
> > >  
> > > +/*
> > > + * Scan the kqueue, blocking if necessary until the target time is 
> > > reached.
> > > + * If tsp is NULL we block indefinitely.  If tsp->ts_secs/nsecs are both
> > > + * 0 we do not block at all.
> > > + */
> > >  int
> > > -kqueue_scan(struct kqueue *kq, int maxevents, struct kevent *ulistp,
> > > -    struct timespec *tsp, struct kevent *kev, struct proc *p, int 
> > > *retval)
> > > +kqueue_scan(struct kqueue_scan_state *scan, int maxevents,
> > > +    struct kevent *kevp, struct timespec *tsp, struct proc *p, int 
> > > *errorp)
> > >  {
> > > - struct kevent *kevp;
> > > - struct knote mend, mstart, *kn;
> > > - int s, count, nkev, error = 0;
> > > -
> > > - nkev = 0;
> > > - kevp = kev;
> > > + struct knote *kn;
> > > + struct kqueue *kq = scan->kqs_kq;
> > > + int s, count, nkev = 0, error = 0;
> > >  
> > >   count = maxevents;
> > >   if (count == 0)
> > >           goto done;
> > > -
> > > - memset(&mstart, 0, sizeof(mstart));
> > > - memset(&mend, 0, sizeof(mend));
> > > -
> > >  retry:
> > >   KASSERT(count == maxevents);
> > >   KASSERT(nkev == 0);
> > > @@ -923,7 +955,8 @@ retry:
> > >  
> > >   s = splhigh();
> > >   if (kq->kq_count == 0) {
> > > -         if (tsp != NULL && !timespecisset(tsp)) {
> > > +         if ((tsp != NULL && !timespecisset(tsp)) ||
> > > +             scan->kqs_nevent != 0) {
> > >                   splx(s);
> > >                   error = 0;
> > >                   goto done;
> > > @@ -931,7 +964,7 @@ retry:
> > >           kq->kq_state |= KQ_SLEEP;
> > >           error = kqueue_sleep(kq, tsp);
> > >           splx(s);
> > > -         if (error == 0 || error == EWOULDBLOCK)
> > > +         if (error == 0)
> > >                   goto retry;
> > 
> > Why wouldn't we want to retry in the EWOULDBLOCK case?
> > You have a check for
> > 
> >     tsp != NULL && !timespecisset(tsp)
> > 
> > e.g., when you time out.
> 
> I don't recall why or even if there was a reason.  I'll change it back,
> thanks.

Cool.

> > > +
> > > + /*
> > > +  * The poll/select family of syscalls has been designed to
> > > +  * block when file descriptors are not available, even if
> > > +  * there's nothing to wait for.
> > > +  */
> > > + if (nevents == 0) {
> > > +         uint64_t nsecs = INFSLP;
> > > +
> > > +         if (timeout != NULL)
> > > +                 nsecs = MAX(1, MIN(TIMESPEC_TO_NSEC(timeout), MAXTSLP));
> > > +
> > > +         error = tsleep_nsec(&p->p_kq, PSOCK | PCATCH, "kqsel", nsecs);
> > > +         /* select is not restarted after signals... */
> > > +         if (error == ERESTART)
> > > +                 error = EINTR;
> > > + }
> > 
> > Aside: can the new logic (below) not handle the case where
> > nevents == 0?  Like, what happens if we go into kqueue_scan()
> > with count == 0?
> 
> First like of kqueue_scan() does:
> 
>              if (count == 0) 
>                      goto done
> 
> So there's no sleep.  I opted for the explicit sleep rather than the
> obfuscated one ;)

I'm a bit torn on this.  On the one hand, I like being overt about
special cases.  On the other hand, I worry that this will introduce
bugs.

For instance, I just noticed that you have changed the behavior
in the zero timeout case.  select(2) and pselect(2) are not supposed
to block if the timeout is zero, e.g. code like:

        struct timeval tv;

        timerclear(&tv);
        select(0, NULL, NULL, NULL, &tv);

should not block.  But your new code blocks.

In theory there are no other bugs...

> > > + /* Collect at most `nevents' possibly waiting in kqueue_scan() */
> > > + kqueue_scan_setup(&scan, p->p_kq);
> > > + while (nevents > 0) {
> > > +         struct kevent kev[KQ_NEVENTS];
> > > +         int i, ready, count;
> > > +
> > > +         /* Maxium number of events per iteration */
> > > +         count = MIN(nitems(kev), nevents);
> > > +         ready = kqueue_scan(&scan, count, kev, timeout, p, &error);
> > > +#ifdef KTRACE
> > > +         if (KTRPOINT(p, KTR_STRUCT))
> > > +                 ktrevent(p, kev, ready);
> > > +#endif
> > > +         /* Convert back events that are ready. */
> > > +         for (i = 0; i < ready; i++)
> > > +                 *retval += pselcollect(p, &kev[i], pobits);
> > > +
> > > +         /*
> > > +          * Stop if there was an error or if we had enough
> > > +          * place to collect all events that were ready.
> > > +          */
> > > +         if (error || ready < count)
> > > +                 break;
> > > +
> > > +         timeout = &ts;          /* successive loops non-blocking */
> > > +         timespecclear(timeout);
> > 
> > ... again, it looks like we now truncate instead of resleeping.
> 
> Yes, that's on purpose.  Maybe you can suggest a comment to explain this
> behavior that seems surprising?

After thinking about it a bit I think it is less surprising than I
first thought.

What a surprise :)

I'm trying to communicate:

        kev is a fixed-length array.  The caller's event array might
        be longer than kev.  In that case, we may need to call kqueue_scan()
        multiple times to capture all events.  But we only want to
        block in kqueue_scan() on the 1st iteration because we necessarily
        captured events during the first call.

Here's my best attempt:

/*
 * There may be more ready events.  We will try to capture them during
 * the next iteration.  However, we have already captured other events,
 * so we do not want to block in kqueue_scan() if it turns out that there
 * aren't any.
 */

... meh.

Re: Please test: switch select(2) to kqfilters

Reply via email to