On Thu, Jan 06, 2022 at 01:35:32PM +0000, Stefan Hajnoczi wrote:

> > > >> +static void vfu_object_attach_ctx(void *opaque)
> > > >> +{
> > > >> +    VfuObject *o = opaque;
> > > >> +    GPollFD pfds[1];
> > > >> +    int ret;
> > > >> +
> > > >> +    qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> > > >> +
> > > >> +    pfds[0].fd = o->vfu_poll_fd;
> > > >> +    pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> > > >> +
> > > >> +retry_attach:
> > > >> +    ret = vfu_attach_ctx(o->vfu_ctx);
> > > >> +    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
> > > >> +        qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS);
> > > >> +        goto retry_attach;
> > > >
> > > > This can block the thread indefinitely. Other events like monitor
> > > > commands are not handled in this loop. Please make this asynchronous
> > > > (set an fd handler and return from this function so we can try again
> > > > later).
> > > >
> > > > The vfu_attach_ctx() implementation synchronously negotiates the
> > > > vfio-user connection :(. That's a shame because even if accept(2) is
> > > > handled asynchronously, the negotiation can still block. It would be
> > > > cleanest to have a fully async libvfio-user's vfu_attach_ctx() API to
> > > > avoid blocking. Is that possible?
> > > 
> > > Thanos / John,
> > > 
> > >     Any thoughts on this?
> > 
> > I'm discussing this with John and FYI there are other places where 
> > libvfio-user can block, e.g. sending a response or receiving a command. Is 
> > it just the negotiation you want it to be asynchronous or _all_ 
> > libvfio-user operations? Making libvfio-user fully asynchronous might 
> > require a substantial API rewrite.
> 
> I see at least two reasons for a fully async API:
> 
> 1. The program wants to handle other events (e.g. a management REST API)
>    from the same event loop thread that invokes libvfio-user. If
>    libvfio-user blocks then the other events cannot be handled within a
>    reasonable time frame.
> 
>    The workaround for this is to use multi-threading and ignore the
>    event-driven architecture implied by vfu_get_poll_fd().
> 
> 2. The program handles multiple clients that do not trust each other.
>    This could be a software-defined network switch or storage appliance.
>    A malicious client can cause a denial-of-service by making a
>    libvfio-user call block.
> 
>    Again, the program needs separate threads instead of an event loop to
>    work around this.

Hi Stefan, we're certainly aware of the rationale. Ultimately we just haven't
yet found resources to fix this: in practice, we don't really hit problems from
the parts that are still synchronous. Of course, that's not a good argument when
it comes to your second issue, and indeed the library is not currently suitable
for multiple untrusted clients.

For Jag but: patches are welcome. But it's not just negotiate(): all sorts of
things are potentially synchronous - for example, it's expected that the region
read/write callbacks are done synchronously. Any other client reply, or
server->client message, is also synchronous.

It's partly why we haven't yet stabilised the API.

regards
john

Reply via email to