On Thu, Jan 06, 2022 at 01:35:32PM +0000, Stefan Hajnoczi wrote: > > > >> +static void vfu_object_attach_ctx(void *opaque) > > > >> +{ > > > >> + VfuObject *o = opaque; > > > >> + GPollFD pfds[1]; > > > >> + int ret; > > > >> + > > > >> + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL); > > > >> + > > > >> + pfds[0].fd = o->vfu_poll_fd; > > > >> + pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR; > > > >> + > > > >> +retry_attach: > > > >> + ret = vfu_attach_ctx(o->vfu_ctx); > > > >> + if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) { > > > >> + qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS); > > > >> + goto retry_attach; > > > > > > > > This can block the thread indefinitely. Other events like monitor > > > > commands are not handled in this loop. Please make this asynchronous > > > > (set an fd handler and return from this function so we can try again > > > > later). > > > > > > > > The vfu_attach_ctx() implementation synchronously negotiates the > > > > vfio-user connection :(. That's a shame because even if accept(2) is > > > > handled asynchronously, the negotiation can still block. It would be > > > > cleanest to have a fully async libvfio-user's vfu_attach_ctx() API to > > > > avoid blocking. Is that possible? > > > > > > Thanos / John, > > > > > > Any thoughts on this? > > > > I'm discussing this with John and FYI there are other places where > > libvfio-user can block, e.g. sending a response or receiving a command. Is > > it just the negotiation you want it to be asynchronous or _all_ > > libvfio-user operations? Making libvfio-user fully asynchronous might > > require a substantial API rewrite. > > I see at least two reasons for a fully async API: > > 1. The program wants to handle other events (e.g. a management REST API) > from the same event loop thread that invokes libvfio-user. If > libvfio-user blocks then the other events cannot be handled within a > reasonable time frame. > > The workaround for this is to use multi-threading and ignore the > event-driven architecture implied by vfu_get_poll_fd(). > > 2. The program handles multiple clients that do not trust each other. > This could be a software-defined network switch or storage appliance. > A malicious client can cause a denial-of-service by making a > libvfio-user call block. > > Again, the program needs separate threads instead of an event loop to > work around this.
Hi Stefan, we're certainly aware of the rationale. Ultimately we just haven't yet found resources to fix this: in practice, we don't really hit problems from the parts that are still synchronous. Of course, that's not a good argument when it comes to your second issue, and indeed the library is not currently suitable for multiple untrusted clients. For Jag but: patches are welcome. But it's not just negotiate(): all sorts of things are potentially synchronous - for example, it's expected that the region read/write callbacks are done synchronously. Any other client reply, or server->client message, is also synchronous. It's partly why we haven't yet stabilised the API. regards john