Am 15.06.2018 um 01:21 hat Nishanth Aravamudan geschrieben: > laio_init() can fail for a couple of reasons, which will lead to a NULL > pointer dereference in laio_attach_aio_context(). > > To solve this, add a aio_linux_aio_setup() path which is called where > aio_get_linux_aio() is called currently, but can propogate errors up. > > virtio-block and virtio-scsi call this new function before calling > blk_io_plug() (which eventually calls aio_get_linux_aio). This is > necessary because plug/unplug currently assume they do not fail. > > It is trivial to make qemu segfault in my testing. Set > /proc/sys/fs/aio-max-nr to 0 and start a guest with > aio=native,cache=directsync. With this patch, the guest successfully > starts (but obviously isn't using native AIO). Setting aio-max-nr back > up to a reasonable value, AIO contexts are consumed normally. > > Signed-off-by: Nishanth Aravamudan <naravamu...@digitalocean.com>
This is not a reasonable fix for several reasons: * You frame this as a problem of blk_io_plug(), but that's not what it is. It is a problem of delayed initialisation of Linux AIO that may in the future affect other operations as well. * This approch would need a fix in every device that uses a problematic operation. You came across virtio + blk_io_plug(), but that are probably not the only cases in the long run, which would make the code spread much wider than it should. * There is only a single block driver that actually implements the new callback. This is a sign that this is not a generally useful callback. Instead, the fix should be done locally in the file-posix driver, and the virtio devices shouldn't be touched at all. I think it would be good enough to call laio_init() when attaching to a new AioContext and to switch to the thread pool if it fails, like you already do. Maybe an error_report() would be appropriate to log the fact that we're not using the requested AIO mode. Kevin