On 14/06/2019 09:49, Amit Shah wrote:
> On Thu, 2019-06-13 at 10:53 +0200, Laurent Vivier wrote:
>> On 12/06/2019 09:03, Amit Shah wrote:
>>> On Tue, 2019-06-11 at 19:20 +0200, Laurent Vivier wrote:
>>>> The virtio-rng linux driver can be stuck in virtio_read() on a
>>>> wait_for_completion_killable() call if the virtio-rng device in QEMU
>>>> doesn't provide data.
>>>>
>>>> It's a problem, because virtio_read() is called from rng_get_data()
>>>> with reading_mutex held. The same mutex is taken by
>>>> add_early_randomness() and hwrng_fillfn(), and this leads to a hang
>>>> during the boot sequence if the virtio-rng driver is builtin.
>>>> Moreover, another lock (rng_mutex) is taken when the hwrng driver
>>>> wants to switch the RNG device or the user tries to unplug the
>>>> virtio-rng PCI card, and this can hang too, because the virtio-rng
>>>> driver can only release the card once the virtio-rng device sends
>>>> back the virtqueue element.
>>>>
>>>>   # echo -n virtio_rng.1 > /sys/class/misc/hw_random/rng_current
>>>>   [  240.165234] INFO: task kworker/u2:1:34 blocked for more than 120 seconds.
>>>>   [  240.165961] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>   [  240.166708] kworker/u2:1    D ffffffffb86b85a8     0    34      2 0x00000000
>>>>   [  240.166714] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>>>>   [  240.166716]  ffffa0e8f3c0b890 0000000000000046 ffffa0e8f3c00000 ffffa0e8f3c0bfd8
>>>>   [  240.166717]  ffffa0e8f3c0bfd8 ffffa0e8f3c0bfd8 ffffa0e8f3c00000 ffffffffb86b85a0
>>>>   [  240.166719]  ffffffffb86b85a4 ffffa0e8f3c00000 00000000ffffffff ffffffffb86b85a8
>>>>   [  240.166720] Call Trace:
>>>>   [  240.166725]  [<ffffffffb82a61c9>] schedule_preempt_disabled+0x29/0x70
>>>>   [  240.166727]  [<ffffffffb82a40f7>] __mutex_lock_slowpath+0xc7/0x1d0
>>>>   [  240.166728]  [<ffffffffb82a350f>] mutex_lock+0x1f/0x2f
>>>>   [  240.166730]  [<ffffffffb8022b52>] hwrng_register+0x32/0x1d0
>>>>   [  240.166733]  [<ffffffffc07fa149>] virtrng_scan+0x19/0x30 [virtio_rng]
>>>>   [  240.166744]  [<ffffffffc03108db>] virtio_dev_probe+0x1eb/0x290 [virtio]
>>>>   [  240.166746]  [<ffffffffb803d6e5>] driver_probe_device+0x145/0x3c0
>>>>   ...
>>>>
>>>> In some cases, the QEMU RNG backend is not able to provide data, and
>>>> the virtio-rng device is not aware of that:
>>>> - with rng-random using /dev/random and no entropy is available,
>>>> - with rng-egd started with a socket in "server,nowait" mode and
>>>>   no daemon connected,
>>>> - with rng-egd and an egd daemon that is not providing enough data,
>>>> - ...
>>>>
>>>> To release the locks regularly, this patch adds a watchdog in the
>>>> QEMU virtio-rng device that sends the virtqueue buffer back to the
>>>> guest with a 0-byte payload. This case is expected and correctly
>>>> managed by the hwrng core.
>>>
>>> I'm wondering if it makes more sense to rework the way the kernel
>>> driver requests entropy for seeding during probe.
>>
>> The kernel side was my first angle of attack.
>> I first tried to not block in add_early_randomness():
>>
>> "hwrng: core - don't block in add_early_randomness()"
>> https://patchwork.kernel.org/patch/10877571/
>>
>> But I agree with the maintainer: the problem must be fixed at the
>> virtio-rng level.
>
> Yea.
>
> How much of this is due to 'rng-egd was not set up properly; the
> backend did not appear in time' -- ie a setup problem; vs a problem we
> can expect to recur?
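[For readers following along, here is a heavily simplified sketch of the
guest-side path that gets stuck, using the function and lock names from
the description and trace above. It only shows the shape of the code and
is not the actual kernel sources; register_buffer() and rng_get_data()
are elided, and struct fields and bodies are abbreviated.]

    #include <linux/completion.h>
    #include <linux/hw_random.h>
    #include <linux/mutex.h>
    #include <linux/random.h>

    /* Abbreviated stand-in for the driver's private state. */
    struct virtrng_info {
            struct completion have_data;  /* completed when the device returns the buffer */
            unsigned int data_avail;      /* number of bytes the device returned */
            /* ... */
    };

    static DEFINE_MUTEX(reading_mutex);

    /* Queues buf on the virtqueue (elided). */
    static void register_buffer(struct virtrng_info *vi, void *buf, size_t size);
    /* Calls rng->read(), i.e. virtio_read() here (elided). */
    static int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size, int wait);

    static int virtio_read(struct hwrng *rng, void *buf, size_t size, bool wait)
    {
            struct virtrng_info *vi = (struct virtrng_info *)rng->priv;

            register_buffer(vi, buf, size);   /* hand the buffer to the device */
            if (!wait)
                    return 0;

            /*
             * If the QEMU backend never pushes the element back on the
             * virtqueue, this never completes: only a fatal signal can
             * interrupt it, and the probe/kthread callers have nobody
             * to send one.
             */
            wait_for_completion_killable(&vi->have_data);
            return vi->data_avail;
    }

    static void add_early_randomness(struct hwrng *rng)
    {
            unsigned char bytes[16];
            int bytes_read;

            /* reading_mutex is also taken by hwrng_fillfn(), so everything
             * piles up behind the stuck virtio_read(). */
            mutex_lock(&reading_mutex);
            bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1);
            mutex_unlock(&reading_mutex);
            if (bytes_read > 0)
                    add_device_randomness(bytes, bytes_read);
    }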
I agree there is a configuration issue here, but it's not only that.
If the egd daemon on the host hangs, is it normal for the guest to hang
(in the kernel) too? The same thing can happen with rng-random on a host
that doesn't have enough entropy: imagine several guests on the same host
all asking for entropy when there isn't enough for all of them.

> The current patch affects *all* requests from the guest. The problem
> being seen though is just during the driver probe. Is the current

It's not only during the device probe. It also happens when we try to
unplug the virtio-rng card, or when we try to switch the rng backend in
the guest kernel.

> patch doing much more than is required? Say the device runs out of
> randomness to provide after setup - the guest's read calls will just
> block and remain killable. It's the probe currently that is not
> killable.

I don't think being killable solves the problem: who is going to kill it?
How do we know something needs to be killed? Is it normal to have to
explicitly kill a process just to unplug a card?

> Other options to explore are:
>
> 1. Ensure setup is ready (ie the device's backend is set up before
> plugging in the device), so that the device is ready to serve requests
> as soon as it's plugged in

One way I have explored is to not mark the virtio-rng device ready in
QEMU while the backend is not ready (this should prevent the driver from
starting in the guest), but that only fixes the problem at startup. It
doesn't help if the EGD daemon on the host hangs later, or if there isn't
enough entropy on the host (with EGD or /dev/random).

> 2. Make the add_early_randomness optional for virtio-rng. (This is a
> big hammer, and working around bad setup problems.)

I've been tempted to remove it, but I don't remove code I don't
understand: why do we need add_early_randomness() in the first place?

>>> The virtio_read call is killable, so it can take signals when
>>> initiated by userspace. For the initial probe, specifying a
>>> timeout / watchdog in the driver is better.
>> ...
>> But I have a problem doing the timeout / watchdog at the driver level:
>> once the buffer is submitted to the virtqueue, how do we cancel it? Is
>> there a way to ask the QEMU device to not process the element in the
>> virtqueue we have stopped waiting for because of the timeout (or for
>> the signal: I don't understand how it works in this case. How is it
>> canceled?)?
>
> Could be achieved via a control command? But all that sounds quite
> ugly (and also requires host+guest modification).

Adding a control command changes the specification of the device -- can
we do that? We would need to add a queue, I guess?

So I think the "killable" approach has a problem too. Using a watchdog in
the QEMU device lets us drop the killable wait and avoids leaving a lost
request behind in the queue, without changing the spec, because the
kernel rng core already handles a 0-length data payload (a rough sketch
of what I mean is below).

Thanks,
Laurent
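[For illustration only: a rough sketch of the device-side watchdog idea
described above, assuming QEMU's virtqueue and timer APIs. The
watchdog_timer field, the function names, and the timeout value are
invented for the example; this is not the actual patch.]

    #include "qemu/osdep.h"
    #include "qemu/timer.h"
    #include "hw/virtio/virtio-rng.h"

    /* Illustrative value only. */
    #define VIRTIO_RNG_WATCHDOG_MS 1000

    /* Timer callback: return every request the backend has not answered
     * with a 0-byte payload, so the guest's completion fires and the
     * hwrng core can release its locks. */
    static void virtio_rng_watchdog(void *opaque)
    {
        VirtIORNG *vrng = opaque;
        VirtIODevice *vdev = VIRTIO_DEVICE(vrng);
        VirtQueueElement *elem;
        bool pushed = false;

        while ((elem = virtqueue_pop(vrng->vq, sizeof(VirtQueueElement)))) {
            virtqueue_push(vrng->vq, elem, 0);   /* len == 0: "no data for now" */
            g_free(elem);
            pushed = true;
        }
        if (pushed) {
            virtio_notify(vdev, vrng->vq);
        }
    }

    /* Armed whenever the guest has a request outstanding, e.g. from the
     * queue handler; vrng->watchdog_timer is a hypothetical new field. */
    static void virtio_rng_arm_watchdog(VirtIORNG *vrng)
    {
        if (!vrng->watchdog_timer) {
            vrng->watchdog_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
                                                virtio_rng_watchdog, vrng);
        }
        timer_mod(vrng->watchdog_timer,
                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + VIRTIO_RNG_WATCHDOG_MS);
    }

[If the backend does deliver data before the timer fires, the normal
read path pushes the element back with the real length as usual, and the
timer simply finds an empty queue when it expires.]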