Re: [PATCH v1 0/2] Virtio: fix some vq allocation issues
On 12/28/2018 03:57 PM, Christian Borntraeger wrote: On 28.12.2018 03:26, Wei Wang wrote: Some vqs don't need to be allocated when the related feature bits are disabled. Callers notice the vq allocation layer by setting the related names[i] to be NULL. This patch series fixes the find_vqs implementations to handle this case. So the random crashes during boot are gone. What still does not work is actually using the balloon. So in the qemu monitor using lets say "balloon 1000" will hang the guest. Seems to be a deadlock in the virtio-ccw code. We seem to call the config code in the interrupt handler. Please try with applying both this series and the "virtio-balloon: tweak config_changed" series that I just sent. This series fixes the ccw booting issue and that series tries to fix the ccw deadlock issue. Best, Wei ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] RE: [PATCH v1 0/2] Virtio: fix some vq allocation issues
On Wed, 2 Jan 2019 16:59:19 +0100 Halil Pasic wrote: > On Wed, 2 Jan 2019 14:23:38 +0100 > Cornelia Huck wrote: > > > On Wed, 2 Jan 2019 10:53:14 +0100 > > Cornelia Huck wrote: > > > > > On Tue, 1 Jan 2019 00:40:19 +0100 > > > Halil Pasic wrote: > > > > AFAICT tweaking the balloon code may be simpler than tweaking the > > > > virtio-ccw (transport code). ccw_io_helper() relies on getting > > > > an interrupt when the issued IO is done. If virtio-ccw is buggy, it > > > > needs to be fixed, but I'm not sure it is. > > > > > > I would not call virtio-ccw buggy, but it has some constraints that > > > virtio-pci apparently doesn't have (and which did not show up so far; > > > e.g. virtio-blk schedules a work item on config change, so there's no > > > deadlock there.) > > > > > > One way to get out of that constraint (don't interact with the config > > > space directly in the config changed handler) would be to schedule a > > > work item in virtio-ccw that calls virtio_config_changed() for the > > > device. My understanding is that delaying the notification to a work > > > queue would be fine. > > > > Unfortunately, calling virtio_config_changed() from a work item is not > > enough: That function takes the config_lock, and the virtio-ccw code to > > get the config both needs to allocate some memory and call schedule :/ > > > > The best option really seems to be > > - have virtio-balloon move the handling of the config change onto a > > workqueue or something like that, and > > - document that you cannot read/write the virtio config space from an > > atomic context > > > > Unless someone has a better idea? > > > > I wonder, would making config_lock a mutex suffice? Unless I'm mistaken, you can't take a mutex in an interrupt path. > I've already hinted that a virtio-balloon side fix is probably the > simpler variant. Yes, I think so as well. > I agree, let's fix the regression first, and think about wether to teach > virtio-ccw to allow config manipulation from virtio_config_changed() or > not later. Or whether we can tweak the virtio code instead. But I agree, let's get things working again first. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] RE: [PATCH v1 0/2] Virtio: fix some vq allocation issues
On Wed, 2 Jan 2019 10:53:14 +0100 Cornelia Huck wrote: > On Tue, 1 Jan 2019 00:40:19 +0100 > Halil Pasic wrote: > > As I said, at the moment I don't have a preference regarding the fix, > > partly because I'm not sure if "reading config inside the handler" is OK > > or not. Maybe Connie or Michael can help us here. I'm however sure that > > commit 86a5597 "virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT" > > breaks virtio-balloon with the ccw transport (i.e. effectively breaks > > virtio-balloon on s390): it used to work before and does not work > > after. > > Yes, that's unfortunate. > > > > > AFAICT tweaking the balloon code may be simpler than tweaking the > > virtio-ccw (transport code). ccw_io_helper() relies on getting > > an interrupt when the issued IO is done. If virtio-ccw is buggy, it > > needs to be fixed, but I'm not sure it is. > > I would not call virtio-ccw buggy, but it has some constraints that > virtio-pci apparently doesn't have (and which did not show up so far; > e.g. virtio-blk schedules a work item on config change, so there's no > deadlock there.) > > One way to get out of that constraint (don't interact with the config > space directly in the config changed handler) would be to schedule a > work item in virtio-ccw that calls virtio_config_changed() for the > device. My understanding is that delaying the notification to a work > queue would be fine. Unfortunately, calling virtio_config_changed() from a work item is not enough: That function takes the config_lock, and the virtio-ccw code to get the config both needs to allocate some memory and call schedule :/ The best option really seems to be - have virtio-balloon move the handling of the config change onto a workqueue or something like that, and - document that you cannot read/write the virtio config space from an atomic context Unless someone has a better idea? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] RE: [PATCH v1 0/2] Virtio: fix some vq allocation issues
On Tue, 1 Jan 2019 00:40:19 +0100 Halil Pasic wrote: > On Mon, 31 Dec 2018 06:03:51 + > "Wang, Wei W" wrote: > > > On Sunday, December 30, 2018 2:06 PM, Halil Pasic wrote: > > > > > > I guess you are the first one trying to read virtio config from within > > > interrupt > > > context. AFAICT this never worked. > > > > I'm not sure about "never worked". It seems to work well with virtio-pci. > > But looking forward to hearing a solid reason why reading config inside > > the handler is forbidden (if that's true). > > By "never worked" I meant "never worked with virtio-ccw". Sorry > about the misunderstanding. Seems I've also failed to convey that I don't > know if reading config inside the handler is forbidden or not. So please > don't expect me providing the solid reasons you are looking forward to. It won't work with the current code, and this is all a bit ugly :( More verbose explanation below. > > > > > > About what happens. The apidoc of ccw_device_start() says it needs to be > > > called with the ccw device lock held, so ccw_io_helper() tries to take it > > > (since > > > forever I guess). OTOH do_cio_interrupt() takes the subchannel lock and > > > io_subchannel_initialize_dev() makes the ccw device lock be the subchannel > > > lock. That means when one tries to get virtio config form within a cio > > > interrupt context we deadlock, because we try to take a lock we already > > > have. > > > > > > That said, I don't think this limitation is by design (i.e. intended). > > > Maybe Connie can help us with that question. AFAIK we have nothing > > > documented regarding this (neither that can nor can't). The main problem is that channel I/O is a fundamentally asynchronous mechanism. As channel devices don't have the concept of config spaces (or some other things that virtio needs), I decided to map reading/writing the config space to channel commands. Starting I/O on a subchannel always needs the lock (to avoid races on the subchannel), and the asynchronous interrupt for that I/O needs the lock as well (for the same reason; things like the scsw contain state that you want to access without races). A config change also means that the subchannel becomes state pending (and an interrupt is made pending), so the subchannel lock is taken for that path as well. (Virtqueue notifications are handled differently on modern QEMU, but that does not come into play here.) > > > > > > Obviously, there are multiple ways around this problem, and at the moment > > > I can't tell which would be my preferred one. > > > > Yes, it's also not difficult to tweak the virtio-balloon code to avoid that > > issue. > > But if that's just an issue with ccw itself, I think it's better to tweak > > ccw and > > remain virtio-balloon unchanged. > > > > As I said, at the moment I don't have a preference regarding the fix, > partly because I'm not sure if "reading config inside the handler" is OK > or not. Maybe Connie or Michael can help us here. I'm however sure that > commit 86a5597 "virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT" > breaks virtio-balloon with the ccw transport (i.e. effectively breaks > virtio-balloon on s390): it used to work before and does not work > after. Yes, that's unfortunate. > > AFAICT tweaking the balloon code may be simpler than tweaking the > virtio-ccw (transport code). ccw_io_helper() relies on getting > an interrupt when the issued IO is done. If virtio-ccw is buggy, it > needs to be fixed, but I'm not sure it is. I would not call virtio-ccw buggy, but it has some constraints that virtio-pci apparently doesn't have (and which did not show up so far; e.g. virtio-blk schedules a work item on config change, so there's no deadlock there.) One way to get out of that constraint (don't interact with the config space directly in the config changed handler) would be to schedule a work item in virtio-ccw that calls virtio_config_changed() for the device. My understanding is that delaying the notification to a work queue would be fine. >From what I can see, that should make us safe on any modern QEMU (that uses adapter interrupts). There still might be problems if we are using classic subchannel interrupts for virtqueue notifications and the handler interacts with the config space, but I'm not sure whether it is worth spending time on that. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v1 0/2] Virtio: fix some vq allocation issues
On Sunday, December 30, 2018 2:06 PM, Halil Pasic wrote: > > I guess you are the first one trying to read virtio config from within > interrupt > context. AFAICT this never worked. I'm not sure about "never worked". It seems to work well with virtio-pci. But looking forward to hearing a solid reason why reading config inside the handler is forbidden (if that's true). > About what happens. The apidoc of ccw_device_start() says it needs to be > called with the ccw device lock held, so ccw_io_helper() tries to take it > (since > forever I guess). OTOH do_cio_interrupt() takes the subchannel lock and > io_subchannel_initialize_dev() makes the ccw device lock be the subchannel > lock. That means when one tries to get virtio config form within a cio > interrupt context we deadlock, because we try to take a lock we already have. > > That said, I don't think this limitation is by design (i.e. intended). > Maybe Connie can help us with that question. AFAIK we have nothing > documented regarding this (neither that can nor can't). > > Obviously, there are multiple ways around this problem, and at the moment > I can't tell which would be my preferred one. Yes, it's also not difficult to tweak the virtio-balloon code to avoid that issue. But if that's just an issue with ccw itself, I think it's better to tweak ccw and remain virtio-balloon unchanged. Best, Wei ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v1 0/2] Virtio: fix some vq allocation issues
On Friday, December 28, 2018 3:57 PM, Christian Borntraeger wrote: > On 28.12.2018 03:26, Wei Wang wrote: > > Some vqs don't need to be allocated when the related feature bits are > > disabled. Callers notice the vq allocation layer by setting the > > related names[i] to be NULL. > > > > This patch series fixes the find_vqs implementations to handle this case. > > So the random crashes during boot are gone. > What still does not work is actually using the balloon. > > So in the qemu monitor using lets say "balloon 1000" will hang the guest. > Seems to be a deadlock in the virtio-ccw code. We seem to call the config > code in the interrupt handler. Yes. It reads a config register from the interrupt handler. Do you know why ccw doesn't support it and has some internal lock that caused the deadlock issue? Best, Wei ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 0/2] Virtio: fix some vq allocation issues
On 28.12.2018 03:26, Wei Wang wrote: > Some vqs don't need to be allocated when the related feature bits are > disabled. Callers notice the vq allocation layer by setting the related > names[i] to be NULL. > > This patch series fixes the find_vqs implementations to handle this case. So the random crashes during boot are gone. What still does not work is actually using the balloon. So in the qemu monitor using lets say "balloon 1000" will hang the guest. Seems to be a deadlock in the virtio-ccw code. We seem to call the config code in the interrupt handler. crash> bt PID: 0 TASK: d9a400CPU: 0 COMMAND: "swapper/0" LOWCORE INFO: -psw : 0x0404c0018000 0x00116472 -function : smp_yield_cpu at 116472 -prefix : 0x7fffc000 -cpu timer: 0x7fcc8c0af5be -clock cmp: 0x720a4e4002831000 -general registers: 00 00 0x009c 0x00fac2b0 0x0015 0xffe2 0x03e00010 0x0001 00 0x0001 0x03e8 0x0f85c020 00 0x0001 0x00116464 0x03e00035fad0 -access registers: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 -control registers: 0x008014866a10 0x00fbc007 0x00100140 00 0x 0x00100140 0x3100 0x0f9281c3 00 00 00 00 00 0x00fbc007 0xdb00 0x00100280 -floating point registers: 00 0x02aa374b0298 0x0001 0x0010 0x01ae 0x000f 0x02aa46056010 0x02aa460681c0 0x03ffd867d590 0x03ffdca7c818 0x03ffd867d58f 0x03fff6ffdc60 0x03ffd867dad8 0x03ffdca7c5e8 0x03ffd867dadc 0x03ffdca7c818 #0 [3e00035faf8] arch_spin_lock_wait at a7bd52 #1 [3e00035fb50] ccw_io_helper at 9130ea #2 [3e00035fbd0] virtio_ccw_get_config at 914a28 #3 [3e00035fc30] virtballoon_changed at 76e776 #4 [3e00035fc70] virtio_config_changed at 76aabc #5 [3e00035fca8] virtio_ccw_int_handler at 914ede #6 [3e00035fd18] ccw_device_irq at 8941d4 #7 [3e00035fd48] do_cio_interrupt at 885906 #8 [3e00035fd80] __handle_irq_event_percpu at 1b3c22 #9 [3e00035fdf0] handle_irq_event_percpu at 1b3e1e #10 [3e00035fe28] handle_percpu_irq at 1b87d8 #11 [3e00035fe58] generic_handle_irq at 1b2ce6 #12 [3e00035fe70] do_IRQ at 10c3b2 #13 [3e00035fea8] io_int_handler at a86b3c PSW: 0404c0018000 001034f6 (enabled_wait+70) GPRS: 7ff70200 0706c0018000 000c 01bf6f331c58 0001 7ff70200 00a8b2f0 001034f6 03e000317e00 #0 [3e000317e28] arch_cpu_idle at 103842 #1 [3e000317e48] do_idle at 17ad18 #2 [3e000317e80] cpu_startup_entry at 17af16 #3 [3e000317ea8] arch_call_rest_init at eac934 > > Wei Wang (2): > virtio_pci: use queue idx instead of array idx to set up the vq > virtio: don't allocate vqs when names[i] = NULL > > drivers/misc/mic/vop/vop_main.c| 9 +++-- > drivers/remoteproc/remoteproc_virtio.c | 9 +++-- > drivers/s390/virtio/virtio_ccw.c | 12 +--- > drivers/virtio/virtio_mmio.c | 9 +++-- > drivers/virtio/virtio_pci_common.c | 8 > 5 files changed, 34 insertions(+), 13 deletions(-) > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v1 0/2] Virtio: fix some vq allocation issues
Some vqs don't need to be allocated when the related feature bits are disabled. Callers notice the vq allocation layer by setting the related names[i] to be NULL. This patch series fixes the find_vqs implementations to handle this case. Wei Wang (2): virtio_pci: use queue idx instead of array idx to set up the vq virtio: don't allocate vqs when names[i] = NULL drivers/misc/mic/vop/vop_main.c| 9 +++-- drivers/remoteproc/remoteproc_virtio.c | 9 +++-- drivers/s390/virtio/virtio_ccw.c | 12 +--- drivers/virtio/virtio_mmio.c | 9 +++-- drivers/virtio/virtio_pci_common.c | 8 5 files changed, 34 insertions(+), 13 deletions(-) -- 2.7.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization