Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
> -Original Message- > From: Michael S. Tsirkin [mailto:m...@redhat.com] > Sent: Friday, April 20, 2018 12:56 AM > To: Liang, Cunming> Cc: Paolo Bonzini ; Bie, Tiwei ; > jasow...@redhat.com; alex.william...@redhat.com; stefa...@redhat.com; > qemu-devel@nongnu.org; virtio-...@lists.oasis-open.org; Daly, Dan > ; Tan, Jianfeng ; Wang, Zhihong > ; Wang, Xiao W > Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering > external host notifiers > > On Thu, Apr 19, 2018 at 04:24:29PM +, Liang, Cunming wrote: > > > > > > > -Original Message- > > > From: Michael S. Tsirkin [mailto:m...@redhat.com] > > > Sent: Thursday, April 19, 2018 11:19 PM > > > To: Paolo Bonzini > > > Cc: Liang, Cunming ; Bie, Tiwei > > > ; jasow...@redhat.com; > > > alex.william...@redhat.com; stefa...@redhat.com; > > > qemu-devel@nongnu.org; virtio-...@lists.oasis-open.org; Daly, Dan > > > ; Tan, Jianfeng ; Wang, > > > Zhihong ; Wang, Xiao W > > > > > > Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support > > > registering external host notifiers > > > > > > On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote: > > > > On 19/04/2018 14:43, Liang, Cunming wrote: > > > > >> 2. Memory barriers. Right now after updating the avail idx, > > > > >> virtio does smp_wmb() and then the MMIO write. Normal hardware > > > > >> drivers do > > > > >> wmb() which is an sfence. Can a PCI device read bypass index > > > > >> write and see a stale index value? > > > > > > > > > > A compiler barrier is enough on strongly-ordered memory > > > > > platform. As it doesn't re-order store, PCI device won't see a stale > > > > > index > value. > > > > > But a weakly-ordered memory needs sfence. > > > > > > > > That is complicated then. We need to define a feature bit and (in > > > > the Linux driver) propagate it to vring_create_virtqueue's > > > > weak_barrier argument. However: > > > > > > > > - if we make it 1 when weak barriers are needed, the device also > > > > needs to nack feature negotiation (not allow setting the > > > > FEATURES_OK) if the bit is not set by the driver. > > > > However, that is not enough. Live migration assumes that it is > > > > okay to migrate a virtual machine from a source that doesn't > > > > support a feature to a destination that supports it. > > > > In this case, it would assume that it is okay to migrate from > > > > software virtio to hardware virtio. This is wrong because the > > > > destination would use weak barriers > > > > > > You can't migrate between systems with different sets of device > > > features right now. > > > > > > > - if we make it 1 when strong barriers are enough, software virtio > > > > devices needs to be updated to expose the bit. This works, > > > > including live migration, but updated drivers will now go slower > > > > when run against an old device that doesn't know the feature bit. > > > > > > > > Maybe bump the PCI revision, so that only the new revision has the bit? > > > > > > > > Thanks, > > > > > > > > Paolo > > > > > > As a first step, if you want to migrate to a HW offloaded solution > > > then you need to enable the feature. > > > > > It does mean it will go a bit slower when run with software, so it's > > > only good if most systems in your cluster do have the HW offload. > > To clarify a bit more, it's suboptimal to always use mandatory barriers for > > MMIO. > Per strongly-order memory, 'weak barriers' (smp_wmb) is pretty good for MMIO. > The tradeoff doesn't always happen, software and HW offload can align on the > same page. > > I agree to all of the above except where you say smp_wmb. > > smp_wmb is for controlling SMP effects on Linux, and I suspect it will not do > the > right thing on some non-Intel architectures. > > The claim is I think correct for Intel/AMD platforms, and probably other > strongly > ordered ones. I suspect it's incorrect for ARM and power. > > Replace smp_wmb with 'asm volatile ("") on Intel' and I'll agree. Yeah, that's more accurate. > > > > > > I think we can start by getting that working and think about ways to > > > improve down the road. > > > > > > > > > That's the usecase we designed FEATURES_OK for though, so I do > > > think/hope it's enough and we don't need to play with revisions. > > > > > > > > > -- > > > MST
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On Thu, Apr 19, 2018 at 07:35:57PM +0200, Paolo Bonzini wrote: > On 19/04/2018 19:27, Michael S. Tsirkin wrote: > > > > That CONFIG_SMP here is clearly wrong but I don't really know what > > to set it to. Also, we probably should switch virtio_wmb to dma_XX > > barriers. > > > > That's actually easy. Will try to do. > > Should it be dma_wmb() before updating the indices, and wmb() before > writing the "kick virtqueue" register? > > Paolo if anything kick virtqueue should do wmb internally within virtio pci. No one uses strong barriers with virtio pci right now so we don't do it. -- MST
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On 19/04/2018 19:27, Michael S. Tsirkin wrote: > > That CONFIG_SMP here is clearly wrong but I don't really know what > to set it to. Also, we probably should switch virtio_wmb to dma_XX > barriers. > > That's actually easy. Will try to do. Should it be dma_wmb() before updating the indices, and wmb() before writing the "kick virtqueue" register? Paolo
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On Thu, Apr 19, 2018 at 06:59:39PM +0200, Paolo Bonzini wrote: > On 19/04/2018 18:52, Liang, Cunming wrote: > >>> Oh you are right. > >>> > >>> So it's only needed for non-intel platforms or when packets are > >>> in WC memory then. And I don't know whether dpdk ever puts > >>> packets in WC memory. > >>> > >>> I guess we'll cross this bridge when we get to it. > >> Non-TSO architectures seem important... > > > > I'm not familiar with Non-TSO, trying to understand the difference > > according to the feature set. Let's say non-TSO architectures do not > > set 'weak_barriers'. Then mandatory barrier is used for software. HW > > offload on that platform would choose different feature set against > > software? If it's not, essentially we're worried about live migration > > from a TSO to a non-TSO architectures platform? > > I'm worried about live migration from software virtio to hardware virtio > on non-TSO architectures. For example, on ARM you would have a "dmb > ishst" (smp_wmb) for software virtio and a "dsb st" (wmb) or "dmb oshst" > (dma_wmb) for hardware virtio. > > For this to work, you would have to set up the VM so that it uses the > heavier barriers from the beginning, even when backed by software virtio. > > Paolo Right. Or disallow hardware to software migrations. But generally the mandatory and even dma barriers in Linux are often an overkill. See ARM for example: #if defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP) #define mb()__arm_heavy_mb() #define rmb() dsb() #define wmb() __arm_heavy_mb(st) #define dma_rmb() dmb(osh) #define dma_wmb() dmb(oshst) #else #define mb()barrier() #define rmb() barrier() #define wmb() barrier() #define dma_rmb() barrier() #define dma_wmb() barrier() #endif That CONFIG_SMP here is clearly wrong but I don't really know what to set it to. Also, we probably should switch virtio_wmb to dma_XX barriers. That's actually easy. Will try to do. -- MST
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On 19/04/2018 18:52, Liang, Cunming wrote: >>> Oh you are right. >>> >>> So it's only needed for non-intel platforms or when packets are >>> in WC memory then. And I don't know whether dpdk ever puts >>> packets in WC memory. >>> >>> I guess we'll cross this bridge when we get to it. >> Non-TSO architectures seem important... > > I'm not familiar with Non-TSO, trying to understand the difference > according to the feature set. Let's say non-TSO architectures do not > set 'weak_barriers'. Then mandatory barrier is used for software. HW > offload on that platform would choose different feature set against > software? If it's not, essentially we're worried about live migration > from a TSO to a non-TSO architectures platform? I'm worried about live migration from software virtio to hardware virtio on non-TSO architectures. For example, on ARM you would have a "dmb ishst" (smp_wmb) for software virtio and a "dsb st" (wmb) or "dmb oshst" (dma_wmb) for hardware virtio. For this to work, you would have to set up the VM so that it uses the heavier barriers from the beginning, even when backed by software virtio. Paolo
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On Thu, Apr 19, 2018 at 04:24:29PM +, Liang, Cunming wrote: > > > > -Original Message- > > From: Michael S. Tsirkin [mailto:m...@redhat.com] > > Sent: Thursday, April 19, 2018 11:19 PM > > To: Paolo Bonzini> > Cc: Liang, Cunming ; Bie, Tiwei > > ; > > jasow...@redhat.com; alex.william...@redhat.com; stefa...@redhat.com; > > qemu-devel@nongnu.org; virtio-...@lists.oasis-open.org; Daly, Dan > > ; Tan, Jianfeng ; Wang, Zhihong > > ; Wang, Xiao W > > Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering > > external host notifiers > > > > On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote: > > > On 19/04/2018 14:43, Liang, Cunming wrote: > > > >> 2. Memory barriers. Right now after updating the avail idx, virtio > > > >> does smp_wmb() and then the MMIO write. Normal hardware drivers do > > > >> wmb() which is an sfence. Can a PCI device read bypass index write > > > >> and see a stale index value? > > > > > > > > A compiler barrier is enough on strongly-ordered memory platform. As > > > > it doesn't re-order store, PCI device won't see a stale index value. > > > > But a weakly-ordered memory needs sfence. > > > > > > That is complicated then. We need to define a feature bit and (in the > > > Linux driver) propagate it to vring_create_virtqueue's weak_barrier > > > argument. However: > > > > > > - if we make it 1 when weak barriers are needed, the device also needs > > > to nack feature negotiation (not allow setting the FEATURES_OK) if the > > > bit is not set by the driver. > > > However, that is not enough. Live > > > migration assumes that it is okay to migrate a virtual machine from a > > > source that doesn't support a feature to a destination that supports it. > > > In this case, it would assume that it is okay to migrate from > > > software virtio to hardware virtio. This is wrong because the > > > destination would use weak barriers > > > > You can't migrate between systems with different sets of device features > > right > > now. > > > > > - if we make it 1 when strong barriers are enough, software virtio > > > devices needs to be updated to expose the bit. This works, including > > > live migration, but updated drivers will now go slower when run > > > against an old device that doesn't know the feature bit. > > > > > > Maybe bump the PCI revision, so that only the new revision has the bit? > > > > > > Thanks, > > > > > > Paolo > > > > As a first step, if you want to migrate to a HW offloaded solution then you > > need > > to enable the feature. > > > It does mean it will go a bit slower when run with software, > > so it's only good if most systems in your cluster do have the HW offload. > To clarify a bit more, it's suboptimal to always use mandatory barriers for > MMIO. Per strongly-order memory, 'weak barriers' (smp_wmb) is pretty good for > MMIO. The tradeoff doesn't always happen, software and HW offload can align > on the same page. I agree to all of the above except where you say smp_wmb. smp_wmb is for controlling SMP effects on Linux, and I suspect it will not do the right thing on some non-Intel architectures. The claim is I think correct for Intel/AMD platforms, and probably other strongly ordered ones. I suspect it's incorrect for ARM and power. Replace smp_wmb with 'asm volatile ("") on Intel' and I'll agree. > > I think we can start by getting that working and think about ways to improve > > down the road. > > > > > > That's the usecase we designed FEATURES_OK for though, so I do think/hope > > it's > > enough and we don't need to play with revisions. > > > > > > -- > > MST
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On Thu, Apr 19, 2018 at 06:07:07PM +0200, Paolo Bonzini wrote: > On 19/04/2018 17:59, Michael S. Tsirkin wrote: > > On Thu, Apr 19, 2018 at 05:51:51PM +0200, Paolo Bonzini wrote: > >> On 19/04/2018 17:19, Michael S. Tsirkin wrote: > - if we make it 1 when weak barriers are needed, the device also needs > to nack feature negotiation (not allow setting the FEATURES_OK) if the > bit is not set by the driver. > However, that is not enough. Live > migration assumes that it is okay to migrate a virtual machine from a > source that doesn't support a feature to a destination that supports it. > In this case, it would assume that it is okay to migrate from software > virtio to hardware virtio. This is wrong because the destination would > use weak barriers > >>> > >>> You can't migrate between systems with different sets of device features > >>> right now. > >> > >> Yes, you can, exactly because some features are defined not by the > >> machine type but rather by the host kernel. See virtio_net_get_features > >> in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in > >> QEMU's hw/virtio/virtio.c. > > > > Oh you are right. Well we can just special-case that one :) > > Anything wrong with bumping the PCI revision? > > Paolo Nothing except in virtio 1 bumping the revision does not prevent loading the old driver. -- MST
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
> -Original Message- > From: Michael S. Tsirkin [mailto:m...@redhat.com] > Sent: Thursday, April 19, 2018 11:19 PM > To: Paolo Bonzini> Cc: Liang, Cunming ; Bie, Tiwei > ; > jasow...@redhat.com; alex.william...@redhat.com; stefa...@redhat.com; > qemu-devel@nongnu.org; virtio-...@lists.oasis-open.org; Daly, Dan > ; Tan, Jianfeng ; Wang, Zhihong > ; Wang, Xiao W > Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering > external host notifiers > > On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote: > > On 19/04/2018 14:43, Liang, Cunming wrote: > > >> 2. Memory barriers. Right now after updating the avail idx, virtio > > >> does smp_wmb() and then the MMIO write. Normal hardware drivers do > > >> wmb() which is an sfence. Can a PCI device read bypass index write > > >> and see a stale index value? > > > > > > A compiler barrier is enough on strongly-ordered memory platform. As > > > it doesn't re-order store, PCI device won't see a stale index value. > > > But a weakly-ordered memory needs sfence. > > > > That is complicated then. We need to define a feature bit and (in the > > Linux driver) propagate it to vring_create_virtqueue's weak_barrier > > argument. However: > > > > - if we make it 1 when weak barriers are needed, the device also needs > > to nack feature negotiation (not allow setting the FEATURES_OK) if the > > bit is not set by the driver. > > However, that is not enough. Live > > migration assumes that it is okay to migrate a virtual machine from a > > source that doesn't support a feature to a destination that supports it. > > In this case, it would assume that it is okay to migrate from > > software virtio to hardware virtio. This is wrong because the > > destination would use weak barriers > > You can't migrate between systems with different sets of device features right > now. > > > - if we make it 1 when strong barriers are enough, software virtio > > devices needs to be updated to expose the bit. This works, including > > live migration, but updated drivers will now go slower when run > > against an old device that doesn't know the feature bit. > > > > Maybe bump the PCI revision, so that only the new revision has the bit? > > > > Thanks, > > > > Paolo > > As a first step, if you want to migrate to a HW offloaded solution then you > need > to enable the feature. > It does mean it will go a bit slower when run with software, > so it's only good if most systems in your cluster do have the HW offload. To clarify a bit more, it's suboptimal to always use mandatory barriers for MMIO. Per strongly-order memory, 'weak barriers' (smp_wmb) is pretty good for MMIO. The tradeoff doesn't always happen, software and HW offload can align on the same page. > I think we can start by getting that working and think about ways to improve > down the road. > > > That's the usecase we designed FEATURES_OK for though, so I do think/hope it's > enough and we don't need to play with revisions. > > > -- > MST
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On 19/04/2018 17:59, Michael S. Tsirkin wrote: > On Thu, Apr 19, 2018 at 05:51:51PM +0200, Paolo Bonzini wrote: >> On 19/04/2018 17:19, Michael S. Tsirkin wrote: - if we make it 1 when weak barriers are needed, the device also needs to nack feature negotiation (not allow setting the FEATURES_OK) if the bit is not set by the driver. However, that is not enough. Live migration assumes that it is okay to migrate a virtual machine from a source that doesn't support a feature to a destination that supports it. In this case, it would assume that it is okay to migrate from software virtio to hardware virtio. This is wrong because the destination would use weak barriers >>> >>> You can't migrate between systems with different sets of device features >>> right now. >> >> Yes, you can, exactly because some features are defined not by the >> machine type but rather by the host kernel. See virtio_net_get_features >> in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in >> QEMU's hw/virtio/virtio.c. > > Oh you are right. Well we can just special-case that one :) Anything wrong with bumping the PCI revision? Paolo
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On Thu, Apr 19, 2018 at 05:51:51PM +0200, Paolo Bonzini wrote: > On 19/04/2018 17:19, Michael S. Tsirkin wrote: > >> - if we make it 1 when weak barriers are needed, the device also needs > >> to nack feature negotiation (not allow setting the FEATURES_OK) if the > >> bit is not set by the driver. > >> However, that is not enough. Live > >> migration assumes that it is okay to migrate a virtual machine from a > >> source that doesn't support a feature to a destination that supports it. > >> In this case, it would assume that it is okay to migrate from software > >> virtio to hardware virtio. This is wrong because the destination would > >> use weak barriers > > > > You can't migrate between systems with different sets of device features > > right now. > > Yes, you can, exactly because some features are defined not by the > machine type but rather by the host kernel. See virtio_net_get_features > in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in > QEMU's hw/virtio/virtio.c. > > Thanks, > > Paolo Oh you are right. Well we can just special-case that one :) -- MST
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On 19/04/2018 17:19, Michael S. Tsirkin wrote: >> - if we make it 1 when weak barriers are needed, the device also needs >> to nack feature negotiation (not allow setting the FEATURES_OK) if the >> bit is not set by the driver. >> However, that is not enough. Live >> migration assumes that it is okay to migrate a virtual machine from a >> source that doesn't support a feature to a destination that supports it. >> In this case, it would assume that it is okay to migrate from software >> virtio to hardware virtio. This is wrong because the destination would >> use weak barriers > > You can't migrate between systems with different sets of device features > right now. Yes, you can, exactly because some features are defined not by the machine type but rather by the host kernel. See virtio_net_get_features in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in QEMU's hw/virtio/virtio.c. Thanks, Paolo
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote: > On 19/04/2018 14:43, Liang, Cunming wrote: > >> 2. Memory barriers. Right now after updating the avail idx, > >> virtio does smp_wmb() and then the MMIO write. Normal hardware > >> drivers do wmb() which is an sfence. Can a PCI device read bypass > >> index write and see a stale index value? > > > > A compiler barrier is enough on strongly-ordered memory platform. As > > it doesn't re-order store, PCI device won't see a stale index value. > > But a weakly-ordered memory needs sfence. > > That is complicated then. We need to define a feature bit and (in the > Linux driver) propagate it to vring_create_virtqueue's weak_barrier > argument. However: > > - if we make it 1 when weak barriers are needed, the device also needs > to nack feature negotiation (not allow setting the FEATURES_OK) if the > bit is not set by the driver. > However, that is not enough. Live > migration assumes that it is okay to migrate a virtual machine from a > source that doesn't support a feature to a destination that supports it. > In this case, it would assume that it is okay to migrate from software > virtio to hardware virtio. This is wrong because the destination would > use weak barriers You can't migrate between systems with different sets of device features right now. > - if we make it 1 when strong barriers are enough, software virtio > devices needs to be updated to expose the bit. This works, including > live migration, but updated drivers will now go slower when run against > an old device that doesn't know the feature bit. > > Maybe bump the PCI revision, so that only the new revision has the bit? > > Thanks, > > Paolo As a first step, if you want to migrate to a HW offloaded solution then you need to enable the feature. It does mean it will go a bit slower when run with software, so it's only good if most systems in your cluster do have the HW offload. I think we can start by getting that working and think about ways to improve down the road. That's the usecase we designed FEATURES_OK for though, so I do think/hope it's enough and we don't need to play with revisions. -- MST
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
On 19/04/2018 14:43, Liang, Cunming wrote: >> 2. Memory barriers. Right now after updating the avail idx, >> virtio does smp_wmb() and then the MMIO write. Normal hardware >> drivers do wmb() which is an sfence. Can a PCI device read bypass >> index write and see a stale index value? > > A compiler barrier is enough on strongly-ordered memory platform. As > it doesn't re-order store, PCI device won't see a stale index value. > But a weakly-ordered memory needs sfence. That is complicated then. We need to define a feature bit and (in the Linux driver) propagate it to vring_create_virtqueue's weak_barrier argument. However: - if we make it 1 when weak barriers are needed, the device also needs to nack feature negotiation (not allow setting the FEATURES_OK) if the bit is not set by the driver. However, that is not enough. Live migration assumes that it is okay to migrate a virtual machine from a source that doesn't support a feature to a destination that supports it. In this case, it would assume that it is okay to migrate from software virtio to hardware virtio. This is wrong because the destination would use weak barriers - if we make it 1 when strong barriers are enough, software virtio devices needs to be updated to expose the bit. This works, including live migration, but updated drivers will now go slower when run against an old device that doesn't know the feature bit. Maybe bump the PCI revision, so that only the new revision has the bit? Thanks, Paolo