Re: [PATCH] ptr_ring: add barriers
On 2017年12月06日 10:53, Michael S. Tsirkin wrote: On Wed, Dec 06, 2017 at 10:31:39AM +0800, Jason Wang wrote: On 2017年12月06日 03:29, Michael S. Tsirkin wrote: Users of ptr_ring expect that it's safe to give the data structure a pointer and have it be available to consumers, but that actually requires an smb_wmb or a stronger barrier. In absence of such barriers and on architectures that reorder writes, consumer might read an un=initialized value from an skb pointer stored in the skb array. This was observed causing crashes. To fix, add memory barriers. The barrier we use is a wmb, the assumption being that producers do not need to read the value so we do not need to order these reads. Reported-by: George CherianSuggested-by: Jason Wang Signed-off-by: Michael S. Tsirkin --- George, could you pls report whether this patch fixes the issue for you? This seems to be needed in stable as well. include/linux/ptr_ring.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h index 37b4bb2..6866df4 100644 --- a/include/linux/ptr_ring.h +++ b/include/linux/ptr_ring.h @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r) /* Note: callers invoking this in a loop must use a compiler barrier, * for example cpu_relax(). Callers must hold producer_lock. + * Callers are responsible for making sure pointer that is being queued + * points to a valid data. */ static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr) { if (unlikely(!r->size) || r->queue[r->producer]) return -ENOSPC; + /* Make sure the pointer we are storing points to a valid data. */ + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */ + smp_wmb(); + r->queue[r->producer++] = ptr; if (unlikely(r->producer >= r->size)) r->producer = 0; @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r) if (ptr) __ptr_ring_discard_one(r); + /* Make sure anyone accessing data through the pointer is up to date. */ + /* Pairs with smp_wmb in __ptr_ring_produce. */ + smp_read_barrier_depends(); return ptr; } I was thinking whether or not it's better to move those to the callers. Then we can save lots of barriers in e.g batch consuming. Thanks Batch consumers only do smp_read_barrier_depends which is free on non-alpha. I suggest we do the simple thing for stable and reserve optimizations for later. Right. Acked-by: Jason Wang ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] ptr_ring: add barriers
On Wed, Dec 06, 2017 at 10:31:39AM +0800, Jason Wang wrote: > > > On 2017年12月06日 03:29, Michael S. Tsirkin wrote: > > Users of ptr_ring expect that it's safe to give the > > data structure a pointer and have it be available > > to consumers, but that actually requires an smb_wmb > > or a stronger barrier. > > > > In absence of such barriers and on architectures that reorder writes, > > consumer might read an un=initialized value from an skb pointer stored > > in the skb array. This was observed causing crashes. > > > > To fix, add memory barriers. The barrier we use is a wmb, the > > assumption being that producers do not need to read the value so we do > > not need to order these reads. > > > > Reported-by: George Cherian> > Suggested-by: Jason Wang > > Signed-off-by: Michael S. Tsirkin > > --- > > > > George, could you pls report whether this patch fixes > > the issue for you? > > > > This seems to be needed in stable as well. > > > > > > > > > > include/linux/ptr_ring.h | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h > > index 37b4bb2..6866df4 100644 > > --- a/include/linux/ptr_ring.h > > +++ b/include/linux/ptr_ring.h > > @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring > > *r) > > /* Note: callers invoking this in a loop must use a compiler barrier, > >* for example cpu_relax(). Callers must hold producer_lock. > > + * Callers are responsible for making sure pointer that is being queued > > + * points to a valid data. > >*/ > > static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr) > > { > > if (unlikely(!r->size) || r->queue[r->producer]) > > return -ENOSPC; > > + /* Make sure the pointer we are storing points to a valid data. */ > > + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */ > > + smp_wmb(); > > + > > r->queue[r->producer++] = ptr; > > if (unlikely(r->producer >= r->size)) > > r->producer = 0; > > @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring > > *r) > > if (ptr) > > __ptr_ring_discard_one(r); > > + /* Make sure anyone accessing data through the pointer is up to date. */ > > + /* Pairs with smp_wmb in __ptr_ring_produce. */ > > + smp_read_barrier_depends(); > > return ptr; > > } > > I was thinking whether or not it's better to move those to the callers. Then > we can save lots of barriers in e.g batch consuming. > > Thanks Batch consumers only do smp_read_barrier_depends which is free on non-alpha. I suggest we do the simple thing for stable and reserve optimizations for later. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] ptr_ring: add barriers
On 2017年12月06日 03:29, Michael S. Tsirkin wrote: Users of ptr_ring expect that it's safe to give the data structure a pointer and have it be available to consumers, but that actually requires an smb_wmb or a stronger barrier. In absence of such barriers and on architectures that reorder writes, consumer might read an un=initialized value from an skb pointer stored in the skb array. This was observed causing crashes. To fix, add memory barriers. The barrier we use is a wmb, the assumption being that producers do not need to read the value so we do not need to order these reads. Reported-by: George CherianSuggested-by: Jason Wang Signed-off-by: Michael S. Tsirkin --- George, could you pls report whether this patch fixes the issue for you? This seems to be needed in stable as well. include/linux/ptr_ring.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h index 37b4bb2..6866df4 100644 --- a/include/linux/ptr_ring.h +++ b/include/linux/ptr_ring.h @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r) /* Note: callers invoking this in a loop must use a compiler barrier, * for example cpu_relax(). Callers must hold producer_lock. + * Callers are responsible for making sure pointer that is being queued + * points to a valid data. */ static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr) { if (unlikely(!r->size) || r->queue[r->producer]) return -ENOSPC; + /* Make sure the pointer we are storing points to a valid data. */ + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */ + smp_wmb(); + r->queue[r->producer++] = ptr; if (unlikely(r->producer >= r->size)) r->producer = 0; @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r) if (ptr) __ptr_ring_discard_one(r); + /* Make sure anyone accessing data through the pointer is up to date. */ + /* Pairs with smp_wmb in __ptr_ring_produce. */ + smp_read_barrier_depends(); return ptr; } I was thinking whether or not it's better to move those to the callers. Then we can save lots of barriers in e.g batch consuming. Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Wed, Dec 06, 2017 at 12:09:36AM +0200, Michael S. Tsirkin wrote: > On Tue, Dec 05, 2017 at 10:57:00PM +0100, Peter Zijlstra wrote: > > On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote: > > > READ_ONCE is really all over the place (some code literally replaced all > > > memory accesses with READ/WRITE ONCE). > > > > Yeah, so? > > Oh my point was I can't just look for READ_ONCE and go > *that's the pair*. there are too many of these. > At Paul's suggestion I will document the pairing *this read once has a > barrier that is paired with that barrier*. > > > Complain to the compiler people for forcing us into that. > > In some cases when you end up with all accesses > going through read/write once volatile just might better. That is in fact what the jiffies counter does. But you lose READ_ONCE()'s automatic handling of DEC Alpha when you take that approach. > > > Would an API like WRITE_POINTER()/smp_store_pointer make sense, > > > and READ_POINTER for symmetry? > > > > No, the whole point of the exercise was to get away from the fact that > > dependent loads are special. > > It's a pity that dependent stores are still special. We can make READ_ONCE() not be special at zero cost on non-Alpha systems, but both smp_wmb() and smp_store_release() are decidedly not free of added overhead. Thanx, Paul ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC] virtio-net: help live migrate SR-IOV devices
On Tue, 5 Dec 2017 14:29:28 -0800 Jakub Kicinskiwrote: > On Tue, 5 Dec 2017 11:59:17 +0200, achiad shochat wrote: > > I second Jacob - having a netdev of one device driver enslave a netdev > > of another device driver is an awkward a-symmetric model. > > Regardless of whether they share the same backend device. > > Only I am not sure the Linux Bond is the right choice. > > e.g one may well want to use the virtio device also when the > > pass-through device is available, e.g for multicasts, east-west > > traffic, etc. > > I'm not sure the Linux Bond fits that functionality. > > And, as I hear in this thread, it is hard to make it work out of the > > box. > > So I think the right thing would be to write a new dedicated module > > for this purpose. > > > > > > This part I can sort of agree with. What if we were to look at > > > providing a way to somehow advertise that the two devices were meant > > > to be boded for virtualization purposes? For now lets call it a > > > "virt-bond". Basically we could look at providing a means for virtio > > > and VF drivers to advertise that they want this sort of bond. Then it > > > would just be a matter of providing some sort of side channel to > > > indicate where you want things like multicast/broadcast/east-west > > > traffic to go. > > > > I like this approach. > > +1 on a separate driver, just enslaving devices to virtio may break > existing setups. If people are bonding from user space today, if they > update their kernel it may surprise them how things get auto-mangled. > > Is what Alex is suggesting a separate PV device that says "I would > like to be a bond of those two interfaces"? That would make the HV > intent explicit and kernel decisions more understandable. So far, in my experience it still works. As long as the kernel slaving happens first, it will work. The attempt to bond an already slaved device will fail and no scripts seem to check the error return. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 01:36:44PM -0800, Paul E. McKenney wrote: > What we do in some code is to comment the pairings, allowing the other > side of the pairing to be easily located. Would that work for you? I would say that that is mandatory for any memory ordering code ;-) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 10:57:00PM +0100, Peter Zijlstra wrote: > On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote: > > READ_ONCE is really all over the place (some code literally replaced all > > memory accesses with READ/WRITE ONCE). > > Yeah, so? Oh my point was I can't just look for READ_ONCE and go *that's the pair*. there are too many of these. At Paul's suggestion I will document the pairing *this read once has a barrier that is paired with that barrier*. > Complain to the compiler people for forcing us into that. In some cases when you end up with all accesses going through read/write once volatile just might better. > > Would an API like WRITE_POINTER()/smp_store_pointer make sense, > > and READ_POINTER for symmetry? > > No, the whole point of the exercise was to get away from the fact that > dependent loads are special. It's a pity that dependent stores are still special. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC] virtio-net: help live migrate SR-IOV devices
On Tue, Dec 05, 2017 at 01:52:26PM -0800, Jesse Brandeburg wrote: > On Tue, 5 Dec 2017 21:20:07 +0200 > "Michael S. Tsirkin"wrote: > > > On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote: > > > Then we'll have a single solution for both netvsc and virtio (and any > > > other PV device). > > > And we could handle the VF DMA dirt issue agnostically. > > > > For the record, I won't block patches adding this kist to virtio > > on the basis that they must be generic. It's not a lot > > of code, implementation can come first, prettify later. > > Thanks, based on this discussion we're going to work on improving > virtio-net first, but some of Achiad's points are good. I don't believe > it should block the virtio work however. > > In particular I'm really interested in figuring out how we can get to > the point that virtio is able to make or implement some smart decisions > about which NIC to pick for traffic delivery (it's own paravirt path or > the passthorugh device path), if Achiad wants to develop the idea into > some code, I'd be interested to review it. > > > But we do need to have a discussion about how devices are paired. > > I am not sure using just MAC works. E.g. some passthrough > > devices don't give host ability to set the MAC. > > Are these worth worrying about? > > I personally don't think that will be much of a problem, if a > certain device has that issue, can't we just have the virtio-net device > pick up the MAC address of the passthrough device? Then what do you do after you have migrated to another box? The PT device there likely has a different MAC. > As long as they match > things should work OK. It at least is an initial way to do the > configuration that has at least some traction as workable, as proved by > the Microsoft design. Yes - that design just implements what people have been doing for years using bond so of course it's workable. > FWIW, the Intel SR-IOV devices all accept a hypervisor/host provided > MAC address. For VFs you often can program the MAC through the PF, but you typically can't do this for PFs. Or as another example consider nested virt with a VF passed through. PF isn't there within L1 guest so can't be used to program the mac of the VF. Still, we can always start small and require same mac, add other ways to address issues later as we come up with them. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 11:43:41PM +0200, Michael S. Tsirkin wrote: > On Tue, Dec 05, 2017 at 01:36:44PM -0800, Paul E. McKenney wrote: > > On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote: > > > On Tue, Dec 05, 2017 at 12:08:01PM -0800, Paul E. McKenney wrote: > > > > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote: > > > > > On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote: > > > > > > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote: > > > > > > > > [ . . . ] > > > > > > > > > > > and this barrier is no longer paired with anything until > > > > > > > you realize there's a dependency barrier within READ_ONCE. > > > > > > > > > > > > > > Barrier pairing was a useful tool to check code validity, > > > > > > > maybe there are other, better tools now. > > > > > > > > > > > > There are quite a few people who say that smp_store_release() is > > > > > > easier for the tools to analyze than is smp_wmb(). My experience > > > > > > with > > > > > > smp_read_barrier_depends() and rcu_dereference() leads me to believe > > > > > > that they are correct. > > > > > > > > > > OK, but smp_store_release is still not paired with anything since we > > > > > rely on READ_ONCE to include the implicit dpendendency barrier. > > > > > > > > Why wouldn't you consider the smp_store_release() to be paired with > > > > the new improved READ_ONCE()? > > > > > > READ_ONCE is really all over the place (some code literally replaced all > > > memory accesses with READ/WRITE ONCE). > > > > > > And I also prefer smp_wmb as it seems to be cheaper on ARM. > > > > > > Would an API like WRITE_POINTER()/smp_store_pointer make sense, > > > and READ_POINTER for symmetry? > > > > What we do in some code is to comment the pairings, allowing the other > > side of the pairing to be easily located. Would that work for you? > > Yes, that's exactly what I did for now. Very good, thank you! Thanx, Paul ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote: > READ_ONCE is really all over the place (some code literally replaced all > memory accesses with READ/WRITE ONCE). Yeah, so? Complain to the compiler people for forcing us into that. > Would an API like WRITE_POINTER()/smp_store_pointer make sense, > and READ_POINTER for symmetry? No, the whole point of the exercise was to get away from the fact that dependent loads are special. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC] virtio-net: help live migrate SR-IOV devices
On Tue, 5 Dec 2017 21:20:07 +0200 "Michael S. Tsirkin"wrote: > On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote: > > Then we'll have a single solution for both netvsc and virtio (and any > > other PV device). > > And we could handle the VF DMA dirt issue agnostically. > > For the record, I won't block patches adding this kist to virtio > on the basis that they must be generic. It's not a lot > of code, implementation can come first, prettify later. Thanks, based on this discussion we're going to work on improving virtio-net first, but some of Achiad's points are good. I don't believe it should block the virtio work however. In particular I'm really interested in figuring out how we can get to the point that virtio is able to make or implement some smart decisions about which NIC to pick for traffic delivery (it's own paravirt path or the passthorugh device path), if Achiad wants to develop the idea into some code, I'd be interested to review it. > But we do need to have a discussion about how devices are paired. > I am not sure using just MAC works. E.g. some passthrough > devices don't give host ability to set the MAC. > Are these worth worrying about? I personally don't think that will be much of a problem, if a certain device has that issue, can't we just have the virtio-net device pick up the MAC address of the passthrough device? As long as they match things should work OK. It at least is an initial way to do the configuration that has at least some traction as workable, as proved by the Microsoft design. FWIW, the Intel SR-IOV devices all accept a hypervisor/host provided MAC address. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 01:36:44PM -0800, Paul E. McKenney wrote: > On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote: > > On Tue, Dec 05, 2017 at 12:08:01PM -0800, Paul E. McKenney wrote: > > > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote: > > > > On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote: > > > > > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote: > > > > > > [ . . . ] > > > > > > > > > and this barrier is no longer paired with anything until > > > > > > you realize there's a dependency barrier within READ_ONCE. > > > > > > > > > > > > Barrier pairing was a useful tool to check code validity, > > > > > > maybe there are other, better tools now. > > > > > > > > > > There are quite a few people who say that smp_store_release() is > > > > > easier for the tools to analyze than is smp_wmb(). My experience with > > > > > smp_read_barrier_depends() and rcu_dereference() leads me to believe > > > > > that they are correct. > > > > > > > > OK, but smp_store_release is still not paired with anything since we > > > > rely on READ_ONCE to include the implicit dpendendency barrier. > > > > > > Why wouldn't you consider the smp_store_release() to be paired with > > > the new improved READ_ONCE()? > > > > READ_ONCE is really all over the place (some code literally replaced all > > memory accesses with READ/WRITE ONCE). > > > > And I also prefer smp_wmb as it seems to be cheaper on ARM. > > > > Would an API like WRITE_POINTER()/smp_store_pointer make sense, > > and READ_POINTER for symmetry? > > What we do in some code is to comment the pairings, allowing the other > side of the pairing to be easily located. Would that work for you? > > Thanx, Paul Yes, that's exactly what I did for now. Thanks! -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 10:17:35PM +0100, Peter Zijlstra wrote: > On Tue, Dec 05, 2017 at 10:28:38PM +0200, Michael S. Tsirkin wrote: > > On Tue, Dec 05, 2017 at 08:57:52PM +0100, Peter Zijlstra wrote: > > > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote: > > > > > > WRITE_ONCE(obj->val, 1); > > > > > > smp_wmb(); > > > > > > WRITE_ONCE(*foo, obj); > > > > > > > > > > I believe Peter was instead suggesting: > > > > > > > > > > WRITE_ONCE(obj->val, 1); > > > > > smp_store_release(foo, obj); > > > > > > > > Isn't that more expensive though? > > > > > > Depends on the architecture. The only architecture where it is more > > > expensive and people actually still care about is ARM I think. > > > > Right. Why should I use the more expensive smp_store_release then? > > Because it makes more sense. Memory ordering is hard enough, don't make > it harder still if you don't have to. I suspect I have to - ptr_ring is a very low level construct used by netowrking on data path so making it a bit more complicated for a bit of performance is probably justified. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote: > On Tue, Dec 05, 2017 at 12:08:01PM -0800, Paul E. McKenney wrote: > > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote: > > > On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote: > > > > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote: > > > > [ . . . ] > > > > > > > and this barrier is no longer paired with anything until > > > > > you realize there's a dependency barrier within READ_ONCE. > > > > > > > > > > Barrier pairing was a useful tool to check code validity, > > > > > maybe there are other, better tools now. > > > > > > > > There are quite a few people who say that smp_store_release() is > > > > easier for the tools to analyze than is smp_wmb(). My experience with > > > > smp_read_barrier_depends() and rcu_dereference() leads me to believe > > > > that they are correct. > > > > > > OK, but smp_store_release is still not paired with anything since we > > > rely on READ_ONCE to include the implicit dpendendency barrier. > > > > Why wouldn't you consider the smp_store_release() to be paired with > > the new improved READ_ONCE()? > > READ_ONCE is really all over the place (some code literally replaced all > memory accesses with READ/WRITE ONCE). > > And I also prefer smp_wmb as it seems to be cheaper on ARM. > > Would an API like WRITE_POINTER()/smp_store_pointer make sense, > and READ_POINTER for symmetry? What we do in some code is to comment the pairings, allowing the other side of the pairing to be easily located. Would that work for you? Thanx, Paul ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 12:08:01PM -0800, Paul E. McKenney wrote: > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote: > > On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote: > > > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote: > > [ . . . ] > > > > > and this barrier is no longer paired with anything until > > > > you realize there's a dependency barrier within READ_ONCE. > > > > > > > > Barrier pairing was a useful tool to check code validity, > > > > maybe there are other, better tools now. > > > > > > There are quite a few people who say that smp_store_release() is > > > easier for the tools to analyze than is smp_wmb(). My experience with > > > smp_read_barrier_depends() and rcu_dereference() leads me to believe > > > that they are correct. > > > > OK, but smp_store_release is still not paired with anything since we > > rely on READ_ONCE to include the implicit dpendendency barrier. > > Why wouldn't you consider the smp_store_release() to be paired with > the new improved READ_ONCE()? > > Thanx, Paul READ_ONCE is really all over the place (some code literally replaced all memory accesses with READ/WRITE ONCE). And I also prefer smp_wmb as it seems to be cheaper on ARM. Would an API like WRITE_POINTER()/smp_store_pointer make sense, and READ_POINTER for symmetry? -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 10:28:38PM +0200, Michael S. Tsirkin wrote: > On Tue, Dec 05, 2017 at 08:57:52PM +0100, Peter Zijlstra wrote: > > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote: > > > > > WRITE_ONCE(obj->val, 1); > > > > > smp_wmb(); > > > > > WRITE_ONCE(*foo, obj); > > > > > > > > I believe Peter was instead suggesting: > > > > > > > > WRITE_ONCE(obj->val, 1); > > > > smp_store_release(foo, obj); > > > > > > Isn't that more expensive though? > > > > Depends on the architecture. The only architecture where it is more > > expensive and people actually still care about is ARM I think. > > Right. Why should I use the more expensive smp_store_release then? Because it makes more sense. Memory ordering is hard enough, don't make it harder still if you don't have to. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 08:57:52PM +0100, Peter Zijlstra wrote: > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote: > > > > WRITE_ONCE(obj->val, 1); > > > > smp_wmb(); > > > > WRITE_ONCE(*foo, obj); > > > > > > I believe Peter was instead suggesting: > > > > > > WRITE_ONCE(obj->val, 1); > > > smp_store_release(foo, obj); > > > > Isn't that more expensive though? > > Depends on the architecture. The only architecture where it is more > expensive and people actually still care about is ARM I think. Right. Why should I use the more expensive smp_store_release then? -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote: > On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote: > > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote: [ . . . ] > > > and this barrier is no longer paired with anything until > > > you realize there's a dependency barrier within READ_ONCE. > > > > > > Barrier pairing was a useful tool to check code validity, > > > maybe there are other, better tools now. > > > > There are quite a few people who say that smp_store_release() is > > easier for the tools to analyze than is smp_wmb(). My experience with > > smp_read_barrier_depends() and rcu_dereference() leads me to believe > > that they are correct. > > OK, but smp_store_release is still not paired with anything since we > rely on READ_ONCE to include the implicit dpendendency barrier. Why wouldn't you consider the smp_store_release() to be paired with the new improved READ_ONCE()? Thanx, Paul ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote: > > > WRITE_ONCE(obj->val, 1); > > > smp_wmb(); > > > WRITE_ONCE(*foo, obj); > > > > I believe Peter was instead suggesting: > > > > WRITE_ONCE(obj->val, 1); > > smp_store_release(foo, obj); > > Isn't that more expensive though? Depends on the architecture. The only architecture where it is more expensive and people actually still care about is ARM I think. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote: > On Tue, Dec 05, 2017 at 08:17:33PM +0100, Peter Zijlstra wrote: > > On Tue, Dec 05, 2017 at 08:57:46PM +0200, Michael S. Tsirkin wrote: > > > > > I don't see WRITE_ONCE inserting any barriers, release or > > > write. > > > > Correct, never claimed there was. > > > > Just saying that: > > > > obj = READ_ONCE(*foo); > > val = READ_ONCE(obj->val); > > > > Never needs a barrier (except on Alpha and we want to make that go > > away). Simply because a CPU needs to complete the load of @obj before it > > can compute the address >val. Thus the second load _must_ come > > after the first load and we get LOAD-LOAD ordering. > > > > Alpha messing that up is a royal pain, and Alpha not being an > > active/living architecture is just not worth the pain of keeping this in > > the generic model. > > > > Right. What I am saying is that for writes you need > > WRITE_ONCE(obj->val, 1); > smp_wmb(); > WRITE_ONCE(*foo, obj); You really should use smp_store_release() here instead of the smp_wmb(). But yes. > and this barrier is no longer paired with anything until > you realize there's a dependency barrier within READ_ONCE. No, there isn't. read_dependecy barriers are no more. They don't exist. They never did, except on Alpha anyway. There were a ton of sites that relied on this but never had the smp_read_barrier_depends() annotation, similarly there were quite a few sites that had the barrier but nobody could explain wtf for. What these patches do is return the sane rule that dependent loads are ordered. And like all memory ordering; it should come with comments. Any piece of code that relies on memory ordering and doesn't have big fat comments that explain the required ordering are broken per definition. Maybe not now, but they will be after a few years because someone changed it and didn't know. > Barrier pairing was a useful tool to check code validity, > maybe there are other, better tools now. Same is true for the closely related control dependency, you can pair against those just fine but they don't have an explicit barrier annotation. There are no tools, use brain. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote: > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote: > > On Tue, Dec 05, 2017 at 08:17:33PM +0100, Peter Zijlstra wrote: > > > On Tue, Dec 05, 2017 at 08:57:46PM +0200, Michael S. Tsirkin wrote: > > > > > > > I don't see WRITE_ONCE inserting any barriers, release or > > > > write. > > > > > > Correct, never claimed there was. > > > > > > Just saying that: > > > > > > obj = READ_ONCE(*foo); > > > val = READ_ONCE(obj->val); > > > > > > Never needs a barrier (except on Alpha and we want to make that go > > > away). Simply because a CPU needs to complete the load of @obj before it > > > can compute the address >val. Thus the second load _must_ come > > > after the first load and we get LOAD-LOAD ordering. > > > > > > Alpha messing that up is a royal pain, and Alpha not being an > > > active/living architecture is just not worth the pain of keeping this in > > > the generic model. > > > > > > > Right. What I am saying is that for writes you need > > > > WRITE_ONCE(obj->val, 1); > > smp_wmb(); > > WRITE_ONCE(*foo, obj); > > I believe Peter was instead suggesting: > > WRITE_ONCE(obj->val, 1); > smp_store_release(foo, obj); Isn't that more expensive though? > > and this barrier is no longer paired with anything until > > you realize there's a dependency barrier within READ_ONCE. > > > > Barrier pairing was a useful tool to check code validity, > > maybe there are other, better tools now. > > There are quite a few people who say that smp_store_release() is > easier for the tools to analyze than is smp_wmb(). My experience with > smp_read_barrier_depends() and rcu_dereference() leads me to believe > that they are correct. > > Thanx, Paul OK, but smp_store_release is still not paired with anything since we rely on READ_ONCE to include the implicit dpendendency barrier. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote: > On Tue, Dec 05, 2017 at 08:17:33PM +0100, Peter Zijlstra wrote: > > On Tue, Dec 05, 2017 at 08:57:46PM +0200, Michael S. Tsirkin wrote: > > > > > I don't see WRITE_ONCE inserting any barriers, release or > > > write. > > > > Correct, never claimed there was. > > > > Just saying that: > > > > obj = READ_ONCE(*foo); > > val = READ_ONCE(obj->val); > > > > Never needs a barrier (except on Alpha and we want to make that go > > away). Simply because a CPU needs to complete the load of @obj before it > > can compute the address >val. Thus the second load _must_ come > > after the first load and we get LOAD-LOAD ordering. > > > > Alpha messing that up is a royal pain, and Alpha not being an > > active/living architecture is just not worth the pain of keeping this in > > the generic model. > > > > Right. What I am saying is that for writes you need > > WRITE_ONCE(obj->val, 1); > smp_wmb(); > WRITE_ONCE(*foo, obj); I believe Peter was instead suggesting: WRITE_ONCE(obj->val, 1); smp_store_release(foo, obj); > and this barrier is no longer paired with anything until > you realize there's a dependency barrier within READ_ONCE. > > Barrier pairing was a useful tool to check code validity, > maybe there are other, better tools now. There are quite a few people who say that smp_store_release() is easier for the tools to analyze than is smp_wmb(). My experience with smp_read_barrier_depends() and rcu_dereference() leads me to believe that they are correct. Thanx, Paul ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] ptr_ring: add barriers
Users of ptr_ring expect that it's safe to give the data structure a pointer and have it be available to consumers, but that actually requires an smb_wmb or a stronger barrier. In absence of such barriers and on architectures that reorder writes, consumer might read an un=initialized value from an skb pointer stored in the skb array. This was observed causing crashes. To fix, add memory barriers. The barrier we use is a wmb, the assumption being that producers do not need to read the value so we do not need to order these reads. Reported-by: George CherianSuggested-by: Jason Wang Signed-off-by: Michael S. Tsirkin --- George, could you pls report whether this patch fixes the issue for you? This seems to be needed in stable as well. include/linux/ptr_ring.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h index 37b4bb2..6866df4 100644 --- a/include/linux/ptr_ring.h +++ b/include/linux/ptr_ring.h @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r) /* Note: callers invoking this in a loop must use a compiler barrier, * for example cpu_relax(). Callers must hold producer_lock. + * Callers are responsible for making sure pointer that is being queued + * points to a valid data. */ static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr) { if (unlikely(!r->size) || r->queue[r->producer]) return -ENOSPC; + /* Make sure the pointer we are storing points to a valid data. */ + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */ + smp_wmb(); + r->queue[r->producer++] = ptr; if (unlikely(r->producer >= r->size)) r->producer = 0; @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r) if (ptr) __ptr_ring_discard_one(r); + /* Make sure anyone accessing data through the pointer is up to date. */ + /* Pairs with smp_wmb in __ptr_ring_produce. */ + smp_read_barrier_depends(); return ptr; } -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 08:17:33PM +0100, Peter Zijlstra wrote: > On Tue, Dec 05, 2017 at 08:57:46PM +0200, Michael S. Tsirkin wrote: > > > I don't see WRITE_ONCE inserting any barriers, release or > > write. > > Correct, never claimed there was. > > Just saying that: > > obj = READ_ONCE(*foo); > val = READ_ONCE(obj->val); > > Never needs a barrier (except on Alpha and we want to make that go > away). Simply because a CPU needs to complete the load of @obj before it > can compute the address >val. Thus the second load _must_ come > after the first load and we get LOAD-LOAD ordering. > > Alpha messing that up is a royal pain, and Alpha not being an > active/living architecture is just not worth the pain of keeping this in > the generic model. > Right. What I am saying is that for writes you need WRITE_ONCE(obj->val, 1); smp_wmb(); WRITE_ONCE(*foo, obj); and this barrier is no longer paired with anything until you realize there's a dependency barrier within READ_ONCE. Barrier pairing was a useful tool to check code validity, maybe there are other, better tools now. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC] virtio-net: help live migrate SR-IOV devices
On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote: > Then we'll have a single solution for both netvsc and virtio (and any > other PV device). > And we could handle the VF DMA dirt issue agnostically. For the record, I won't block patches adding this kist to virtio on the basis that they must be generic. It's not a lot of code, implementation can come first, prettify later. But we do need to have a discussion about how devices are paired. I am not sure using just MAC works. E.g. some passthrough devices don't give host ability to set the MAC. Are these worth worrying about? -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 08:57:46PM +0200, Michael S. Tsirkin wrote: > I don't see WRITE_ONCE inserting any barriers, release or > write. Correct, never claimed there was. Just saying that: obj = READ_ONCE(*foo); val = READ_ONCE(obj->val); Never needs a barrier (except on Alpha and we want to make that go away). Simply because a CPU needs to complete the load of @obj before it can compute the address >val. Thus the second load _must_ come after the first load and we get LOAD-LOAD ordering. Alpha messing that up is a royal pain, and Alpha not being an active/living architecture is just not worth the pain of keeping this in the generic model. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 07:39:46PM +0100, Peter Zijlstra wrote: > On Tue, Dec 05, 2017 at 08:31:20PM +0200, Michael S. Tsirkin wrote: > > > Apropos, READ_ONCE is now asymmetrical with WRITE_ONCE. > > > > I can read a pointer with READ_ONCE and be sure the value > > is sane, but only if I also remember to put in smp_wmb before > > WRITE_ONCE. Otherwise the pointer is ok but no guarantees > > about the data pointed to. > > That was already the case on everything except Alpha. And the canonical > match do the data dependency is store_release, not wmb. Oh, interesting static __always_inline void __write_once_size(volatile void *p, void *res, int size) { switch (size) { case 1: *(volatile __u8 *)p = *(__u8 *)res; break; case 2: *(volatile __u16 *)p = *(__u16 *)res; break; case 4: *(volatile __u32 *)p = *(__u32 *)res; break; case 8: *(volatile __u64 *)p = *(__u64 *)res; break; default: barrier(); __builtin_memcpy((void *)p, (const void *)res, size); barrier(); } } #define WRITE_ONCE(x, val) \ ({ \ union { typeof(x) __val; char __c[1]; } __u = \ { .__val = (__force typeof(x)) (val) }; \ __write_once_size(&(x), __u.__c, sizeof(x));\ __u.__val; \ }) I don't see WRITE_ONCE inserting any barriers, release or write. So it seems that on an architecture where writes can be reordered, if I do *pointer = 0xa; WRITE_ONCE(array[x], pointer); array write might bypass the pointer write, and readers will read a stale value. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()
On Tue, Dec 05, 2017 at 08:31:20PM +0200, Michael S. Tsirkin wrote: > Apropos, READ_ONCE is now asymmetrical with WRITE_ONCE. > > I can read a pointer with READ_ONCE and be sure the value > is sane, but only if I also remember to put in smp_wmb before > WRITE_ONCE. Otherwise the pointer is ok but no guarantees > about the data pointed to. That was already the case on everything except Alpha. And the canonical match do the data dependency is store_release, not wmb. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 1/2] virtio_mmio: add cleanup for virtio_mmio_probe
As mentioned at drivers/base/core.c: /* * NOTE: _Never_ directly free @dev after calling this function, even * if it returned an error! Always use put_device() to give up the * reference initialized in this function instead. */ Normal we do cleanup for @vm_dev by contianer_of(@dev), but in this case we need release @mem resource from @pdev and vm_dev->base. It make @pdev->vm_dev.dev.release() too complicated, so put_device just put the reference of register_virtio_device->device_register->device_initialize and release all resource in virtio_mmio_probe. Signed-off-by: weiping zhang--- drivers/virtio/virtio_mmio.c | 36 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 74dc717..f984510 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -513,8 +513,10 @@ static int virtio_mmio_probe(struct platform_device *pdev) return -EBUSY; vm_dev = devm_kzalloc(>dev, sizeof(*vm_dev), GFP_KERNEL); - if (!vm_dev) - return -ENOMEM; + if (!vm_dev) { + rc = -ENOMEM; + goto free_mem; + } vm_dev->vdev.dev.parent = >dev; vm_dev->vdev.dev.release = virtio_mmio_release_dev_empty; @@ -524,14 +526,17 @@ static int virtio_mmio_probe(struct platform_device *pdev) spin_lock_init(_dev->lock); vm_dev->base = devm_ioremap(>dev, mem->start, resource_size(mem)); - if (vm_dev->base == NULL) - return -EFAULT; + if (vm_dev->base == NULL) { + rc = -EFAULT; + goto free_vmdev; + } /* Check magic value */ magic = readl(vm_dev->base + VIRTIO_MMIO_MAGIC_VALUE); if (magic != ('v' | 'i' << 8 | 'r' << 16 | 't' << 24)) { dev_warn(>dev, "Wrong magic value 0x%08lx!\n", magic); - return -ENODEV; + rc = -ENODEV; + goto unmap; } /* Check device version */ @@ -539,7 +544,8 @@ static int virtio_mmio_probe(struct platform_device *pdev) if (vm_dev->version < 1 || vm_dev->version > 2) { dev_err(>dev, "Version %ld not supported!\n", vm_dev->version); - return -ENXIO; + rc = -ENXIO; + goto unmap; } vm_dev->vdev.id.device = readl(vm_dev->base + VIRTIO_MMIO_DEVICE_ID); @@ -548,7 +554,8 @@ static int virtio_mmio_probe(struct platform_device *pdev) * virtio-mmio device with an ID 0 is a (dummy) placeholder * with no function. End probing now with no error reported. */ - return -ENODEV; + rc = -ENODEV; + goto unmap; } vm_dev->vdev.id.vendor = readl(vm_dev->base + VIRTIO_MMIO_VENDOR_ID); @@ -573,7 +580,20 @@ static int virtio_mmio_probe(struct platform_device *pdev) platform_set_drvdata(pdev, vm_dev); - return register_virtio_device(_dev->vdev); + rc = register_virtio_device(_dev->vdev); + if (rc) + goto put_dev; + return 0; +put_dev: + put_device(_dev->vdev.dev); +unmap: + iounmap(vm_dev->base); +free_mem: + devm_release_mem_region(>dev, mem->start, + resource_size(mem)); +free_vmdev: + devm_kfree(>dev, vm_dev); + return rc; } static int virtio_mmio_remove(struct platform_device *pdev) -- 2.9.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 0/2] Add cleanup for virtio_mmio driver
this patchset try to add cleanup for virtio_mmio driver, include virtio_mmio_probe and virtio_mmio_remove weiping zhang (2): virtio_mmio: add cleanup for virtio_mmio_probe virtio_mmio: add cleanup for virtio_mmio_remove drivers/virtio/virtio_mmio.c | 43 +++ 1 file changed, 35 insertions(+), 8 deletions(-) -- 2.9.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 2/2] virtio_mmio: add cleanup for virtio_mmio_remove
cleanup all resource allocated by virtio_mmio_probe. Signed-off-by: weiping zhang--- drivers/virtio/virtio_mmio.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index f984510..5e2ca34 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -599,8 +599,15 @@ static int virtio_mmio_probe(struct platform_device *pdev) static int virtio_mmio_remove(struct platform_device *pdev) { struct virtio_mmio_device *vm_dev = platform_get_drvdata(pdev); + struct resource *mem; unregister_virtio_device(_dev->vdev); + iounmap(vm_dev->base); + mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (mem) + devm_release_mem_region(>dev, mem->start, + resource_size(mem)); + devm_kfree(>dev, vm_dev); return 0; } -- 2.9.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC] virtio-net: help live migrate SR-IOV devices
On 4 December 2017 at 18:30, Alexander Duyckwrote: > On Mon, Dec 4, 2017 at 1:51 AM, achiad shochat > wrote: >> On 3 December 2017 at 19:35, Stephen Hemminger >> wrote: >>> On Sun, 3 Dec 2017 11:14:37 +0200 >>> achiad shochat wrote: >>> On 3 December 2017 at 07:05, Michael S. Tsirkin wrote: > On Fri, Dec 01, 2017 at 12:08:59PM -0800, Shannon Nelson wrote: >> On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote: >> > On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote: >> > > Re. problem #2: >> > > Indeed the best way to address it seems to be to enslave the VF >> > > driver >> > > netdev under a persistent anchor netdev. >> > > And it's indeed desired to allow (but not enforce) PV netdev and VF >> > > netdev to work in conjunction. >> > > And it's indeed desired that this enslavement logic work out-of-the >> > > box. >> > > But in case of PV+VF some configurable policies must be in place >> > > (and >> > > they'd better be generic rather than differ per PV technology). >> > > For example - based on which characteristics should the PV+VF >> > > coupling >> > > be done? netvsc uses MAC address, but that might not always be the >> > > desire. >> > >> > It's a policy but not guest userspace policy. >> > >> > The hypervisor certainly knows. >> > >> > Are you concerned that someone might want to create two devices with >> > the >> > same MAC for an unrelated reason? If so, hypervisor could easily set >> > a >> > flag in the virtio device to say "this is a backup, use MAC to find >> > another device". >> >> This is something I was going to suggest: a flag or other configuration >> on >> the virtio device to help control how this new feature is used. I can >> imagine this might be useful to control from either the hypervisor side >> or >> the VM side. >> >> The hypervisor might want to (1) disable it (force it off), (2) enable >> it >> for VM choice, or (3) force it on for the VM. In case (2), the VM >> might be >> able to chose whether it wants to make use of the feature, or stick >> with the >> bonding solution. >> >> Either way, the kernel is making a feature available, and the user (VM >> or >> hypervisor) is able to control it by selecting the feature based on the >> policy desired. >> >> sln > > I'm not sure what's the feature that is available here. > > I saw this as a flag that says "this device shares backend with another > network device which can be found using MAC, and that backend should be > preferred". kernel then forces configuration which uses that other > backend - as long as it exists. > > However, please Cc virtio-dev mailing list if we are doing this since > this is a spec extension. > > -- > MST Can someone please explain why assume a virtio device is there at all?? I specified a case where there isn't any. > > Migrating without any virtual device is going to be extremely > challenging, especially in any kind of virtualization setup where the > hosts are not homogeneous. By providing a virtio interface you can > guarantee that at least 1 network interface is available on any given > host, and then fail over to that as the least common denominator for > any migration. > I am not sure why you think it is going to be so challenging. Are you referring to preserving the pass-through device driver state (RX/TX rings)? I do not think we should preserve them, we can simply teardown the whole VF netdev (since we have a parent netdev as application interface). The downtime impact will be negligible. I second Jacob - having a netdev of one device driver enslave a netdev of another device driver is an awkward a-symmetric model. Regardless of whether they share the same backend device. Only I am not sure the Linux Bond is the right choice. e.g one may well want to use the virtio device also when the pass-through device is available, e.g for multicasts, east-west traffic, etc. I'm not sure the Linux Bond fits that functionality. And, as I hear in this thread, it is hard to make it work out of the box. So I think the right thing would be to write a new dedicated module for this purpose. > > This part I can sort of agree with. What if we were to look at > providing a way to somehow advertise that the two devices were meant > to be boded for virtualization purposes? For now lets call it a > "virt-bond". Basically we could look at providing a means for virtio > and VF drivers to advertise that they want this sort of bond. Then it > would just be a matter of