Re: [PATCH] ptr_ring: add barriers

2017-12-05 Thread Jason Wang



On 2017年12月06日 10:53, Michael S. Tsirkin wrote:

On Wed, Dec 06, 2017 at 10:31:39AM +0800, Jason Wang wrote:


On 2017年12月06日 03:29, Michael S. Tsirkin wrote:

Users of ptr_ring expect that it's safe to give the
data structure a pointer and have it be available
to consumers, but that actually requires an smb_wmb
or a stronger barrier.

In absence of such barriers and on architectures that reorder writes,
consumer might read an un=initialized value from an skb pointer stored
in the skb array.  This was observed causing crashes.

To fix, add memory barriers.  The barrier we use is a wmb, the
assumption being that producers do not need to read the value so we do
not need to order these reads.

Reported-by: George Cherian 
Suggested-by: Jason Wang 
Signed-off-by: Michael S. Tsirkin 
---

George, could you pls report whether this patch fixes
the issue for you?

This seems to be needed in stable as well.




   include/linux/ptr_ring.h | 9 +
   1 file changed, 9 insertions(+)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 37b4bb2..6866df4 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
   /* Note: callers invoking this in a loop must use a compiler barrier,
* for example cpu_relax(). Callers must hold producer_lock.
+ * Callers are responsible for making sure pointer that is being queued
+ * points to a valid data.
*/
   static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
   {
if (unlikely(!r->size) || r->queue[r->producer])
return -ENOSPC;
+   /* Make sure the pointer we are storing points to a valid data. */
+   /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
+   smp_wmb();
+
r->queue[r->producer++] = ptr;
if (unlikely(r->producer >= r->size))
r->producer = 0;
@@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
if (ptr)
__ptr_ring_discard_one(r);
+   /* Make sure anyone accessing data through the pointer is up to date. */
+   /* Pairs with smp_wmb in __ptr_ring_produce. */
+   smp_read_barrier_depends();
return ptr;
   }

I was thinking whether or not it's better to move those to the callers. Then
we can save lots of barriers in e.g batch consuming.

Thanks

Batch consumers only do smp_read_barrier_depends which is free on
non-alpha. I suggest we do the simple thing for stable and reserve
optimizations for later.



Right.

Acked-by: Jason Wang 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] ptr_ring: add barriers

2017-12-05 Thread Michael S. Tsirkin
On Wed, Dec 06, 2017 at 10:31:39AM +0800, Jason Wang wrote:
> 
> 
> On 2017年12月06日 03:29, Michael S. Tsirkin wrote:
> > Users of ptr_ring expect that it's safe to give the
> > data structure a pointer and have it be available
> > to consumers, but that actually requires an smb_wmb
> > or a stronger barrier.
> > 
> > In absence of such barriers and on architectures that reorder writes,
> > consumer might read an un=initialized value from an skb pointer stored
> > in the skb array.  This was observed causing crashes.
> > 
> > To fix, add memory barriers.  The barrier we use is a wmb, the
> > assumption being that producers do not need to read the value so we do
> > not need to order these reads.
> > 
> > Reported-by: George Cherian 
> > Suggested-by: Jason Wang 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> > 
> > George, could you pls report whether this patch fixes
> > the issue for you?
> > 
> > This seems to be needed in stable as well.
> > 
> > 
> > 
> > 
> >   include/linux/ptr_ring.h | 9 +
> >   1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > index 37b4bb2..6866df4 100644
> > --- a/include/linux/ptr_ring.h
> > +++ b/include/linux/ptr_ring.h
> > @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring 
> > *r)
> >   /* Note: callers invoking this in a loop must use a compiler barrier,
> >* for example cpu_relax(). Callers must hold producer_lock.
> > + * Callers are responsible for making sure pointer that is being queued
> > + * points to a valid data.
> >*/
> >   static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
> >   {
> > if (unlikely(!r->size) || r->queue[r->producer])
> > return -ENOSPC;
> > +   /* Make sure the pointer we are storing points to a valid data. */
> > +   /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
> > +   smp_wmb();
> > +
> > r->queue[r->producer++] = ptr;
> > if (unlikely(r->producer >= r->size))
> > r->producer = 0;
> > @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring 
> > *r)
> > if (ptr)
> > __ptr_ring_discard_one(r);
> > +   /* Make sure anyone accessing data through the pointer is up to date. */
> > +   /* Pairs with smp_wmb in __ptr_ring_produce. */
> > +   smp_read_barrier_depends();
> > return ptr;
> >   }
> 
> I was thinking whether or not it's better to move those to the callers. Then
> we can save lots of barriers in e.g batch consuming.
> 
> Thanks

Batch consumers only do smp_read_barrier_depends which is free on
non-alpha. I suggest we do the simple thing for stable and reserve
optimizations for later.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] ptr_ring: add barriers

2017-12-05 Thread Jason Wang



On 2017年12月06日 03:29, Michael S. Tsirkin wrote:

Users of ptr_ring expect that it's safe to give the
data structure a pointer and have it be available
to consumers, but that actually requires an smb_wmb
or a stronger barrier.

In absence of such barriers and on architectures that reorder writes,
consumer might read an un=initialized value from an skb pointer stored
in the skb array.  This was observed causing crashes.

To fix, add memory barriers.  The barrier we use is a wmb, the
assumption being that producers do not need to read the value so we do
not need to order these reads.

Reported-by: George Cherian 
Suggested-by: Jason Wang 
Signed-off-by: Michael S. Tsirkin 
---

George, could you pls report whether this patch fixes
the issue for you?

This seems to be needed in stable as well.




  include/linux/ptr_ring.h | 9 +
  1 file changed, 9 insertions(+)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 37b4bb2..6866df4 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
  
  /* Note: callers invoking this in a loop must use a compiler barrier,

   * for example cpu_relax(). Callers must hold producer_lock.
+ * Callers are responsible for making sure pointer that is being queued
+ * points to a valid data.
   */
  static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
  {
if (unlikely(!r->size) || r->queue[r->producer])
return -ENOSPC;
  
+	/* Make sure the pointer we are storing points to a valid data. */

+   /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
+   smp_wmb();
+
r->queue[r->producer++] = ptr;
if (unlikely(r->producer >= r->size))
r->producer = 0;
@@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
if (ptr)
__ptr_ring_discard_one(r);
  
+	/* Make sure anyone accessing data through the pointer is up to date. */

+   /* Pairs with smp_wmb in __ptr_ring_produce. */
+   smp_read_barrier_depends();
return ptr;
  }
  


I was thinking whether or not it's better to move those to the callers. 
Then we can save lots of barriers in e.g batch consuming.


Thanks
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Paul E. McKenney
On Wed, Dec 06, 2017 at 12:09:36AM +0200, Michael S. Tsirkin wrote:
> On Tue, Dec 05, 2017 at 10:57:00PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote:
> > > READ_ONCE is really all over the place (some code literally replaced all
> > > memory accesses with READ/WRITE ONCE).
> > 
> > Yeah, so?
> 
> Oh my point was I can't just look for READ_ONCE and go
> *that's the pair*. there are too many of these.
> At Paul's suggestion I will document the pairing *this read once has a
> barrier that is paired with that barrier*.
> 
> > Complain to the compiler people for forcing us into that.
> 
> In some cases when you end up with all accesses
> going through read/write once volatile just might better.

That is in fact what the jiffies counter does.  But you lose READ_ONCE()'s
automatic handling of DEC Alpha when you take that approach.

> > > Would an API like WRITE_POINTER()/smp_store_pointer make sense,
> > > and READ_POINTER for symmetry?
> > 
> > No, the whole point of the exercise was to get away from the fact that
> > dependent loads are special.
> 
> It's a pity that dependent stores are still special.

We can make READ_ONCE() not be special at zero cost on non-Alpha
systems, but both smp_wmb() and smp_store_release() are decidedly
not free of added overhead.

Thanx, Paul

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio-net: help live migrate SR-IOV devices

2017-12-05 Thread Stephen Hemminger
On Tue, 5 Dec 2017 14:29:28 -0800
Jakub Kicinski  wrote:

> On Tue, 5 Dec 2017 11:59:17 +0200, achiad shochat wrote:
> >  I second Jacob - having a netdev of one device driver enslave a netdev
> >  of another device driver is an awkward a-symmetric model.
> >  Regardless of whether they share the same backend device.
> >  Only I am not sure the Linux Bond is the right choice.
> >  e.g one may well want to use the virtio device also when the
> >  pass-through device is available, e.g for multicasts, east-west
> >  traffic, etc.
> >  I'm not sure the Linux Bond fits that functionality.
> >  And, as I hear in this thread, it is hard to make it work out of the 
> >  box.
> >  So I think the right thing would be to write a new dedicated module
> >  for this purpose.
> > >
> > > This part I can sort of agree with. What if we were to look at
> > > providing a way to somehow advertise that the two devices were meant
> > > to be boded for virtualization purposes? For now lets call it a
> > > "virt-bond". Basically we could look at providing a means for virtio
> > > and VF drivers to advertise that they want this sort of bond. Then it
> > > would just be a matter of providing some sort of side channel to
> > > indicate where you want things like multicast/broadcast/east-west
> > > traffic to go.  
> > 
> > I like this approach.  
> 
> +1 on a separate driver, just enslaving devices to virtio may break
> existing setups.  If people are bonding from user space today, if they
> update their kernel it may surprise them how things get auto-mangled.
> 
> Is what Alex is suggesting a separate PV device that says "I would
> like to be a bond of those two interfaces"?  That would make the HV
> intent explicit and kernel decisions more understandable.

So far, in my experience it still works.
As long as the kernel slaving happens first, it will work.
The attempt to bond an already slaved device will fail and no scripts seem
to check the error return.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Peter Zijlstra
On Tue, Dec 05, 2017 at 01:36:44PM -0800, Paul E. McKenney wrote:
> What we do in some code is to comment the pairings, allowing the other
> side of the pairing to be easily located.  Would that work for you?

I would say that that is mandatory for any memory ordering code ;-)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Michael S. Tsirkin
On Tue, Dec 05, 2017 at 10:57:00PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote:
> > READ_ONCE is really all over the place (some code literally replaced all
> > memory accesses with READ/WRITE ONCE).
> 
> Yeah, so?

Oh my point was I can't just look for READ_ONCE and go
*that's the pair*. there are too many of these.
At Paul's suggestion I will document the pairing *this read once has a
barrier that is paired with that barrier*.

> Complain to the compiler people for forcing us into that.

In some cases when you end up with all accesses
going through read/write once volatile just might better.

> > Would an API like WRITE_POINTER()/smp_store_pointer make sense,
> > and READ_POINTER for symmetry?
> 
> No, the whole point of the exercise was to get away from the fact that
> dependent loads are special.

It's a pity that dependent stores are still special.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio-net: help live migrate SR-IOV devices

2017-12-05 Thread Michael S. Tsirkin
On Tue, Dec 05, 2017 at 01:52:26PM -0800, Jesse Brandeburg wrote:
> On Tue, 5 Dec 2017 21:20:07 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote:
> > > Then we'll have a single solution for both netvsc and virtio (and any
> > > other PV device).
> > > And we could handle the VF DMA dirt issue agnostically.  
> > 
> > For the record, I won't block patches adding this kist to virtio
> > on the basis that they must be generic. It's not a lot
> > of code, implementation can come first, prettify later.
> 
> Thanks, based on this discussion we're going to work on improving
> virtio-net first, but some of Achiad's points are good.  I don't believe
> it should block the virtio work however.
> 
> In particular I'm really interested in figuring out how we can get to
> the point that virtio is able to make or implement some smart decisions
> about which NIC to pick for traffic delivery (it's own paravirt path or
> the passthorugh device path), if Achiad wants to develop the idea into
> some code, I'd be interested to review it.
> 
> > But we do need to have a discussion about how devices are paired.
> > I am not sure using just MAC works. E.g. some passthrough
> > devices don't give host ability to set the MAC.
> > Are these worth worrying about?
> 
> I personally don't think that will be much of a problem, if a
> certain device has that issue, can't we just have the virtio-net device
> pick up the MAC address of the passthrough device?

Then what do you do after you have migrated to another box?
The PT device there likely has a different MAC.

> As long as they match
> things should work OK. It at least is an initial way to do the
> configuration that has at least some traction as workable, as proved by
> the Microsoft design.

Yes - that design just implements what people have been doing for years
using bond so of course it's workable.

> FWIW, the Intel SR-IOV devices all accept a hypervisor/host provided
> MAC address.

For VFs you often can program the MAC through the PF, but you typically
can't do this for PFs. Or as another example consider nested virt with a
VF passed through.  PF isn't there within L1 guest so can't be used to
program the mac of the VF.

Still, we can always start small and require same mac, add other ways
to address issues later as we come up with them.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Paul E. McKenney
On Tue, Dec 05, 2017 at 11:43:41PM +0200, Michael S. Tsirkin wrote:
> On Tue, Dec 05, 2017 at 01:36:44PM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote:
> > > On Tue, Dec 05, 2017 at 12:08:01PM -0800, Paul E. McKenney wrote:
> > > > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote:
> > > > > On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote:
> > > > > > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote:
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > > and this barrier is no longer paired with anything until
> > > > > > > you realize there's a dependency barrier within READ_ONCE.
> > > > > > > 
> > > > > > > Barrier pairing was a useful tool to check code validity,
> > > > > > > maybe there are other, better tools now.
> > > > > > 
> > > > > > There are quite a few people who say that smp_store_release() is
> > > > > > easier for the tools to analyze than is smp_wmb().  My experience 
> > > > > > with
> > > > > > smp_read_barrier_depends() and rcu_dereference() leads me to believe
> > > > > > that they are correct.
> > > > > 
> > > > > OK, but smp_store_release is still not paired with anything since we
> > > > > rely on READ_ONCE to include the implicit dpendendency barrier.
> > > > 
> > > > Why wouldn't you consider the smp_store_release() to be paired with
> > > > the new improved READ_ONCE()?
> > > 
> > > READ_ONCE is really all over the place (some code literally replaced all
> > > memory accesses with READ/WRITE ONCE).
> > > 
> > > And I also prefer smp_wmb as it seems to be cheaper on ARM.
> > > 
> > > Would an API like WRITE_POINTER()/smp_store_pointer make sense,
> > > and READ_POINTER for symmetry?
> > 
> > What we do in some code is to comment the pairings, allowing the other
> > side of the pairing to be easily located.  Would that work for you?
> 
> Yes, that's exactly what I did for now.

Very good, thank you!

Thanx, Paul

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Peter Zijlstra
On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote:
> READ_ONCE is really all over the place (some code literally replaced all
> memory accesses with READ/WRITE ONCE).

Yeah, so? Complain to the compiler people for forcing us into that.

> Would an API like WRITE_POINTER()/smp_store_pointer make sense,
> and READ_POINTER for symmetry?

No, the whole point of the exercise was to get away from the fact that
dependent loads are special.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio-net: help live migrate SR-IOV devices

2017-12-05 Thread Jesse Brandeburg
On Tue, 5 Dec 2017 21:20:07 +0200
"Michael S. Tsirkin"  wrote:

> On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote:
> > Then we'll have a single solution for both netvsc and virtio (and any
> > other PV device).
> > And we could handle the VF DMA dirt issue agnostically.  
> 
> For the record, I won't block patches adding this kist to virtio
> on the basis that they must be generic. It's not a lot
> of code, implementation can come first, prettify later.

Thanks, based on this discussion we're going to work on improving
virtio-net first, but some of Achiad's points are good.  I don't believe
it should block the virtio work however.

In particular I'm really interested in figuring out how we can get to
the point that virtio is able to make or implement some smart decisions
about which NIC to pick for traffic delivery (it's own paravirt path or
the passthorugh device path), if Achiad wants to develop the idea into
some code, I'd be interested to review it.

> But we do need to have a discussion about how devices are paired.
> I am not sure using just MAC works. E.g. some passthrough
> devices don't give host ability to set the MAC.
> Are these worth worrying about?

I personally don't think that will be much of a problem, if a
certain device has that issue, can't we just have the virtio-net device
pick up the MAC address of the passthrough device? As long as they match
things should work OK. It at least is an initial way to do the
configuration that has at least some traction as workable, as proved by
the Microsoft design.

FWIW, the Intel SR-IOV devices all accept a hypervisor/host provided
MAC address.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Michael S. Tsirkin
On Tue, Dec 05, 2017 at 01:36:44PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Dec 05, 2017 at 12:08:01PM -0800, Paul E. McKenney wrote:
> > > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote:
> > > > On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote:
> > > > > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > > and this barrier is no longer paired with anything until
> > > > > > you realize there's a dependency barrier within READ_ONCE.
> > > > > > 
> > > > > > Barrier pairing was a useful tool to check code validity,
> > > > > > maybe there are other, better tools now.
> > > > > 
> > > > > There are quite a few people who say that smp_store_release() is
> > > > > easier for the tools to analyze than is smp_wmb().  My experience with
> > > > > smp_read_barrier_depends() and rcu_dereference() leads me to believe
> > > > > that they are correct.
> > > > 
> > > > OK, but smp_store_release is still not paired with anything since we
> > > > rely on READ_ONCE to include the implicit dpendendency barrier.
> > > 
> > > Why wouldn't you consider the smp_store_release() to be paired with
> > > the new improved READ_ONCE()?
> > 
> > READ_ONCE is really all over the place (some code literally replaced all
> > memory accesses with READ/WRITE ONCE).
> > 
> > And I also prefer smp_wmb as it seems to be cheaper on ARM.
> > 
> > Would an API like WRITE_POINTER()/smp_store_pointer make sense,
> > and READ_POINTER for symmetry?
> 
> What we do in some code is to comment the pairings, allowing the other
> side of the pairing to be easily located.  Would that work for you?
> 
>   Thanx, Paul

Yes, that's exactly what I did for now.

Thanks!

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Michael S. Tsirkin
On Tue, Dec 05, 2017 at 10:17:35PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 05, 2017 at 10:28:38PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Dec 05, 2017 at 08:57:52PM +0100, Peter Zijlstra wrote:
> > > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote:
> > > > > > WRITE_ONCE(obj->val, 1);
> > > > > > smp_wmb();
> > > > > > WRITE_ONCE(*foo, obj);
> > > > > 
> > > > > I believe Peter was instead suggesting:
> > > > > 
> > > > > WRITE_ONCE(obj->val, 1);
> > > > > smp_store_release(foo, obj);
> > > > 
> > > > Isn't that more expensive though?
> > > 
> > > Depends on the architecture. The only architecture where it is more
> > > expensive and people actually still care about is ARM I think.
> > 
> > Right. Why should I use the more expensive smp_store_release then?
> 
> Because it makes more sense. Memory ordering is hard enough, don't make
> it harder still if you don't have to.

I suspect I have to -  ptr_ring is a very low level construct used by
netowrking on data path so making it a bit more complicated for
a bit of performance is probably justified.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Paul E. McKenney
On Tue, Dec 05, 2017 at 11:24:49PM +0200, Michael S. Tsirkin wrote:
> On Tue, Dec 05, 2017 at 12:08:01PM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote:
> > > On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote:
> > > > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote:
> > 
> > [ . . . ]
> > 
> > > > > and this barrier is no longer paired with anything until
> > > > > you realize there's a dependency barrier within READ_ONCE.
> > > > > 
> > > > > Barrier pairing was a useful tool to check code validity,
> > > > > maybe there are other, better tools now.
> > > > 
> > > > There are quite a few people who say that smp_store_release() is
> > > > easier for the tools to analyze than is smp_wmb().  My experience with
> > > > smp_read_barrier_depends() and rcu_dereference() leads me to believe
> > > > that they are correct.
> > > 
> > > OK, but smp_store_release is still not paired with anything since we
> > > rely on READ_ONCE to include the implicit dpendendency barrier.
> > 
> > Why wouldn't you consider the smp_store_release() to be paired with
> > the new improved READ_ONCE()?
> 
> READ_ONCE is really all over the place (some code literally replaced all
> memory accesses with READ/WRITE ONCE).
> 
> And I also prefer smp_wmb as it seems to be cheaper on ARM.
> 
> Would an API like WRITE_POINTER()/smp_store_pointer make sense,
> and READ_POINTER for symmetry?

What we do in some code is to comment the pairings, allowing the other
side of the pairing to be easily located.  Would that work for you?

Thanx, Paul

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Michael S. Tsirkin
On Tue, Dec 05, 2017 at 12:08:01PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote:
> > > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote:
> 
> [ . . . ]
> 
> > > > and this barrier is no longer paired with anything until
> > > > you realize there's a dependency barrier within READ_ONCE.
> > > > 
> > > > Barrier pairing was a useful tool to check code validity,
> > > > maybe there are other, better tools now.
> > > 
> > > There are quite a few people who say that smp_store_release() is
> > > easier for the tools to analyze than is smp_wmb().  My experience with
> > > smp_read_barrier_depends() and rcu_dereference() leads me to believe
> > > that they are correct.
> > 
> > OK, but smp_store_release is still not paired with anything since we
> > rely on READ_ONCE to include the implicit dpendendency barrier.
> 
> Why wouldn't you consider the smp_store_release() to be paired with
> the new improved READ_ONCE()?
> 
>   Thanx, Paul

READ_ONCE is really all over the place (some code literally replaced all
memory accesses with READ/WRITE ONCE).

And I also prefer smp_wmb as it seems to be cheaper on ARM.

Would an API like WRITE_POINTER()/smp_store_pointer make sense,
and READ_POINTER for symmetry?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Peter Zijlstra
On Tue, Dec 05, 2017 at 10:28:38PM +0200, Michael S. Tsirkin wrote:
> On Tue, Dec 05, 2017 at 08:57:52PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote:
> > > > > WRITE_ONCE(obj->val, 1);
> > > > > smp_wmb();
> > > > > WRITE_ONCE(*foo, obj);
> > > > 
> > > > I believe Peter was instead suggesting:
> > > > 
> > > > WRITE_ONCE(obj->val, 1);
> > > > smp_store_release(foo, obj);
> > > 
> > > Isn't that more expensive though?
> > 
> > Depends on the architecture. The only architecture where it is more
> > expensive and people actually still care about is ARM I think.
> 
> Right. Why should I use the more expensive smp_store_release then?

Because it makes more sense. Memory ordering is hard enough, don't make
it harder still if you don't have to.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Michael S. Tsirkin
On Tue, Dec 05, 2017 at 08:57:52PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote:
> > > > WRITE_ONCE(obj->val, 1);
> > > > smp_wmb();
> > > > WRITE_ONCE(*foo, obj);
> > > 
> > > I believe Peter was instead suggesting:
> > > 
> > > WRITE_ONCE(obj->val, 1);
> > > smp_store_release(foo, obj);
> > 
> > Isn't that more expensive though?
> 
> Depends on the architecture. The only architecture where it is more
> expensive and people actually still care about is ARM I think.

Right. Why should I use the more expensive smp_store_release then?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Paul E. McKenney
On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote:
> On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote:

[ . . . ]

> > > and this barrier is no longer paired with anything until
> > > you realize there's a dependency barrier within READ_ONCE.
> > > 
> > > Barrier pairing was a useful tool to check code validity,
> > > maybe there are other, better tools now.
> > 
> > There are quite a few people who say that smp_store_release() is
> > easier for the tools to analyze than is smp_wmb().  My experience with
> > smp_read_barrier_depends() and rcu_dereference() leads me to believe
> > that they are correct.
> 
> OK, but smp_store_release is still not paired with anything since we
> rely on READ_ONCE to include the implicit dpendendency barrier.

Why wouldn't you consider the smp_store_release() to be paired with
the new improved READ_ONCE()?

Thanx, Paul

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Peter Zijlstra
On Tue, Dec 05, 2017 at 09:51:48PM +0200, Michael S. Tsirkin wrote:
> > > WRITE_ONCE(obj->val, 1);
> > > smp_wmb();
> > > WRITE_ONCE(*foo, obj);
> > 
> > I believe Peter was instead suggesting:
> > 
> > WRITE_ONCE(obj->val, 1);
> > smp_store_release(foo, obj);
> 
> Isn't that more expensive though?

Depends on the architecture. The only architecture where it is more
expensive and people actually still care about is ARM I think.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Peter Zijlstra
On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote:
> On Tue, Dec 05, 2017 at 08:17:33PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 05, 2017 at 08:57:46PM +0200, Michael S. Tsirkin wrote:
> > 
> > > I don't see WRITE_ONCE inserting any barriers, release or
> > > write.
> > 
> > Correct, never claimed there was.
> > 
> > Just saying that:
> > 
> > obj = READ_ONCE(*foo);
> > val = READ_ONCE(obj->val);
> > 
> > Never needs a barrier (except on Alpha and we want to make that go
> > away). Simply because a CPU needs to complete the load of @obj before it
> > can compute the address >val. Thus the second load _must_ come
> > after the first load and we get LOAD-LOAD ordering.
> > 
> > Alpha messing that up is a royal pain, and Alpha not being an
> > active/living architecture is just not worth the pain of keeping this in
> > the generic model.
> > 
> 
> Right. What I am saying is that for writes you need
> 
> WRITE_ONCE(obj->val, 1);
> smp_wmb();
> WRITE_ONCE(*foo, obj);

You really should use smp_store_release() here instead of the smp_wmb().
But yes.

> and this barrier is no longer paired with anything until
> you realize there's a dependency barrier within READ_ONCE.

No, there isn't. read_dependecy barriers are no more. They don't exist.
They never did, except on Alpha anyway.

There were a ton of sites that relied on this but never had the
smp_read_barrier_depends() annotation, similarly there were quite a few
sites that had the barrier but nobody could explain wtf for.

What these patches do is return the sane rule that dependent loads are
ordered.

And like all memory ordering; it should come with comments. Any piece of
code that relies on memory ordering and doesn't have big fat comments
that explain the required ordering are broken per definition. Maybe not
now, but they will be after a few years because someone changed it and
didn't know.

> Barrier pairing was a useful tool to check code validity,
> maybe there are other, better tools now.

Same is true for the closely related control dependency, you can pair
against those just fine but they don't have an explicit barrier
annotation.

There are no tools, use brain.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Michael S. Tsirkin
On Tue, Dec 05, 2017 at 11:33:39AM -0800, Paul E. McKenney wrote:
> On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Dec 05, 2017 at 08:17:33PM +0100, Peter Zijlstra wrote:
> > > On Tue, Dec 05, 2017 at 08:57:46PM +0200, Michael S. Tsirkin wrote:
> > > 
> > > > I don't see WRITE_ONCE inserting any barriers, release or
> > > > write.
> > > 
> > > Correct, never claimed there was.
> > > 
> > > Just saying that:
> > > 
> > >   obj = READ_ONCE(*foo);
> > >   val = READ_ONCE(obj->val);
> > > 
> > > Never needs a barrier (except on Alpha and we want to make that go
> > > away). Simply because a CPU needs to complete the load of @obj before it
> > > can compute the address >val. Thus the second load _must_ come
> > > after the first load and we get LOAD-LOAD ordering.
> > > 
> > > Alpha messing that up is a royal pain, and Alpha not being an
> > > active/living architecture is just not worth the pain of keeping this in
> > > the generic model.
> > > 
> > 
> > Right. What I am saying is that for writes you need
> > 
> > WRITE_ONCE(obj->val, 1);
> > smp_wmb();
> > WRITE_ONCE(*foo, obj);
> 
> I believe Peter was instead suggesting:
> 
> WRITE_ONCE(obj->val, 1);
> smp_store_release(foo, obj);

Isn't that more expensive though?


> > and this barrier is no longer paired with anything until
> > you realize there's a dependency barrier within READ_ONCE.
> > 
> > Barrier pairing was a useful tool to check code validity,
> > maybe there are other, better tools now.
> 
> There are quite a few people who say that smp_store_release() is
> easier for the tools to analyze than is smp_wmb().  My experience with
> smp_read_barrier_depends() and rcu_dereference() leads me to believe
> that they are correct.
> 
>   Thanx, Paul

OK, but smp_store_release is still not paired with anything since we
rely on READ_ONCE to include the implicit dpendendency barrier.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Paul E. McKenney
On Tue, Dec 05, 2017 at 09:24:21PM +0200, Michael S. Tsirkin wrote:
> On Tue, Dec 05, 2017 at 08:17:33PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 05, 2017 at 08:57:46PM +0200, Michael S. Tsirkin wrote:
> > 
> > > I don't see WRITE_ONCE inserting any barriers, release or
> > > write.
> > 
> > Correct, never claimed there was.
> > 
> > Just saying that:
> > 
> > obj = READ_ONCE(*foo);
> > val = READ_ONCE(obj->val);
> > 
> > Never needs a barrier (except on Alpha and we want to make that go
> > away). Simply because a CPU needs to complete the load of @obj before it
> > can compute the address >val. Thus the second load _must_ come
> > after the first load and we get LOAD-LOAD ordering.
> > 
> > Alpha messing that up is a royal pain, and Alpha not being an
> > active/living architecture is just not worth the pain of keeping this in
> > the generic model.
> > 
> 
> Right. What I am saying is that for writes you need
> 
> WRITE_ONCE(obj->val, 1);
> smp_wmb();
> WRITE_ONCE(*foo, obj);

I believe Peter was instead suggesting:

WRITE_ONCE(obj->val, 1);
smp_store_release(foo, obj);

> and this barrier is no longer paired with anything until
> you realize there's a dependency barrier within READ_ONCE.
> 
> Barrier pairing was a useful tool to check code validity,
> maybe there are other, better tools now.

There are quite a few people who say that smp_store_release() is
easier for the tools to analyze than is smp_wmb().  My experience with
smp_read_barrier_depends() and rcu_dereference() leads me to believe
that they are correct.

Thanx, Paul

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] ptr_ring: add barriers

2017-12-05 Thread Michael S. Tsirkin
Users of ptr_ring expect that it's safe to give the
data structure a pointer and have it be available
to consumers, but that actually requires an smb_wmb
or a stronger barrier.

In absence of such barriers and on architectures that reorder writes,
consumer might read an un=initialized value from an skb pointer stored
in the skb array.  This was observed causing crashes.

To fix, add memory barriers.  The barrier we use is a wmb, the
assumption being that producers do not need to read the value so we do
not need to order these reads.

Reported-by: George Cherian 
Suggested-by: Jason Wang 
Signed-off-by: Michael S. Tsirkin 
---

George, could you pls report whether this patch fixes
the issue for you?

This seems to be needed in stable as well.




 include/linux/ptr_ring.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 37b4bb2..6866df4 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
 
 /* Note: callers invoking this in a loop must use a compiler barrier,
  * for example cpu_relax(). Callers must hold producer_lock.
+ * Callers are responsible for making sure pointer that is being queued
+ * points to a valid data.
  */
 static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
 {
if (unlikely(!r->size) || r->queue[r->producer])
return -ENOSPC;
 
+   /* Make sure the pointer we are storing points to a valid data. */
+   /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
+   smp_wmb();
+
r->queue[r->producer++] = ptr;
if (unlikely(r->producer >= r->size))
r->producer = 0;
@@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
if (ptr)
__ptr_ring_discard_one(r);
 
+   /* Make sure anyone accessing data through the pointer is up to date. */
+   /* Pairs with smp_wmb in __ptr_ring_produce. */
+   smp_read_barrier_depends();
return ptr;
 }
 
-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Michael S. Tsirkin
On Tue, Dec 05, 2017 at 08:17:33PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 05, 2017 at 08:57:46PM +0200, Michael S. Tsirkin wrote:
> 
> > I don't see WRITE_ONCE inserting any barriers, release or
> > write.
> 
> Correct, never claimed there was.
> 
> Just saying that:
> 
>   obj = READ_ONCE(*foo);
>   val = READ_ONCE(obj->val);
> 
> Never needs a barrier (except on Alpha and we want to make that go
> away). Simply because a CPU needs to complete the load of @obj before it
> can compute the address >val. Thus the second load _must_ come
> after the first load and we get LOAD-LOAD ordering.
> 
> Alpha messing that up is a royal pain, and Alpha not being an
> active/living architecture is just not worth the pain of keeping this in
> the generic model.
> 

Right. What I am saying is that for writes you need

WRITE_ONCE(obj->val, 1);
smp_wmb();
WRITE_ONCE(*foo, obj);

and this barrier is no longer paired with anything until
you realize there's a dependency barrier within READ_ONCE.

Barrier pairing was a useful tool to check code validity,
maybe there are other, better tools now.


-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio-net: help live migrate SR-IOV devices

2017-12-05 Thread Michael S. Tsirkin
On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote:
> Then we'll have a single solution for both netvsc and virtio (and any
> other PV device).
> And we could handle the VF DMA dirt issue agnostically.

For the record, I won't block patches adding this kist to virtio
on the basis that they must be generic. It's not a lot
of code, implementation can come first, prettify later.

But we do need to have a discussion about how devices are paired.
I am not sure using just MAC works. E.g. some passthrough
devices don't give host ability to set the MAC.
Are these worth worrying about?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Peter Zijlstra
On Tue, Dec 05, 2017 at 08:57:46PM +0200, Michael S. Tsirkin wrote:

> I don't see WRITE_ONCE inserting any barriers, release or
> write.

Correct, never claimed there was.

Just saying that:

obj = READ_ONCE(*foo);
val = READ_ONCE(obj->val);

Never needs a barrier (except on Alpha and we want to make that go
away). Simply because a CPU needs to complete the load of @obj before it
can compute the address >val. Thus the second load _must_ come
after the first load and we get LOAD-LOAD ordering.

Alpha messing that up is a royal pain, and Alpha not being an
active/living architecture is just not worth the pain of keeping this in
the generic model.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Michael S. Tsirkin
On Tue, Dec 05, 2017 at 07:39:46PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 05, 2017 at 08:31:20PM +0200, Michael S. Tsirkin wrote:
> 
> > Apropos, READ_ONCE is now asymmetrical with WRITE_ONCE.
> > 
> > I can read a pointer with READ_ONCE and be sure the value
> > is sane, but only if I also remember to put in smp_wmb before
> > WRITE_ONCE. Otherwise the pointer is ok but no guarantees
> > about the data pointed to.
> 
> That was already the case on everything except Alpha. And the canonical
> match do the data dependency is store_release, not wmb.

Oh, interesting

static __always_inline void __write_once_size(volatile void *p, void *res, int 
size)
{
switch (size) {
case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
default:
barrier();
__builtin_memcpy((void *)p, (const void *)res, size);
barrier();
}
}

#define WRITE_ONCE(x, val) \
({  \
union { typeof(x) __val; char __c[1]; } __u =   \
{ .__val = (__force typeof(x)) (val) }; \
__write_once_size(&(x), __u.__c, sizeof(x));\
__u.__val;  \
})

I don't see WRITE_ONCE inserting any barriers, release or
write.

So it seems that on an architecture where writes can be reordered,
if I do

*pointer = 0xa;
WRITE_ONCE(array[x], pointer);

array write might bypass the pointer write,
and readers will read a stale value.




-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH tip/core/rcu 21/21] drivers/vhost: Remove now-redundant read_barrier_depends()

2017-12-05 Thread Peter Zijlstra
On Tue, Dec 05, 2017 at 08:31:20PM +0200, Michael S. Tsirkin wrote:

> Apropos, READ_ONCE is now asymmetrical with WRITE_ONCE.
> 
> I can read a pointer with READ_ONCE and be sure the value
> is sane, but only if I also remember to put in smp_wmb before
> WRITE_ONCE. Otherwise the pointer is ok but no guarantees
> about the data pointed to.

That was already the case on everything except Alpha. And the canonical
match do the data dependency is store_release, not wmb.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 1/2] virtio_mmio: add cleanup for virtio_mmio_probe

2017-12-05 Thread weiping zhang
As mentioned at drivers/base/core.c:
/*
 * NOTE: _Never_ directly free @dev after calling this function, even
 * if it returned an error! Always use put_device() to give up the
 * reference initialized in this function instead.
 */

Normal we do cleanup for @vm_dev by contianer_of(@dev), but in this case
we need release @mem resource from @pdev and vm_dev->base. It make
@pdev->vm_dev.dev.release() too complicated, so put_device just put the
reference of register_virtio_device->device_register->device_initialize
and release all resource in virtio_mmio_probe.

Signed-off-by: weiping zhang 
---
 drivers/virtio/virtio_mmio.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 74dc717..f984510 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -513,8 +513,10 @@ static int virtio_mmio_probe(struct platform_device *pdev)
return -EBUSY;
 
vm_dev = devm_kzalloc(>dev, sizeof(*vm_dev), GFP_KERNEL);
-   if (!vm_dev)
-   return  -ENOMEM;
+   if (!vm_dev) {
+   rc = -ENOMEM;
+   goto free_mem;
+   }
 
vm_dev->vdev.dev.parent = >dev;
vm_dev->vdev.dev.release = virtio_mmio_release_dev_empty;
@@ -524,14 +526,17 @@ static int virtio_mmio_probe(struct platform_device *pdev)
spin_lock_init(_dev->lock);
 
vm_dev->base = devm_ioremap(>dev, mem->start, resource_size(mem));
-   if (vm_dev->base == NULL)
-   return -EFAULT;
+   if (vm_dev->base == NULL) {
+   rc = -EFAULT;
+   goto free_vmdev;
+   }
 
/* Check magic value */
magic = readl(vm_dev->base + VIRTIO_MMIO_MAGIC_VALUE);
if (magic != ('v' | 'i' << 8 | 'r' << 16 | 't' << 24)) {
dev_warn(>dev, "Wrong magic value 0x%08lx!\n", magic);
-   return -ENODEV;
+   rc = -ENODEV;
+   goto unmap;
}
 
/* Check device version */
@@ -539,7 +544,8 @@ static int virtio_mmio_probe(struct platform_device *pdev)
if (vm_dev->version < 1 || vm_dev->version > 2) {
dev_err(>dev, "Version %ld not supported!\n",
vm_dev->version);
-   return -ENXIO;
+   rc = -ENXIO;
+   goto unmap;
}
 
vm_dev->vdev.id.device = readl(vm_dev->base + VIRTIO_MMIO_DEVICE_ID);
@@ -548,7 +554,8 @@ static int virtio_mmio_probe(struct platform_device *pdev)
 * virtio-mmio device with an ID 0 is a (dummy) placeholder
 * with no function. End probing now with no error reported.
 */
-   return -ENODEV;
+   rc = -ENODEV;
+   goto unmap;
}
vm_dev->vdev.id.vendor = readl(vm_dev->base + VIRTIO_MMIO_VENDOR_ID);
 
@@ -573,7 +580,20 @@ static int virtio_mmio_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, vm_dev);
 
-   return register_virtio_device(_dev->vdev);
+   rc = register_virtio_device(_dev->vdev);
+   if (rc)
+   goto put_dev;
+   return 0;
+put_dev:
+   put_device(_dev->vdev.dev);
+unmap:
+   iounmap(vm_dev->base);
+free_mem:
+   devm_release_mem_region(>dev, mem->start,
+   resource_size(mem));
+free_vmdev:
+   devm_kfree(>dev, vm_dev);
+   return rc;
 }
 
 static int virtio_mmio_remove(struct platform_device *pdev)
-- 
2.9.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 0/2] Add cleanup for virtio_mmio driver

2017-12-05 Thread weiping zhang
this patchset try to add cleanup for virtio_mmio driver, include
virtio_mmio_probe and virtio_mmio_remove

weiping zhang (2):
  virtio_mmio: add cleanup for virtio_mmio_probe
  virtio_mmio: add cleanup for virtio_mmio_remove

 drivers/virtio/virtio_mmio.c | 43 +++
 1 file changed, 35 insertions(+), 8 deletions(-)

-- 
2.9.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2 2/2] virtio_mmio: add cleanup for virtio_mmio_remove

2017-12-05 Thread weiping zhang
cleanup all resource allocated by virtio_mmio_probe.

Signed-off-by: weiping zhang 
---
 drivers/virtio/virtio_mmio.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index f984510..5e2ca34 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -599,8 +599,15 @@ static int virtio_mmio_probe(struct platform_device *pdev)
 static int virtio_mmio_remove(struct platform_device *pdev)
 {
struct virtio_mmio_device *vm_dev = platform_get_drvdata(pdev);
+   struct resource *mem;
 
unregister_virtio_device(_dev->vdev);
+   iounmap(vm_dev->base);
+   mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (mem)
+   devm_release_mem_region(>dev, mem->start,
+   resource_size(mem));
+   devm_kfree(>dev, vm_dev);
 
return 0;
 }
-- 
2.9.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio-net: help live migrate SR-IOV devices

2017-12-05 Thread achiad shochat
On 4 December 2017 at 18:30, Alexander Duyck  wrote:
> On Mon, Dec 4, 2017 at 1:51 AM, achiad shochat
>  wrote:
>> On 3 December 2017 at 19:35, Stephen Hemminger
>>  wrote:
>>> On Sun, 3 Dec 2017 11:14:37 +0200
>>> achiad shochat  wrote:
>>>
 On 3 December 2017 at 07:05, Michael S. Tsirkin  wrote:
 > On Fri, Dec 01, 2017 at 12:08:59PM -0800, Shannon Nelson wrote:
 >> On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote:
 >> > On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:
 >> > > Re. problem #2:
 >> > > Indeed the best way to address it seems to be to enslave the VF 
 >> > > driver
 >> > > netdev under a persistent anchor netdev.
 >> > > And it's indeed desired to allow (but not enforce) PV netdev and VF
 >> > > netdev to work in conjunction.
 >> > > And it's indeed desired that this enslavement logic work out-of-the 
 >> > > box.
 >> > > But in case of PV+VF some configurable policies must be in place 
 >> > > (and
 >> > > they'd better be generic rather than differ per PV technology).
 >> > > For example - based on which characteristics should the PV+VF 
 >> > > coupling
 >> > > be done? netvsc uses MAC address, but that might not always be the
 >> > > desire.
 >> >
 >> > It's a policy but not guest userspace policy.
 >> >
 >> > The hypervisor certainly knows.
 >> >
 >> > Are you concerned that someone might want to create two devices with 
 >> > the
 >> > same MAC for an unrelated reason?  If so, hypervisor could easily set 
 >> > a
 >> > flag in the virtio device to say "this is a backup, use MAC to find
 >> > another device".
 >>
 >> This is something I was going to suggest: a flag or other configuration 
 >> on
 >> the virtio device to help control how this new feature is used.  I can
 >> imagine this might be useful to control from either the hypervisor side 
 >> or
 >> the VM side.
 >>
 >> The hypervisor might want to (1) disable it (force it off), (2) enable 
 >> it
 >> for VM choice, or (3) force it on for the VM.  In case (2), the VM 
 >> might be
 >> able to chose whether it wants to make use of the feature, or stick 
 >> with the
 >> bonding solution.
 >>
 >> Either way, the kernel is making a feature available, and the user (VM 
 >> or
 >> hypervisor) is able to control it by selecting the feature based on the
 >> policy desired.
 >>
 >> sln
 >
 > I'm not sure what's the feature that is available here.
 >
 > I saw this as a flag that says "this device shares backend with another
 > network device which can be found using MAC, and that backend should be
 > preferred".  kernel then forces configuration which uses that other
 > backend - as long as it exists.
 >
 > However, please Cc virtio-dev mailing list if we are doing this since
 > this is a spec extension.
 >
 > --
 > MST


 Can someone please explain why assume a virtio device is there at all??
 I specified a case where there isn't any.
>
> Migrating without any virtual device is going to be extremely
> challenging, especially in any kind of virtualization setup where the
> hosts are not homogeneous. By providing a virtio interface you can
> guarantee that at least 1 network interface is available on any given
> host, and then fail over to that as the least common denominator for
> any migration.
>

I am not sure why you think it is going to be so challenging.
Are you referring to preserving the pass-through device driver state
(RX/TX rings)?
I do not think we should preserve them, we can simply teardown the
whole VF netdev (since we have a parent netdev as application
interface).
The downtime impact will be negligible.

 I second Jacob - having a netdev of one device driver enslave a netdev
 of another device driver is an awkward a-symmetric model.
 Regardless of whether they share the same backend device.
 Only I am not sure the Linux Bond is the right choice.
 e.g one may well want to use the virtio device also when the
 pass-through device is available, e.g for multicasts, east-west
 traffic, etc.
 I'm not sure the Linux Bond fits that functionality.
 And, as I hear in this thread, it is hard to make it work out of the box.
 So I think the right thing would be to write a new dedicated module
 for this purpose.
>
> This part I can sort of agree with. What if we were to look at
> providing a way to somehow advertise that the two devices were meant
> to be boded for virtualization purposes? For now lets call it a
> "virt-bond". Basically we could look at providing a means for virtio
> and VF drivers to advertise that they want this sort of bond. Then it
> would just be a matter of