Re: [PATCH] maintainers: drop Chris Wright from pvops

2017-10-26 Thread Rusty Russell
Chris CC'd: He wasn't that hard to find.

(linkedin says he's CTO of RedHat now.  I feel like an underachiever!)

Cheers,
Rusty.

Juergen Gross <jgr...@suse.com> writes:

> Mails to chr...@sous-sol.org are not deliverable since several months.
> Drop him as PARAVIRT_OPS maintainer.
>
> Signed-off-by: Juergen Gross <jgr...@suse.com>
> ---
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d85c08956875..af0cb69f6a3e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10179,7 +10179,6 @@ F:Documentation/parport*.txt
>  
>  PARAVIRT_OPS INTERFACE
>  M:   Juergen Gross <jgr...@suse.com>
> -M:   Chris Wright <chr...@sous-sol.org>
>  M:   Alok Kataria <akata...@vmware.com>
>  M:   Rusty Russell <ru...@rustcorp.com.au>
>  L:   virtualization@lists.linux-foundation.org
> -- 
> 2.12.3
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] paravirt: remove paravirt ops pmd_update[_defer] and pte_update_defer

2015-11-24 Thread Rusty Russell
Juergen Gross <jgr...@suse.com> writes:
> Ping?

Acked-by: Rusty Russell <ru...@rustcorp.com.au>

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86/paravirt: remove unused operation

2015-08-30 Thread Rusty Russell
Juergen Gross jgr...@suse.com writes:
 Ping?

Acked-by: Rusty Russell ru...@rustcorp.com.au

Cheers,
Rusty.

 On 08/06/2015 01:55 PM, Juergen Gross wrote:
 Remove the paravirt operation get_tsc_khz as it is used nowhere.

 Signed-off-by: Juergen Gross jgr...@suse.com
 ---
   arch/x86/include/asm/paravirt_types.h | 1 -
   1 file changed, 1 deletion(-)

 diff --git a/arch/x86/include/asm/paravirt_types.h 
 b/arch/x86/include/asm/paravirt_types.h
 index a6b8f9f..5a18a66 100644
 --- a/arch/x86/include/asm/paravirt_types.h
 +++ b/arch/x86/include/asm/paravirt_types.h
 @@ -97,7 +97,6 @@ struct pv_lazy_ops {
   struct pv_time_ops {
  unsigned long long (*sched_clock)(void);
  unsigned long long (*steal_clock)(int cpu);
 -unsigned long (*get_tsc_khz)(void);
   };

   struct pv_cpu_ops {

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio-net: why not always to set avail-flags to VRING_AVAIL_F_NO_INTERRUPT

2015-06-04 Thread Rusty Russell
Linhaifeng haifeng@huawei.com writes:
 On 2015/6/4 9:13, Rusty Russell wrote:
 Linhaifeng haifeng@huawei.com writes:
 Hi,

 I'm a newbie and have a question about vring_new_virtqueue function.

 Why we set avail-flags to VRING_AVAIL_F_NO_INTERRUPT when no callbacks?
 I think we should set avail-flags to VRING_AVAIL_F_NO_INTERRUPT even if no 
 callbacks.
 
 Hi Linhaifeng,
 
 Not sure I understand your question, but I'll try to answer.
 
 We don't set VRING_AVAIL_F_NO_INTERRUPT if there's a callback because we
 want that callback called.  Otherwise callback will never be used.
 
 Cheers,
 Rusty.
 
 

 Hi Rusty,

 Thank you for your response.

 I mean should we set VRING_AVAIL_F_NO_INTERRUPT when virtqueue is initialized 
 whether there is callback or not?
 As it would be set in function virtqueue_disable_cb and 
 virtqueue_enable_cb_prepare later.

You're right, the callers could virtqueue_enable() specifically, but
I think most callers will want the virtqueue enabled immediately.

And remember, the VRING_AVAIL_F_NO_INTERRUPT is just a *hint*: the
device is allows to ignore it (this can happen due to race conditions,
for example).

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio-net: why not always to set avail-flags to VRING_AVAIL_F_NO_INTERRUPT

2015-06-03 Thread Rusty Russell
Linhaifeng haifeng@huawei.com writes:
 Hi,

 I'm a newbie and have a question about vring_new_virtqueue function.

 Why we set avail-flags to VRING_AVAIL_F_NO_INTERRUPT when no callbacks?
 I think we should set avail-flags to VRING_AVAIL_F_NO_INTERRUPT even if no 
 callbacks.

Hi Linhaifeng,

Not sure I understand your question, but I'll try to answer.

We don't set VRING_AVAIL_F_NO_INTERRUPT if there's a callback because we
want that callback called.  Otherwise callback will never be used.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC 4/4] rpmsg: DMA map sgs passed to virtio

2015-05-06 Thread Rusty Russell
Edgar E. Iglesias edgar.igles...@gmail.com writes:
 From: Edgar E. Iglesias edgar.igles...@xilinx.com

 Signed-off-by: Edgar E. Iglesias edgar.igles...@xilinx.com

First off, I have handed maintainership off to Michael S. Tsirkin, so
his word is now law.

That said... there's nothing fundamentally *wrong* with this, but it's
not how standard virtio works.  We decided some time ago that as we're
paravirtualized, we would not be doing address mapping.

rpmsg uses virtio, but it's with a twist: they're not talking to a
host.  Thus my preference, in order, would be:

1) Don't use non-kmalloc addresses.
2) If that's not possible, call these _dma interfaces _rpmsg instead,
   so normal virtio users don't get confused and try to use them.

Cheers,
Rusty.


 ---
  drivers/rpmsg/virtio_rpmsg_bus.c | 28 ++--
  1 file changed, 22 insertions(+), 6 deletions(-)

 diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c 
 b/drivers/rpmsg/virtio_rpmsg_bus.c
 index 73354ee..9ae53a0 100644
 --- a/drivers/rpmsg/virtio_rpmsg_bus.c
 +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
 @@ -210,6 +210,22 @@ static void __ept_release(struct kref *kref)
   kfree(ept);
  }
  
 +static inline dma_addr_t msg_dma_address(struct virtproc_info *vrp, void 
 *msg)
 +{
 + unsigned long offset = msg - vrp-rbufs;
 +
 + return vrp-bufs_dma + offset;
 +}
 +
 +static inline void rpmsg_msg_sg_init(struct virtproc_info *vrp,
 +  struct scatterlist *sg,
 +  void *msg, unsigned int len)
 +{
 + sg_init_table(sg, 1);
 + sg_dma_address(sg) = msg_dma_address(vrp, msg);
 + sg_dma_len(sg) = len;
 +}
 +
  /* for more info, see below documentation of rpmsg_create_ept() */
  static struct rpmsg_endpoint *__rpmsg_create_ept(struct virtproc_info *vrp,
   struct rpmsg_channel *rpdev, rpmsg_rx_cb_t cb,
 @@ -754,12 +770,12 @@ int rpmsg_send_offchannel_raw(struct rpmsg_channel 
 *rpdev, u32 src, u32 dst,
   print_hex_dump(KERN_DEBUG, rpmsg_virtio TX: , DUMP_PREFIX_NONE, 16, 1,
   msg, sizeof(*msg) + msg-len, true);
  
 - sg_init_one(sg, msg, sizeof(*msg) + len);
 + rpmsg_msg_sg_init(vrp, sg, msg, sizeof(*msg) + len);
  
   mutex_lock(vrp-tx_lock);
  
   /* add message to the remote processor's virtqueue */
 - err = virtqueue_add_outbuf(vrp-svq, sg, 1, msg, GFP_KERNEL);
 + err = dma_virtqueue_add_outbuf(vrp-svq, sg, 1, msg, GFP_KERNEL);
   if (err) {
   /*
* need to reclaim the buffer here, otherwise it's lost
 @@ -828,10 +844,10 @@ static int rpmsg_recv_single(struct virtproc_info *vrp, 
 struct device *dev,
   dev_warn(dev, msg received with no recipient\n);
  
   /* publish the real size of the buffer */
 - sg_init_one(sg, msg, RPMSG_BUF_SIZE);
 + rpmsg_msg_sg_init(vrp, sg, msg, RPMSG_BUF_SIZE);
  
   /* add the buffer back to the remote processor's virtqueue */
 - err = virtqueue_add_inbuf(vrp-rvq, sg, 1, msg, GFP_KERNEL);
 + err = dma_virtqueue_add_inbuf(vrp-rvq, sg, 1, msg, GFP_KERNEL);
   if (err  0) {
   dev_err(dev, failed to add a virtqueue buffer: %d\n, err);
   return err;
 @@ -1007,9 +1023,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
   struct scatterlist sg;
   void *cpu_addr = vrp-rbufs + i * RPMSG_BUF_SIZE;
  
 - sg_init_one(sg, cpu_addr, RPMSG_BUF_SIZE);
 + rpmsg_msg_sg_init(vrp, sg, cpu_addr, RPMSG_BUF_SIZE);
  
 - err = virtqueue_add_inbuf(vrp-rvq, sg, 1, cpu_addr,
 + err = dma_virtqueue_add_inbuf(vrp-rvq, sg, 1, cpu_addr,
   GFP_KERNEL);
   WARN_ON(err); /* sanity check; this can't really happen */
   }
 -- 
 1.9.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] virtio: pass baton to Michael Tsirkin

2015-05-01 Thread Rusty Russell
With my job change kernel work will be own time; I'm keeping lguest
and modules (and the virtio standards work), but virtio kernel has to
go.

This makes it clear that Michael is in charge.  He's good, but having
me watch over his shoulder won't help.

Good luck Michael!

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/MAINTAINERS b/MAINTAINERS
index 2e5bbc0d68b2..16227759dfa8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10523,7 +10523,6 @@ F:  include/linux/virtio_console.h
 F: include/uapi/linux/virtio_console.h
 
 VIRTIO CORE, NET AND BLOCK DRIVERS
-M: Rusty Russell ru...@rustcorp.com.au
 M: Michael S. Tsirkin m...@redhat.com
 L: virtualization@lists.linux-foundation.org
 S: Maintained
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio: fix typo in vring_need_event() doc comment

2015-04-20 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Sun, Apr 19, 2015 at 02:36:38PM +0930, Rusty Russell wrote:
 Stefan Hajnoczi stefa...@redhat.com writes:
  Here the other side refers to the guest or host.
 
  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 
 Applied.
 
 Thanks!
 Rusty.

 Just to make sure, are you applying this for 4.1?

No, I am going to hand you my pending queue.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio: fix typo in vring_need_event() doc comment

2015-04-19 Thread Rusty Russell
Stefan Hajnoczi stefa...@redhat.com writes:
 Here the other side refers to the guest or host.

 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com

Applied.

Thanks!
Rusty.

 ---
  include/uapi/linux/virtio_ring.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/include/uapi/linux/virtio_ring.h 
 b/include/uapi/linux/virtio_ring.h
 index a3318f3..915980a 100644
 --- a/include/uapi/linux/virtio_ring.h
 +++ b/include/uapi/linux/virtio_ring.h
 @@ -155,7 +155,7 @@ static inline unsigned vring_size(unsigned int num, 
 unsigned long align)
  }
  
  /* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */
 -/* Assuming a given event_idx value from the other size, if
 +/* Assuming a given event_idx value from the other side, if
   * we have just incremented index from old to new_idx,
   * should we trigger an event? */
  static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
 -- 
 2.1.0
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 0/6] virtio_balloon: virtio 1 support

2015-04-14 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Tue, Apr 14, 2015 at 11:50:53AM +0200, Cornelia Huck wrote:
 On Tue, 14 Apr 2015 11:21:11 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
  diff --git a/include/uapi/linux/virtio_balloon.h 
  b/include/uapi/linux/virtio_balloon.h
  index f81b220..164e0c2 100644
  --- a/include/uapi/linux/virtio_balloon.h
  +++ b/include/uapi/linux/virtio_balloon.h
  @@ -52,15 +52,31 @@ struct virtio_balloon_config {
   #define VIRTIO_BALLOON_S_MEMTOT   5   /* Total amount of memory */
   #define VIRTIO_BALLOON_S_NR   6
  
  +/*
  + * Memory statistics structure.
  + * Driver fills an array of these structures and passes to device.
  + *
  + * NOTE: fields are laid out in a way that would make compiler add padding
  + * between and after fields, so we have to use compiler-specific 
  attributes to
  + * pack it, to disable this padding. This also often causes compiler to
  + * generate suboptimal code.
  + *
  + * We maintain this for backwards compatibility, but don't follow this 
  example.
 
 s/this/the existing statistics structure/

 existing seems redundant. What else? non-existing?

  + *
  + * Do something like the below instead:
 
 If you want to implement a similar structure, do...
 
 Just that nobody gets the idea that they are supposed to implement new
 balloon statistics ;)
 
  + * struct virtio_balloon_stat {
  + * __virtio16 tag;
  + * __u8 reserved[6];
  + * __virtio64 val;
  + * };
 
 (...)
 
  @@ -225,13 +219,8 @@ static inline void update_stat(struct virtio_balloon 
  *vb, int idx,
u16 tag, u64 val)
   {
 BUG_ON(idx = VIRTIO_BALLOON_S_NR);
  -  if (virtio_has_feature(vb-vdev, VIRTIO_F_VERSION_1)) {
  -  vb-stats[idx].tag = cpu_to_le32(tag);
  -  vb-stats[idx].val = cpu_to_le64(val);
  -  } else {
  -  vb-legacy_stats[idx].tag = tag;
  -  vb-legacy_stats[idx].val = val;
  -  }
  +  vb-stats[idx].tag = cpu_to_virtio16(vb-vdev, tag);
 
 Seems that nobody seemed to care much about statistics...

 Or about BE guests ;)

  +  vb-stats[idx].val = cpu_to_virtio64(vb-vdev, val);
   }
  
   #define pages_to_bytes(x) ((u64)(x)  PAGE_SHIFT)
  
 
 With these changes merged in:
 
 Acked-by: Cornelia Huck cornelia.h...@de.ibm.com


 OK, here's an updated incremental patch: only comment
 changed.

OK, I've merged this with one change:

+static void stats_sg_init(struct virtio_balloon *vb, struct scatterlist *sg)
+{
+   sg_init_one(sg, vb-stats, sizeof(vb-stats));
+}
+
...
-   sg_init_one(sg, vb-stats, sizeof(vb-stats));
+   stats_sg_init(vb, sg);

This is no longer a meaningful change, so I removed it.

Here's the final result:

From: Michael S. Tsirkin m...@redhat.com
Subject: virtio_balloon: transitional interface

Virtio 1.0 doesn't include a modern balloon device.
But it's not a big change to support a transitional
balloon device: this has the advantage of supporting
existing drivers, transparently.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 6a356e344f82..9db546ebe5a1 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -214,8 +219,8 @@ static inline void update_stat(struct virtio_balloon *vb, 
int idx,
   u16 tag, u64 val)
 {
BUG_ON(idx = VIRTIO_BALLOON_S_NR);
-   vb-stats[idx].tag = tag;
-   vb-stats[idx].val = val;
+   vb-stats[idx].tag = cpu_to_virtio16(vb-vdev, tag);
+   vb-stats[idx].val = cpu_to_virtio64(vb-vdev, val);
 }
 
 #define pages_to_bytes(x) ((u64)(x)  PAGE_SHIFT)
@@ -283,18 +288,27 @@ static void virtballoon_changed(struct virtio_device 
*vdev)
 
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
-   __le32 v;
s64 target;
+   u32 num_pages;
 
-   virtio_cread(vb-vdev, struct virtio_balloon_config, num_pages, v);
+   virtio_cread(vb-vdev, struct virtio_balloon_config, num_pages,
+num_pages);
 
-   target = le32_to_cpu(v);
+   /* Legacy balloon config space is LE, unlike all other devices. */
+   if (!virtio_has_feature(vb-vdev, VIRTIO_F_VERSION_1))
+   num_pages = le32_to_cpu((__force __le32)num_pages);
+
+   target = num_pages;
return target - vb-num_pages;
 }
 
 static void update_balloon_size(struct virtio_balloon *vb)
 {
-   __le32 actual = cpu_to_le32(vb-num_pages);
+   u32 actual = vb-num_pages;
+
+   /* Legacy balloon config space is LE, unlike all other devices. */
+   if (!virtio_has_feature(vb-vdev, VIRTIO_F_VERSION_1))
+   actual = (__force u32)cpu_to_le32(actual);
 
virtio_cwrite(vb-vdev, struct virtio_balloon_config, actual,
  actual);
diff --git a/include/uapi/linux/virtio_balloon.h 
b/include/uapi/linux

Re: [PATCH] virtio_balloon: drop virtio_balloon_stat_modern

2015-04-14 Thread Rusty Russell
Cornelia Huck cornelia.h...@de.ibm.com writes:
 On Tue, 14 Apr 2015 12:01:13 +0200
 Michael S. Tsirkin m...@redhat.com wrote:

 Looks like we are better off sticking with the misaligned stat struct,
 to reduce the amount of virtio 1 specific code in balloon.  So let's do
 it.
 
 Add a detailed comment to reduce the chance people copy this bad example.
 
 This also fixes a bug on BE architectures: tag should use
 cpu_to_le16, not cpu_to_le32.
 
 Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Just reposting so it's easier to apply.
 Feel free to squash into previous patch if you think it's
 neater.

 +1 for squashing from me. My A-b would also apply to the squashed patch.

Yes, I already squashed.

TBH, I would have squashed the next patches into one too, that simply
got rid of the virtio_device_is_legacy_only() function and all callers.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 0/6] virtio_balloon: virtio 1 support

2015-04-13 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Wed, Apr 01, 2015 at 02:57:35PM +0200, Michael S. Tsirkin wrote:
 Virtio 1.0 doesn't include a modern balloon device.  At some point we'll 
 likely
 define an incompatible interface with a different ID and different
 semantics.  But for now, it's not a big effort to support a transitional
 balloon device: this has the advantage of supporting existing drivers,
 transparently, as well as transports that don't allow mixing virtio 0 and
 virtio 1 devices. And balloon is an easy device to test, so it's also useful
 for people to test virtio core handling of transitional devices.
 
 The only interface issue is with the stats buffer, which has misaligned
 fields. We could leave it as is, but this sets a bad precedent that
 others might copy by mistake.
 
 As we need to change stats code to do byteswaps for virtio 1.0, it seems easy
 to fix by prepending a 6 byte reserved field.  I also had to change config
 structure field types from __le32 to __u32 to match other devices. This means
 we need a couple of __force tags for legacy path but that seems minor.

 Rusty, what are your thoughts here?
 How about merging this for 4.1?

I disagree with making any changes, other than allowing it to be used
with VIRTIO_F_VERSION_1.

However it doesn't seem to bother anyone else, so I've applied it for
4.1.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_config: reorder functions

2015-04-10 Thread Rusty Russell
Cornelia Huck cornelia.h...@de.ibm.com writes:
 On Wed, 8 Apr 2015 16:49:46 +0200
 Michael S. Tsirkin m...@redhat.com wrote:

 On Tue, Mar 31, 2015 at 05:17:25PM +0200, Michael S. Tsirkin wrote:
  This simply reorders functions in virtio_config
  so width access wrapper helpers are all together.
  Drops an extra empty line while we are at it.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 Ping
 
  ---
   include/linux/virtio_config.h | 15 +++
   1 file changed, 7 insertions(+), 8 deletions(-)
  

 Acked-by: Cornelia Huck cornelia.h...@de.ibm.com

Sorry, it's in my virtio-next tree, I obviously failed to Ack.

Applied,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_ring: Update weak barriers to use dma_wmb/rmb

2015-04-10 Thread Rusty Russell
Alexander Duyck alexander.h.du...@redhat.com writes:
 This change makes it so that instead of using smp_wmb/rmb which varies
 depending on the kernel configuration we can can use dma_wmb/rmb which for
 most architectures should be equal to or slightly more strict than
 smp_wmb/rmb.

 The advantage to this is that these barriers are available to uniprocessor
 builds as well so the performance should improve under such a
 configuration.

 Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com

This seems OK to me, since it's really as much a cleanup as anything,
but like you I do wonder if there benefit on ARM in practice.

Applied, thanks.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH net-next] netdevice: document NETDEV_TX_BUSY deprecation.

2015-04-03 Thread Rusty Russell
This paraphrases DaveM (and steals some of his words) explaining why
a device shouldn't return NETDEV_TX_BUSY, even though it looks so inviting
to driver authors.

See http://www.spinics.net/lists/netdev/msg322350.html

Inspired-by: David Miller da...@davemloft.net
Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dcf6ec27739b..a2cad44b8630 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -795,7 +795,10 @@ typedef u16 (*select_queue_fallback_t)(struct net_device 
*dev,
  * netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb,
  *   struct net_device *dev);
  * Called when a packet needs to be transmitted.
- * Must return NETDEV_TX_OK , NETDEV_TX_BUSY.
+ * Returns NETDEV_TX_OK.  Can return NETDEV_TX_BUSY, but you should stop
+ * the queue before that can happen; it's for obsolete devices and weird
+ * corner cases, but the stack really does a non-trivial amount
+ * of useless work if you return NETDEV_TX_BUSY.
  *(can also return NETDEV_TX_LOCKED iff NETIF_F_LLTX)
  * Required can not be NULL.
  *
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio: drop a useless config read

2015-03-31 Thread Rusty Russell
Cornelia Huck cornelia.h...@de.ibm.com writes:
 On Tue, 31 Mar 2015 13:55:42 +0200
 Michael S. Tsirkin m...@redhat.com wrote:

 commit d71de9ec6ba806104439d3a669befda84757b5af
 virtio: core support for config generation
 fixed reading up 64 bit values, adding generation
 checks for such reads.
 
 By mistake, it left an explicit get call in place
 as well. the result is that the value is read twice,
 the first result is discarded.
 
 Not a big deal since this only happens with virtio
 blk and only on boot ATM, so performance isn't
 affected, but let's clean it up.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  include/linux/virtio_config.h | 1 -
  1 file changed, 1 deletion(-)

 Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com

Applied.

Thanks!
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 1/6] virtio_balloon: transitional interface

2015-03-31 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 Virtio 1.0 doesn't include a modern balloon device.
 But it's not a big change to support a transitional
 balloon device: this has the advantage of supporting
 existing drivers, transparently.

You decided to fix the packed struct...

 diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
 index 6a356e3..574267f 100644
 --- a/drivers/virtio/virtio_balloon.c
 +++ b/drivers/virtio/virtio_balloon.c
 @@ -77,7 +77,7 @@ struct virtio_balloon {
  
   /* Memory statistics */
   int need_stats_update;
 - struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 + struct virtio_balloon_stat_modern stats[VIRTIO_BALLOON_S_NR];
  
   /* To register callback in oom notifier call chain */
   struct notifier_block nb;
 @@ -269,7 +269,11 @@ static void stats_handle_request(struct virtio_balloon 
 *vb)
   vq = vb-stats_vq;
   if (!virtqueue_get_buf(vq, len))
   return;
 - sg_init_one(sg, vb-stats, sizeof(vb-stats));
 + if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
 + sg_init_one(sg, vb-stats, sizeof(vb-stats));
 + else
 + sg_init_one(sg, vb-stats-tag, sizeof(vb-stats) -
 + offsetof(typeof(*vb-stats, tag);

This makes it compile, but definitely won't work.

   virtqueue_add_outbuf(vq, sg, 1, vb, GFP_KERNEL);
   virtqueue_kick(vq);
  }
 @@ -283,21 +287,30 @@ static void virtballoon_changed(struct virtio_device 
 *vdev)
  
  static inline s64 towards_target(struct virtio_balloon *vb)
  {
 - __le32 v;
   s64 target;
 + u32 num_pages;
  
 - virtio_cread(vb-vdev, struct virtio_balloon_config, num_pages, v);
 + virtio_cread(vb-vdev, struct virtio_balloon_config,
 +  num_pages, num_pages);
  
 - target = le32_to_cpu(v);
 + /* Legacy balloon config space is LE, unlike all other devices. */
 + if (!virtio_has_feature(vb-vdev, VIRTIO_F_VERSION_1))
 + num_pages = le32_to_cpu((__force le32)num_pages);
 +
 + target = num_pages;
   return target - vb-num_pages;
  }
  
  static void update_balloon_size(struct virtio_balloon *vb)
  {
 - __le32 actual = cpu_to_le32(vb-num_pages);
 + u32 actual = vb-num_pages;
 +
 + /* Legacy balloon config space is LE, unlike all other devices. */
 + if (!virtio_has_feature(vb-vdev, VIRTIO_F_VERSION_1))
 + actual = (__force u32)cpu_to_le32(num_pages);
  
 - virtio_cwrite(vb-vdev, struct virtio_balloon_config, actual,
 -   actual);
 + virtio_cwrite(vb-vdev, struct virtio_balloon_config,
 +   actual, actual);
  }

Final line is gratitous reformatting.

I would leave the device *exactly* as is, ugly structure packing and
all.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v5] Add virtio-input driver.

2015-03-26 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Thu, Mar 26, 2015 at 11:49:25AM +0100, Gerd Hoffmann wrote:
 virtio-input is basically evdev-events-over-virtio, so this driver isn't
 much more than reading configuration from config space and forwarding
 incoming events to the linux input layer.
 
 Signed-off-by: Gerd Hoffmann kra...@redhat.com

 Still a bit worried about using input.h as host/guest
 interface (can't we use some formal standard, e.g. USB HID?),
 but I'll let Rusty decide that.

I like the simplicity, and this API is pretty well proven.

Since this is under drivers/virtio rather than drivers/input, I've
applied it to my tree.

I have a dream where Denys Vlasenko adds this and the virtio gl device
to lguest, and we have X running under lguest :)

Applied!
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4] Add virtio-input driver.

2015-03-25 Thread Rusty Russell
Vojtech Pavlik vojt...@suse.com writes:
 On Wed, Mar 25, 2015 at 01:51:43PM +1030, Rusty Russell wrote:
 Imagine a future virtio standard which incorporates this.  And a Windows
 or FreeBSD implementation of the device and or driver.  How ugly would
 they be?

 A windows translation layer is fairly easy, people porting software from
 Windows to Linux told me numerous times that adapting isn't hard. I also
 believe that at least one of the BSD's has a compatible implementation
 these days based on the fact that I was asked to allow copying the
 headers in the past.

Thanks Vojtech, that answers my questions.

I figure we apply this and see where it leads...

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next] virtio: change comment in transmit

2015-03-24 Thread Rusty Russell
Stephen Hemminger step...@networkplumber.org writes:
 The original comment was not really informative or funny
 as well as sexist. Replace it with a better explanation of
 why the driver does stop and what the impacts are.

 Signed-off-by: Stephen Hemminger step...@networkplumber.org

Fair call.  Comment was certainly snarky, probably sexist.  I think it
expressed my feelings perfectly, however.

I note that there's still no comment saying don't do this in
netdevice.h; I gather returning NETDEV_TX_BUSY is still considered a Bad
Thing?

(Does it really BUG_ON?)

Thanks,
Rusty.

 --- a/drivers/net/virtio_net.c2015-03-24 15:20:25.174671000 -0700
 +++ b/drivers/net/virtio_net.c2015-03-24 16:17:28.478525333 -0700
 @@ -939,8 +939,12 @@ static netdev_tx_t start_xmit(struct sk_
   skb_orphan(skb);
   nf_reset(skb);
  
 - /* Apparently nice girls don't return TX_BUSY; stop the queue
 -  * before it gets out of hand.  Naturally, this wastes entries. */
 + /* It is better to stop queue if running out of space
 +  * instead of forcing queuing layer to requeue the skb
 +  * by returning TX_BUSY (and cause a BUG message).
 +  * Since most packets only take 1 or 2 ring slots
 +  * this means 16 slots are typically wasted.
 +  */
   if (sq-vq-num_free  2+MAX_SKB_FRAGS) {
   netif_stop_subqueue(dev, qnum);
   if (unlikely(!virtqueue_enable_cb_delayed(sq-vq))) {
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4] Add virtio-input driver.

2015-03-24 Thread Rusty Russell
Gerd Hoffmann kra...@redhat.com writes:
 virtio-input is basically evdev-events-over-virtio, so this driver isn't
 much more than reading configuration from config space and forwarding
 incoming events to the linux input layer.

 Signed-off-by: Gerd Hoffmann kra...@redhat.com

Is the input layer sane?  I've never dealt with it, so I don't know.

What's it used for?

Imagine a future virtio standard which incorporates this.  And a Windows
or FreeBSD implementation of the device and or driver.  How ugly would
they be?

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_mmio: fix endian-ness for mmio

2015-03-12 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Thu, Mar 12, 2015 at 12:33:36PM +1030, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  Going over the virtio mmio code, I noticed that it doesn't correctly
  return device config values in LE format when using virtio 1.0.
  Borrow code from virtio_pci_modern to do this correctly.
 
 AFAICT, it doesn't need to.  The endian correction is done by the
 callers.
 
 The only reason that virtio_pci_modern() does it is because readl() etc
 do endian conversion, and we don't want them to.  And the PCI part of
 the spec says to use natural accessors, so we don't do byte-at-a-time.
 
 Cheers,
 Rusty.

 You are right, the endina-ness is not an issue, so the commit log I
 wrote is wrong, but I still think the patch is required, because MMIO
 spec says the same as PCI.
   The driver MUST only use 32 bit wide and aligned reads and writes to
   access the control registers described
   in table 4.1. For the device-specific configuration space, the driver
   MUST use 8 bit wide accesses for 8 bit
   wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and
   32 bit wide and aligned accesses for
   32 and 64 bit wide fields.

Ah, good point.

I'm sure Pawel is on a beach somewhere sipping cocktails, so I'll apply
this immediately (with your updated commit message) to my
pending-rebases and give him the weekend to Ack.

Thanks,
Rusty.




 Here's a better commit log:

 ---

 virtio_mmio: fix access width for mmio

 Going over the virtio mmio code, I noticed that it doesn't correctly
 access modern device config values using natural accessors: it uses
 readb to get/set them byte by byte, while the virtio 1.0 spec explicitly 
 states:

   4.2.2.2 Driver Requirements: MMIO Device Register Layout

   ...

   The driver MUST only use 32 bit wide and aligned reads and writes to
   access the control registers described in table 4.1.
   For the device-specific configuration space, the driver MUST use
   8 bit wide accesses for 8 bit wide fields, 16 bit wide and aligned
   accesses for 16 bit wide fields and 32 bit wide and aligned accesses for
   32 and 64 bit wide fields.

 Borrow code from virtio_pci_modern to do this correctly.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 Makes sense now, right?
 Want me to repost or can you just tweak the commit log?


  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
 
  Note: untested: QEMU doesn't support virtio 1.0 for virtio-mmio.
  Pawel, could you please confirm this patch makes sense?
 
   drivers/virtio/virtio_mmio.c | 79 
  +++-
   1 file changed, 71 insertions(+), 8 deletions(-)
 
  diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
  index cad5698..0375456 100644
  --- a/drivers/virtio/virtio_mmio.c
  +++ b/drivers/virtio/virtio_mmio.c
  @@ -156,22 +156,85 @@ static void vm_get(struct virtio_device *vdev, 
  unsigned offset,
void *buf, unsigned len)
   {
 struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
  -  u8 *ptr = buf;
  -  int i;
  +  void __iomem *base = vm_dev-base + VIRTIO_MMIO_CONFIG;
  +  u8 b;
  +  __le16 w;
  +  __le32 l;
   
  -  for (i = 0; i  len; i++)
  -  ptr[i] = readb(vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i);
  +  if (vm_dev-version == 1) {
  +  u8 *ptr = buf;
  +  int i;
  +
  +  for (i = 0; i  len; i++)
  +  ptr[i] = readb(base + offset + i);
  +  return;
  +  }
  +
  +  switch (len) {
  +  case 1:
  +  b = readb(base + offset);
  +  memcpy(buf, b, sizeof b);
  +  break;
  +  case 2:
  +  w = cpu_to_le16(readw(base + offset));
  +  memcpy(buf, w, sizeof w);
  +  break;
  +  case 4:
  +  l = cpu_to_le32(readl(base + offset));
  +  memcpy(buf, l, sizeof l);
  +  break;
  +  case 8:
  +  l = cpu_to_le32(readl(base + offset));
  +  memcpy(buf, l, sizeof l);
  +  l = cpu_to_le32(ioread32(base + offset + sizeof l));
  +  memcpy(buf + sizeof l, l, sizeof l);
  +  break;
  +  default:
  +  BUG();
  +  }
   }
   
   static void vm_set(struct virtio_device *vdev, unsigned offset,
const void *buf, unsigned len)
   {
 struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
  -  const u8 *ptr = buf;
  -  int i;
  +  void __iomem *base = vm_dev-base + VIRTIO_MMIO_CONFIG;
  +  u8 b;
  +  __le16 w;
  +  __le32 l;
   
  -  for (i = 0; i  len; i++)
  -  writeb(ptr[i], vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i);
  +  if (vm_dev-version == 1) {
  +  const u8 *ptr = buf;
  +  int i;
  +
  +  for (i = 0; i  len; i++)
  +  writeb(ptr[i], base + offset + i);
  +
  +  return;
  +  }
  +
  +  switch (len) {
  +  case 1:
  +  memcpy(b, buf, sizeof b

Re: [PATCH v2 log fixed] virtio_mmio: fix endian-ness for mmio

2015-03-12 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 Subject: [PATCH] virtio_mmio: fix access width for mmio

Just for the record:

Applied.

Thanks,
Rusty.

 Going over the virtio mmio code, I noticed that it doesn't correctly
 access modern device config values using natural accessors: it uses
 readb to get/set them byte by byte, while the virtio 1.0 spec explicitly 
 states:

 4.2.2.2 Driver Requirements: MMIO Device Register Layout

 ...

 The driver MUST only use 32 bit wide and aligned reads and writes to
 access the control registers described in table 4.1.
 For the device-specific configuration space, the driver MUST use
 8 bit wide accesses for 8 bit wide fields, 16 bit wide and aligned
 accesses for 16 bit wide fields and 32 bit wide and aligned accesses 
 for
 32 and 64 bit wide fields.

 Borrow code from virtio_pci_modern to do this correctly.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 ---

 This is exactly the same as PATCH virtio_mmio: fix endian-ness for mmio
 but with corrected subject and commit log, to make
 it easier to apply.

 Note: untested: QEMU doesn't support virtio 1.0 for virtio-mmio,
 but seems pretty obvious. Let's apply ASAP so we don't ship
 incompliant drivers for virtio 1.0?

  drivers/virtio/virtio_mmio.c | 79 
 +++-
  1 file changed, 71 insertions(+), 8 deletions(-)

 diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
 index cad5698..0375456 100644
 --- a/drivers/virtio/virtio_mmio.c
 +++ b/drivers/virtio/virtio_mmio.c
 @@ -156,22 +156,85 @@ static void vm_get(struct virtio_device *vdev, unsigned 
 offset,
  void *buf, unsigned len)
  {
   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
 - u8 *ptr = buf;
 - int i;
 + void __iomem *base = vm_dev-base + VIRTIO_MMIO_CONFIG;
 + u8 b;
 + __le16 w;
 + __le32 l;
  
 - for (i = 0; i  len; i++)
 - ptr[i] = readb(vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i);
 + if (vm_dev-version == 1) {
 + u8 *ptr = buf;
 + int i;
 +
 + for (i = 0; i  len; i++)
 + ptr[i] = readb(base + offset + i);
 + return;
 + }
 +
 + switch (len) {
 + case 1:
 + b = readb(base + offset);
 + memcpy(buf, b, sizeof b);
 + break;
 + case 2:
 + w = cpu_to_le16(readw(base + offset));
 + memcpy(buf, w, sizeof w);
 + break;
 + case 4:
 + l = cpu_to_le32(readl(base + offset));
 + memcpy(buf, l, sizeof l);
 + break;
 + case 8:
 + l = cpu_to_le32(readl(base + offset));
 + memcpy(buf, l, sizeof l);
 + l = cpu_to_le32(ioread32(base + offset + sizeof l));
 + memcpy(buf + sizeof l, l, sizeof l);
 + break;
 + default:
 + BUG();
 + }
  }
  
  static void vm_set(struct virtio_device *vdev, unsigned offset,
  const void *buf, unsigned len)
  {
   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
 - const u8 *ptr = buf;
 - int i;
 + void __iomem *base = vm_dev-base + VIRTIO_MMIO_CONFIG;
 + u8 b;
 + __le16 w;
 + __le32 l;
  
 - for (i = 0; i  len; i++)
 - writeb(ptr[i], vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i);
 + if (vm_dev-version == 1) {
 + const u8 *ptr = buf;
 + int i;
 +
 + for (i = 0; i  len; i++)
 + writeb(ptr[i], base + offset + i);
 +
 + return;
 + }
 +
 + switch (len) {
 + case 1:
 + memcpy(b, buf, sizeof b);
 + writeb(b, base + offset);
 + break;
 + case 2:
 + memcpy(w, buf, sizeof w);
 + writew(le16_to_cpu(w), base + offset);
 + break;
 + case 4:
 + memcpy(l, buf, sizeof l);
 + writel(le32_to_cpu(l), base + offset);
 + break;
 + case 8:
 + memcpy(l, buf, sizeof l);
 + writel(le32_to_cpu(l), base + offset);
 + memcpy(l, buf + sizeof l, sizeof l);
 + writel(le32_to_cpu(l), base + offset + sizeof l);
 + break;
 + default:
 + BUG();
 + }
  }
  
  static u8 vm_get_status(struct virtio_device *vdev)
 -- 
 MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] uapi/virtio_scsi: allow overriding CDB/SENSE size

2015-03-12 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Wed, Mar 11, 2015 at 02:19:03PM +0100, Michael S. Tsirkin wrote:
 QEMU wants to use virtio scsi structures with
 a different VIRTIO_SCSI_CDB_SIZE/VIRTIO_SCSI_SENSE_SIZE,
 let's add ifdefs to allow overriding them.
 
 Keep the old defines under new names:
 VIRTIO_SCSI_CDB_DEFAULT_SIZE/VIRTIO_SCSI_SENSE_DEFAULT_SIZE,
 since that's what these values really are:
 defaults for cdb/sense size fields.
 
 Suggested-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---

 Rusty, pls note, if possible I'd like this one
 merged for 4.0 because it's important for QEMU
 (which now imports linux headers).

OK, applied.

Thanks,
Rusty.




  include/uapi/linux/virtio_scsi.h | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)
 
 diff --git a/include/uapi/linux/virtio_scsi.h 
 b/include/uapi/linux/virtio_scsi.h
 index 42b9370..cc18ef8 100644
 --- a/include/uapi/linux/virtio_scsi.h
 +++ b/include/uapi/linux/virtio_scsi.h
 @@ -29,8 +29,16 @@
  
  #include linux/virtio_types.h
  
 -#define VIRTIO_SCSI_CDB_SIZE   32
 -#define VIRTIO_SCSI_SENSE_SIZE 96
 +/* Default values of the CDB and sense data size configuration fields */
 +#define VIRTIO_SCSI_CDB_DEFAULT_SIZE   32
 +#define VIRTIO_SCSI_SENSE_DEFAULT_SIZE 96
 +
 +#ifndef VIRTIO_SCSI_CDB_SIZE
 +#define VIRTIO_SCSI_CDB_SIZE VIRTIO_SCSI_CDB_DEFAULT_SIZE
 +#endif
 +#ifndef VIRTIO_SCSI_SENSE_SIZE
 +#define VIRTIO_SCSI_SENSE_SIZE VIRTIO_SCSI_SENSE_DEFAULT_SIZE
 +#endif
  
  /* SCSI command request, followed by data-out */
  struct virtio_scsi_cmd_req {
 -- 
 MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_rpmsg: set DRIVER_OK before using device

2015-03-11 Thread Rusty Russell
Ohad Ben-Cohen o...@wizery.com writes:
 On Mon, Mar 9, 2015 at 10:41 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Sat, Mar 07, 2015 at 08:06:56PM +0100, Michael S. Tsirkin wrote:
 virtio spec requires that all drivers set DRIVER_OK
 before using devices. While rpmsg isn't yet
 included in the virtio 1 spec, previous spec versions
 also required this.

 virtio rpmsg violates this rule: is calls kick
 before setting DRIVER_OK.

 The fix isn't trivial since simply calling virtio_device_ready earlier
 would mean we might get an interrupt in parallel with adding buffers.

 Instead, split kick out to prepare+notify calls.  prepare before
 virtio_device_ready - when we know we won't get interrupts. notify right
 afterwards.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---

 Ohad, can you review and ack pls?

 Sure,

 Acked-by: Ohad Ben-Cohen o...@wizery.com

Applied.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio balloon: do not call blocking ops when !TASK_RUNNING

2015-03-09 Thread Rusty Russell
Cornelia Huck cornelia.h...@de.ibm.com writes:
 On Wed, 4 Mar 2015 11:25:56 +0100
 Michael S. Tsirkin m...@redhat.com wrote:

 On Wed, Mar 04, 2015 at 04:44:54PM +1030, Rusty Russell wrote:
  Michael S. Tsirkin m...@redhat.com writes:
   On Mon, Mar 02, 2015 at 10:37:26AM +1030, Rusty Russell wrote:
   Thomas Huth th...@linux.vnet.ibm.com writes:
On Thu, 26 Feb 2015 11:50:42 +1030
Rusty Russell ru...@rustcorp.com.au wrote:
   
Thomas Huth th...@linux.vnet.ibm.com writes:
  Hi all,

 with the recent kernel 3.19, I get a kernel warning when I start my
 KVM guest on s390 with virtio balloon enabled:

The deeper problem is that virtio_ccw_get_config just silently fails 
on
OOM.

Neither get_config nor set_config are expected to fail.
   
AFAIK this is currently not a problem. According to
http://lwn.net/Articles/627419/ these kmalloc calls never
fail because they allocate less than a page.
   
   I strongly suggest you unlearn that fact.
   The fix for this is in two parts:
   
   1) Annotate using sched_annotate_sleep() and add a comment: we may spin
  a few times in low memory situations, but this isn't a high
  performance path.
   
   2) Handle get_config (and other) failure in some more elegant way.
   
   Cheers,
   Rusty.
  
   I agree, but I'd like to point out that even without kmalloc,
   on s390 get_config is blocking - it's waiting
   for a hardware interrupt.
  
   And it makes sense: config is not data path, I don't think
   we should spin there.
  
   So I think besides these two parts, we still need my two patches:
   virtio-balloon: do not call blocking ops when !TASK_RUNNING
  
  I prefer to annotate, over trying to fix this.
  
  Because it's not important.  We might spin a few times, but it's very
  unlikely, and it's certainly not performance critical.
  
  Thanks,
  Rusty.
  
  Subject: virtio_balloon: annotate possible sleep waiting for event.
  
  CCW (s390) does this.
  
  Reported-by: Thomas Huth th...@linux.vnet.ibm.com
  Signed-off-by: Rusty Russell ru...@rustcorp.com.au
  
  diff --git a/drivers/virtio/virtio_balloon.c 
  b/drivers/virtio/virtio_balloon.c
  index 0413157f3b49..3f4d5acdbde0 100644
  --- a/drivers/virtio/virtio_balloon.c
  +++ b/drivers/virtio/virtio_balloon.c
  @@ -340,6 +340,15 @@ static int balloon(void *_vballoon)
 s64 diff;
   
 try_to_freeze();
  +
  +  /*
  +   * Reading the config on the ccw backend involves an
  +   * allocation, so we may actually sleep and have an
  +   * extra iteration.  It's extremely unlikely,
 
 Hmm, this part of the comment seems wrong to me.
 Reading the config on the ccw backend always sleeps
 because it's interrupt driven.

 (...)

 So I suspect
 http://mid.gmane.org/1424874878-17155-1-git-send-email-...@redhat.com
 is better.
 
 What do you think?

 I'd prefer to fix this as well. While the I/O request completes
 instantly on current qemu (the ssch backend handles the start function
 immediately, not asynchronously as on real hardware), this (a) is an
 implementation detail that may change and (b) doesn't account for the
 need to deliver the interrupt to the guest - which might take non-zero
 time.

Ah, I see.  My mistake.

I've thrown out my patch, applied that one.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] virtio-balloon: do not call blocking ops when !TASK_RUNNING

2015-03-09 Thread Rusty Russell
Thomas Huth th...@linux.vnet.ibm.com writes:
 On Wed, 25 Feb 2015 16:11:27 +0100
 Cornelia Huck cornelia.h...@de.ibm.com wrote:

 On Wed, 25 Feb 2015 15:36:02 +0100
 Michael S. Tsirkin m...@redhat.com wrote:
 
  virtio balloon has this code:
  wait_event_interruptible(vb-config_change,
   (diff = towards_target(vb)) != 0
   || vb-need_stats_update
   || kthread_should_stop()
   || freezing(current));
  
  Which is a problem because towards_target() call might block after
  wait_event_interruptible sets task state to TAST_INTERRUPTIBLE, causing
  the task_struct::state collision typical of nesting of sleeping
  primitives
  
  See also http://lwn.net/Articles/628628/ or Thomas's
  bug report
  http://article.gmane.org/gmane.linux.kernel.virtualization/24846
  for a fuller explanation.
  
  To fix, rewrite using wait_woken.
  
  Cc: sta...@vger.kernel.org
  Reported-by: Thomas Huth th...@linux.vnet.ibm.com
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
  
  changes from v1:
 remove wait_event_interruptible
 noticed by Cornelia Huck cornelia.h...@de.ibm.com
  
   drivers/virtio/virtio_balloon.c | 19 ++-
   1 file changed, 14 insertions(+), 5 deletions(-)
  
 
 I was able to reproduce Thomas' original problem and can confirm that
 it is gone with this patch.
 
 Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com

 Right, I just applied the patch on my system, too, and the problem is
 indeed gone! Thanks for the quick fix!

 Tested-by: Thomas Huth th...@linux.vnet.ibm.com

Applied.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_balloon: set DRIVER_OK before using device

2015-03-09 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Thu, Mar 05, 2015 at 01:24:47PM +1030, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  virtio spec requires that all drivers set DRIVER_OK
  before using devices. While balloon isn't yet
  included in the virtio 1 spec, previous spec versions
  also required this.
 
  virtio balloon might violate this rule: probe calls
  kthread_run before setting DRIVER_OK, which might run
  immediately and cause balloon to inflate/deflate.
 
  To fix, call virtio_device_ready before running the kthread.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 Replied, CC'd stable.

 Did you mean applied?

I stand erected.

Franks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_rpmsg: set DRIVER_OK before using device

2015-03-09 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 virtio spec requires that all drivers set DRIVER_OK
 before using devices. While rpmsg isn't yet
 included in the virtio 1 spec, previous spec versions
 also required this.

 virtio rpmsg violates this rule: is calls kick
 before setting DRIVER_OK.

 The fix isn't trivial since simply calling virtio_device_ready earlier
 would mean we might get an interrupt in parallel with adding buffers.

 Instead, split kick out to prepare+notify calls.  prepare before
 virtio_device_ready - when we know we won't get interrupts. notify right
 afterwards.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Applied.  I'll wait for Ohad to ack before sending to Linus.

BTW I assume you have a version of qemu which warns on these kind of
failures?  That'd be nice to have!

Thanks,
Rusty.

 ---

 Note: compile-tested only.

  drivers/rpmsg/virtio_rpmsg_bus.c | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)

 diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c 
 b/drivers/rpmsg/virtio_rpmsg_bus.c
 index 92f6af6..73354ee 100644
 --- a/drivers/rpmsg/virtio_rpmsg_bus.c
 +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
 @@ -951,6 +951,7 @@ static int rpmsg_probe(struct virtio_device *vdev)
   void *bufs_va;
   int err = 0, i;
   size_t total_buf_space;
 + bool notify;
  
   vrp = kzalloc(sizeof(*vrp), GFP_KERNEL);
   if (!vrp)
 @@ -1030,8 +1031,22 @@ static int rpmsg_probe(struct virtio_device *vdev)
   }
   }
  
 + /*
 +  * Prepare to kick but don't notify yet - we can't do this before
 +  * device is ready.
 +  */
 + notify = virtqueue_kick_prepare(vrp-rvq);
 +
 + /* From this point on, we can notify and get callbacks. */
 + virtio_device_ready(vdev);
 +
   /* tell the remote processor it can start sending messages */
 - virtqueue_kick(vrp-rvq);
 + /*
 +  * this might be concurrent with callbacks, but we are only
 +  * doing notify, not a full kick here, so that's ok.
 +  */
 + if (notify)
 + virtqueue_notify(vrp-rvq);
  
   dev_info(vdev-dev, rpmsg host is online\n);
  
 -- 
 MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio fixes pull for 4.0?

2015-03-09 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 Hi Rusty!
 There are a bunch of (mostly virtio 1.0 related) fixes for virtio
 that need to go into 4.0 I think.
   virtio_blk: typo fix
   virtio_blk: fix comment for virtio 1.0

OK, I've added these two.  I tend to be overcautious after the merge
window.

   virtio_console: init work unconditionally
   virtio_console: avoid config access from irq
   virtio_balloon: set DRIVER_OK before using device

 seem ready?

These are in my virtio-next tree already.  

   virtio_mmio: generation support
   virtio_mmio: fix endian-ness for mmio these two are waiting for ack by 
 Pawel

 These two fix bugs in virtio 1.0 code for mmio.
 Host code for that was AFAIK not posted, so I can't test properly.
 Pawel?

I'm waiting on Acks for these two.

   virtio-balloon: do not call blocking ops when !TASK_RUNNING

 Rusty, it seems you prefer a different fix for this issue,
 while Cornelia prefers mine. Maybe both me and Cornelia misunderstand the
 issue? I know you dealt with a ton of similar issues recently
 so you are more likely to be right, but I'd like to understand
 what's going on better all the same. Could you help please?

In the longer run, we should handle failures from these callbacks.  But
we don't need to do that right now.  So we want the minimal fix.

And an annotation is the minimal fix.  The bug has been there for ages;
it's just the warning that is new (if we *always* slept, we would
spin, but in practice we'll rarely sleep).

   virtio_rpmsg: set DRIVER_OK before using device

 Just posted this, but seems pretty obvious.

Yep, I've applied this too.  Thanks!

 I think it's a good idea to merge these patches (maybe except the
 !TASK_RUNNING thing) sooner rather than later, to make sure people have
 the time to test the fixes properly.  Would you like me to pack up (some
 of them) them up and do a pull request?

I'm waiting a bit longer, since they seem to still be tricking in.

I'm still chasing a QEMU bug, where it seems to fill in a number too
large in the 'len' field for a block device.  It should be 1 byte for a
block device write, for example.  See patch which causes assert() in
qemu, but I had to stop at that point (should get back tomorrow I hope).

Thanks,
Rusty.

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 882a31b..98e99b8 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -243,16 +243,21 @@ int virtio_queue_empty(VirtQueue *vq)
 }
 
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
-unsigned int len, unsigned int idx)
+unsigned int len_written, unsigned int idx)
 {
-unsigned int offset;
+unsigned int offset, tot_wlen;
 int i;
 
-trace_virtqueue_fill(vq, elem, len, idx);
+trace_virtqueue_fill(vq, elem, len_written, idx);
+
+for (tot_wlen = i = 0; i  elem-out_num; i++) {
+tot_wlen += elem-out_sg[i].iov_len;
+}
+assert(len_written = tot_wlen);
 
 offset = 0;
 for (i = 0; i  elem-in_num; i++) {
-size_t size = MIN(len - offset, elem-in_sg[i].iov_len);
+size_t size = MIN(len_written - offset, elem-in_sg[i].iov_len);
 
 cpu_physical_memory_unmap(elem-in_sg[i].iov_base,
   elem-in_sg[i].iov_len,
@@ -270,7 +275,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
 
 /* Get a pointer to the next entry in the used ring. */
 vring_used_ring_id(vq, idx, elem-index);
-vring_used_ring_len(vq, idx, len);
+vring_used_ring_len(vq, idx, len_written);
 }
 
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
@@ -288,9 +293,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 }
 
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
-unsigned int len)
+unsigned int len_written)
 {
-virtqueue_fill(vq, elem, len, 0);
+virtqueue_fill(vq, elem, len_written, 0);
 virtqueue_flush(vq, 1);
 }
 
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index df09993..153374f 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -191,10 +191,10 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int 
queue_size,
 void virtio_del_queue(VirtIODevice *vdev, int n);
 
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
-unsigned int len);
+unsigned int len_written);
 void virtqueue_flush(VirtQueue *vq, unsigned int count);
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
-unsigned int len, unsigned int idx);
+unsigned int len_written, unsigned int idx);
 
 void virtqueue_map_sg(struct iovec *sg, hwaddr *addr,
 size_t num_sg, int is_write);
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

Re: [PATCH] virtio_balloon: set DRIVER_OK before using device

2015-03-04 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 virtio spec requires that all drivers set DRIVER_OK
 before using devices. While balloon isn't yet
 included in the virtio 1 spec, previous spec versions
 also required this.

 virtio balloon might violate this rule: probe calls
 kthread_run before setting DRIVER_OK, which might run
 immediately and cause balloon to inflate/deflate.

 To fix, call virtio_device_ready before running the kthread.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Replied, CC'd stable.

Thanks,
Rusty.


 ---
  drivers/virtio/virtio_balloon.c | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
 index 5a6ad6d..6a356e3 100644
 --- a/drivers/virtio/virtio_balloon.c
 +++ b/drivers/virtio/virtio_balloon.c
 @@ -508,6 +508,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
   if (err  0)
   goto out_oom_notify;
  
 + virtio_device_ready(vdev);
 +
   vb-thread = kthread_run(balloon, vb, vballoon);
   if (IS_ERR(vb-thread)) {
   err = PTR_ERR(vb-thread);
 -- 
 MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio-pci: switch to modern accessors for 1.0

2015-03-04 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Wed, Mar 04, 2015 at 11:06:08AM +1030, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  virtio 1.0 config space is in LE format for all
  devices, use modern wrappers when accessed through
  the 1.0 BAR.
 
 Hmm, I'm not so sure about these patches.  It's easy to miss the
 existence of the _modern variants, and they're 90% the same as the
 legacy variants.
 
 But as long as it's fixed...
 
 Thanks,
 Rusty.
 PS.  rng, block, net and 9p virtio 1.0 seem to work OK on BE guests (LE 
 host).

 Hmm good point. What if I'll rework this to get bool legacy parameter?

The functions have access to vdev, so they can determine that already.

Simply renaming the old version to _legacy would be enough.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio balloon: do not call blocking ops when !TASK_RUNNING

2015-03-03 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Mon, Mar 02, 2015 at 10:37:26AM +1030, Rusty Russell wrote:
 Thomas Huth th...@linux.vnet.ibm.com writes:
  On Thu, 26 Feb 2015 11:50:42 +1030
  Rusty Russell ru...@rustcorp.com.au wrote:
 
  Thomas Huth th...@linux.vnet.ibm.com writes:
Hi all,
  
   with the recent kernel 3.19, I get a kernel warning when I start my
   KVM guest on s390 with virtio balloon enabled:
  
  The deeper problem is that virtio_ccw_get_config just silently fails on
  OOM.
  
  Neither get_config nor set_config are expected to fail.
 
  AFAIK this is currently not a problem. According to
  http://lwn.net/Articles/627419/ these kmalloc calls never
  fail because they allocate less than a page.
 
 I strongly suggest you unlearn that fact.
 The fix for this is in two parts:
 
 1) Annotate using sched_annotate_sleep() and add a comment: we may spin
a few times in low memory situations, but this isn't a high
performance path.
 
 2) Handle get_config (and other) failure in some more elegant way.
 
 Cheers,
 Rusty.

 I agree, but I'd like to point out that even without kmalloc,
 on s390 get_config is blocking - it's waiting
 for a hardware interrupt.

 And it makes sense: config is not data path, I don't think
 we should spin there.

 So I think besides these two parts, we still need my two patches:
 virtio-balloon: do not call blocking ops when !TASK_RUNNING

I prefer to annotate, over trying to fix this.

Because it's not important.  We might spin a few times, but it's very
unlikely, and it's certainly not performance critical.

Thanks,
Rusty.

Subject: virtio_balloon: annotate possible sleep waiting for event.

CCW (s390) does this.

Reported-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0413157f3b49..3f4d5acdbde0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -340,6 +340,15 @@ static int balloon(void *_vballoon)
s64 diff;
 
try_to_freeze();
+
+   /*
+* Reading the config on the ccw backend involves an
+* allocation, so we may actually sleep and have an
+* extra iteration.  It's extremely unlikely, and this
+* isn't a fast path in any sense.
+*/
+   sched_annotate_sleep();
+
wait_event_interruptible(vb-config_change,
 (diff = towards_target(vb)) != 0
 || vb-need_stats_update
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio-pci: switch to modern accessors for 1.0

2015-03-03 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 virtio 1.0 config space is in LE format for all
 devices, use modern wrappers when accessed through
 the 1.0 BAR.

Hmm, I'm not so sure about these patches.  It's easy to miss the
existence of the _modern variants, and they're 90% the same as the
legacy variants.

But as long as it's fixed...

Thanks,
Rusty.
PS.  rng, block, net and 9p virtio 1.0 seem to work OK on BE guests (LE host).
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_console: avoid config access from irq

2015-03-03 Thread Rusty Russell
Amit Shah amit.s...@redhat.com writes:
 On (Sat) 28 Feb 2015 [18:42:25], Michael S. Tsirkin wrote:
 when multiport is off, virtio console invokes config access from irq
 context, config access is blocking on s390.
 Fix this up by scheduling work from config irq - similar to what we do
 for multiport configs.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 Reviewed-by: Amit Shah amit.s...@redhat.com

Applied, thanks.

Should these be CC: stable?

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_console: init work unconditionally

2015-03-03 Thread Rusty Russell
Amit Shah amit.s...@redhat.com writes:
 On (Sat) 28 Feb 2015 [18:41:34], Michael S. Tsirkin wrote:
 when multiport is off, we don't initialize config work,
 but we then cancel uninitialized control_work on freeze.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 Reviewed-by: Amit Shah amit.s...@redhat.com

Thanks, applied.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio_blk: fix comment for virtio 1.0

2015-03-02 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 Fix up comment to match virtio 1.0 logic:
 virtio_blk_outhdr isn't the first elements anymore,
 the only requirement is that it comes first in
 the s/g list.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Thanks, both applied.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio balloon: do not call blocking ops when !TASK_RUNNING

2015-03-02 Thread Rusty Russell
Thomas Huth th...@linux.vnet.ibm.com writes:
 On Thu, 26 Feb 2015 11:50:42 +1030
 Rusty Russell ru...@rustcorp.com.au wrote:

 Thomas Huth th...@linux.vnet.ibm.com writes:
   Hi all,
 
  with the recent kernel 3.19, I get a kernel warning when I start my
  KVM guest on s390 with virtio balloon enabled:
 
 The deeper problem is that virtio_ccw_get_config just silently fails on
 OOM.
 
 Neither get_config nor set_config are expected to fail.

 AFAIK this is currently not a problem. According to
 http://lwn.net/Articles/627419/ these kmalloc calls never
 fail because they allocate less than a page.

I strongly suggest you unlearn that fact.

The fix for this is in two parts:

1) Annotate using sched_annotate_sleep() and add a comment: we may spin
   a few times in low memory situations, but this isn't a high
   performance path.

2) Handle get_config (and other) failure in some more elegant way.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio balloon: do not call blocking ops when !TASK_RUNNING

2015-02-25 Thread Rusty Russell
Thomas Huth th...@linux.vnet.ibm.com writes:
  Hi all,

 with the recent kernel 3.19, I get a kernel warning when I start my
 KVM guest on s390 with virtio balloon enabled:

The deeper problem is that virtio_ccw_get_config just silently fails on
OOM.

Neither get_config nor set_config are expected to fail.

Cornelia, I think ccw and config_area should be allocated inside vcdev.
You could either use pointers, or simply allocate vcdev with GDP_DMA.

This would avoid the kmalloc inside these calls.

Thanks,
Rusty.


 [0.839687] do not call blocking ops when !TASK_RUNNING; state=1 set at
[00174a1e] prepare_to_wait_event+0x7e/0x108
 [0.839694] [ cut here ]
 [0.839697] WARNING: at kernel/sched/core.c:7326
 [0.839698] Modules linked in:
 [0.839702] CPU: 0 PID: 46 Comm: vballoon Not tainted 3.19.0 #233
 [0.839705] task: 021d ti: 021d8000 task.ti: 
 021d8000
 [0.839707] Krnl PSW : 0704c0018000 0015bf8e 
 (__might_sleep+0x8e/0x98)
 [0.839713]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 
 EA:3
 Krnl GPRS: 000d 021d 0071 0001
 [0.839718]00675ace 01998c50  
 
 [0.839720]00982134 0058f824 00a008a8 
 
 [0.839722]04d9 007ea992 0015bf8a 
 021dbc28
 [0.839731] Krnl Code: 0015bf7e: c0200033e838  larl
 %r2,7d8fee
0015bf84: c0e50028cd62 brasl   %r14,675a48
   #0015bf8a: a7f40001 brc 15,15bf8c
   0015bf8e: 9201a000 mvi 0(%r10),1
0015bf92: a7f4ffe2 brc 15,15bf56
0015bf96: 0707 bcr 0,%r7
0015bf98: ebdff0800024 stmg%r13,%r15,128(%r15)
0015bf9e: a7f13fe0 tmll%r15,16352
 [0.839749] Call Trace:
 [0.839751] ([0015bf8a] __might_sleep+0x8a/0x98)
 [0.839756]  [0028a562] __kmalloc+0x272/0x350
 [0.839759]  [0058f824] virtio_ccw_get_config+0x3c/0x100
 [0.839762]  [0049fcb0] balloon+0x1b8/0x330
 [0.839765]  [001529c8] kthread+0x120/0x138
 [0.839767]  [00683c22] kernel_thread_starter+0x6/0xc
 [0.839770]  [00683c1c] kernel_thread_starter+0x0/0xc
 [0.839772] no locks held by vballoon/46.
 [0.839773] Last Breaking-Event-Address:
 [0.839776]  [0015bf8a] __might_sleep+0x8a/0x98
 [0.839778] ---[ end trace d27fcdfa27273d7c ]---

 The problem seems to be this code in balloon() in
 drivers/virtio/virtio_balloon.c:

   wait_event_interruptible(vb-config_change,
(diff = towards_target(vb)) != 0
|| vb-need_stats_update
|| kthread_should_stop()
|| freezing(current));

 wait_event_interruptible() sets the state of the current task to
 TASK_INTERRUPTIBLE, then checks the condition. The condition contains
 towards_target() which reads the virtio config space via virtio_cread().
 On s390, this then triggers virtio_ccw_get_config() - and this function
 calls some other functions again that might sleep (e.g. kzalloc or
 wait_event in ccw_io_helper) ... and this causes the new kernel warning
 message with kernel 3.19.

 I think it would be quite difficult or at least ugly to rewrite
 virtio_ccw_get_config() so that it does not call sleepable functions
 anymore. So would it be feasible to rewrite the balloon() function that
 it does not call the towards_target() in its wait_event condition
 anymore? I am unfortunately not that familiar with the balloon code
 semantics, so any help is very appreciated here!

  Thanks,
   Thomas
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Qemu and virtio 1.0

2015-02-24 Thread Rusty Russell
OK, I am trying to experiment with virtio 1.0 support using the
latest kernel and MST's qemu tree:

https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/?h=virtio-1.0

The first issue is that the device config endian was wrong (see
attached patch).

I'm now setting up a BE guest on my x86 laptop, and a BE and LE guest
on a BE powerpc machine, to check that all combinations work correctly.
If others test too, that would be appreciated!

Cheers,
Rusty.

From 95ac91554ed602f856a2a5fcc25eaffcad1b1c8d Mon Sep 17 00:00:00 2001
From: Rusty Russell ru...@rustcorp.com.au
Date: Tue, 24 Feb 2015 14:47:44 +1030
Subject: [PATCH] virtio_config_write*/virtio_config_read*: Don't endian swap
 for virtio 1.0.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 079944c..882a31b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -662,7 +662,12 @@ uint32_t virtio_config_readw(VirtIODevice *vdev, uint32_t 
addr)
 
 k-get_config(vdev, vdev-config);
 
-val = lduw_p(vdev-config + addr);
+/* Virtio 1.0 is always LE */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+val = lduw_le_p(vdev-config + addr);
+} else {
+val = lduw_p(vdev-config + addr);
+}
 return val;
 }
 
@@ -677,7 +682,12 @@ uint32_t virtio_config_readl(VirtIODevice *vdev, uint32_t 
addr)
 
 k-get_config(vdev, vdev-config);
 
-val = ldl_p(vdev-config + addr);
+/* Virtio 1.0 is always LE */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+val = ldl_le_p(vdev-config + addr);
+} else {
+val = ldl_p(vdev-config + addr);
+}
 return val;
 }
 
@@ -706,7 +716,12 @@ void virtio_config_writew(VirtIODevice *vdev, uint32_t 
addr, uint32_t data)
 return;
 }
 
-stw_p(vdev-config + addr, val);
+/* Virtio 1.0 is always LE */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+stw_le_p(vdev-config + addr, val);
+} else {
+stw_p(vdev-config + addr, val);
+}
 
 if (k-set_config) {
 k-set_config(vdev, vdev-config);
@@ -722,7 +737,12 @@ void virtio_config_writel(VirtIODevice *vdev, uint32_t 
addr, uint32_t data)
 return;
 }
 
-stl_p(vdev-config + addr, val);
+/* Virtio 1.0 is always LE */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+stl_le_p(vdev-config + addr, val);
+} else {
+stl_p(vdev-config + addr, val);
+}
 
 if (k-set_config) {
 k-set_config(vdev, vdev-config);
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PULL] virtio-next

2015-02-17 Thread Rusty Russell
The following changes since commit b97f880c8342fd6e49a02c9ef7507a678722b2b3:

  Merge branch 'for-3.19-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata (2015-01-21 07:54:16 
+1200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git 
tags/virtio-next-for-linus

for you to fetch changes up to 5b40a7daf51812b35cf05d1601a779a7043f8414:

  virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice. (2015-02-17 16:19:29 +1030)


OK, this has the big virtio 1.0 implementation, as specified by OASIS.

On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to
double-check the implementation.

Then comes the inevitable fixes and cleanups from that work.

Thanks,
Rusty.


Luis R. Rodriguez (1):
  virtual: Documentation: simplify and generalize paravirt_ops.txt

Michael S. Tsirkin (23):
  virtio_pci: drop virtio_config dependency
  virtio/9p: verify device has config space
  virtio/blk: verify device has config space
  virtio/console: verify device has config space
  virtio/net: verify device has config space
  virtio/scsi: verify device has config space
  virtio/balloon: verify device has config space
  mn10300: drop dead code
  pci: add pci_iomap_range
  s390: add pci_iomap_range
  virtio_pci: move probe/remove code to common
  virtio_pci: modern driver
  virtio_pci_modern: reduce number of mappings
  virtio_pci_modern: support devices with no config
  virtio_balloon: coding style fixes
  virtio_blk: coding style fixes
  virtio_ring: coding style fix
  virtio_rng: drop extra empty line
  virtio_pci: Kconfig grammar fix
  virtio_pci: drop Kconfig warnings
  virtio_pci: add an option to disable legacy driver
  virtio_pci: add module param to force legacy mode
  virtio_pci_modern: drop an unused function

Pawel Moll (1):
  virtio-mmio: Update the device to OASIS spec version

Rusty Russell (53):
  virtio-pci: define layout for virtio 1.0
  virtio_pci: macros for PCI layout offsets
  virtio: define VIRTIO_PCI_CAP_PCI_CFG in header.
  virtio: Don't expose legacy block features when VIRTIO_BLK_NO_LEGACY 
defined.
  virtio: Don't expose legacy config features when VIRTIO_CONFIG_NO_LEGACY 
defined.
  virtio_pci: use 16-bit accessor for queue_enable.
  virtio: don't require a config space on the console device.
  lguest: have --rng read from /dev/urandom not /dev/random.
  lguest: add operations to get/set a register from the Launcher.
  lguest: write more information to userspace about pending traps.
  lguest: add infrastructure for userspace to deliver a trap to the guest.
  lguest: add infrastructure to check mappings.
  lguest: send trap 13 through to userspace.
  lguest: suppress PS/2 keyboard polling.
  lguest: don't disable iospace.
  lguest: add iomem region, where guest page faults get sent to userspace.
  lguest: disable ACPI explicitly.
  lguest: Override pcibios_enable_irq/pcibios_disable_irq to our stupid PIC
  lguest: add MMIO region allocator in example launcher.
  lguest: decode mmio accesses for PCI in example launcher.
  lguest: add PCI config space emulation to example launcher.
  lguest: implement virtio-PCI MMIO accesses.
  lguest: fix failure to find linux/virtio_types.h
  lguest: add a dummy PCI host bridge.
  lguest: Convert block device to virtio 1.0 PCI.
  lguest: Convert net device to virtio 1.0 PCI.
  lguest: Convert entropy device to virtio 1.0 PCI.
  lguest: Convert console device to virtio 1.0 PCI.
  lguest: define VIRTIO_CONFIG_NO_LEGACY in example launcher.
  lguest: remove support for lguest bus.
  lguest: remove support for lguest bus in demonstration launcher.
  lguest: remove lguest bus definitions from header.
  lguest: support emerg_wr in console device in example launcher.
  lguest: support backdoor window.
  lguest: always put console in PCI slot #1.
  lguest: use the PCI console device's emerg_wr for early boot messages.
  lguest: remove NOTIFY facility from demonstration launcher.
  lguest: remove NOTIFY call and eventfd facility.
  tools/lguest: handle device reset correctly in example launcher.
  tools/lguest: fix features_accepted logic in example launcher.
  tools/lguest: rename virtio_pci_cfg_cap field to match spec.
  tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over 
PCI)
  tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over 
PCI)
  tools/lguest: handle indirect partway through chain.
  tools/lguest: don't start devices until DRIVER_OK status set.
  lguest: don't look in console features to find emerg_wr.
  tools/lguest: more documentation

Re: [PATCH 1/2] virtio_pci_modern: type-safe io accessors

2015-02-15 Thread Rusty Russell
Rusty Russell ru...@rustcorp.com.au writes:

 Michael S. Tsirkin m...@redhat.com writes:
 The spec is very clear on this:

 4.1.3.1 Driver Requirements: PCI Device Layout

 The driver MUST access each field using the “natural” access method,
 i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit
 fields and 8-bit accesses for 8-bit fields.

 Add type-safe wrappers to prevent access with incorrect width.

 Applied both (for *next* merge window).

... and fixed this:

 +static inline void vp_iowrite32(u32 value, u32 __iomem *addr)
 +{
 +iowrite16(value, addr);
 +}

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support

2015-02-15 Thread Rusty Russell
Cornelia Huck cornelia.h...@de.ibm.com writes:
 On Fri, 13 Feb 2015 13:22:09 +1030
 Rusty Russell ru...@rustcorp.com.au wrote:

 Michael S. Tsirkin m...@redhat.com writes:
  On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote:
  Jason Wang jasow...@redhat.com writes:
   This patch enables the interrupt coalescing setting through ethtool.
  
  The problem is that there's nothing network specific about interrupt
  coalescing.  I can see other devices wanting exactly the same thing,
  which means we'd deprecate this in the next virtio standard.
  
  I think the right answer is to extend like we did with
  vring_used_event(), eg:
  
  1) Add a new feature VIRTIO_F_RING_COALESCE.
  2) Add another a 32-bit field after vring_used_event(), eg:
  #define vring_used_delay(vr) (*(u32 
  *)((vr)-avail-ring[(vr)-num + 2]))
  
  This loses the ability to coalesce by number of frames, but we can still
  do number of sg entries, as we do now with used_event, and we could
  change virtqueue_enable_cb_delayed() to take a precise number if we
  wanted.
 
  But do we expect delay to be update dynamically?
  If not, why not stick it in config space?
 
 Hmm, we could update it dynamically (and will, in the case of ethtool).
 But it won't be common, so we could append a field to
 virtio_pci_common_cfg for PCI.
 
 I think MMIO and CCW would be easy to extend too, but CC'd to check.

 If this is a simple extension of the config space, it should just work
 for ccw (the Linux guest driver currently uses 0x100 as max config
 space size, which I grabbed from pci at the time I wrote it).

 But looking at this virtio_pci_common_cfg stuff, it seems to contain a
 lot of things that are handled via ccws on virtio-ccw (like number of
 queues or device status). Having an extra ccw just for changing this
 delay value seems like overkill.

Yes, possibly.

 On the basic topic of interrupt coalescing: With adapter interrupts,
 virtio-ccw already has some kind of coalescing: The summary indicator
 is set just once and an interrupt is made pending, then individual
 queue indicators are switched on and no further interrupt is generated
 if the summary indicator has not been cleared by the guest yet. I'm not
 sure how it would be different if an individual queue indicator is
 switched on later. Chances are that the guest code processing the
 indicators has not even yet processed to that individual indicator, so
 it wouldn't matter if it was set delayed. It is probably something that
 has to be tried out.

The network driver will do this at the virtio level too: no more rx
interrupts will be received until all packets have been processed.

But it is particularly useful for network transmit interrupts: we want
to be notified of the packet's finishing, but a little delay (hence more
batching) is better.  For rx, I can envision a case where the guest is
too fast and thus keeps getting interrupted after each packet.  A user
might decide to trade off some latency to increase batching; seems
like a bit like a benchmark hack to me, though...

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support

2015-02-15 Thread Rusty Russell
Pawel Moll pawel.m...@arm.com writes:
 On Fri, 2015-02-13 at 02:52 +, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote:
  Jason Wang jasow...@redhat.com writes:
   This patch enables the interrupt coalescing setting through ethtool.
  
  The problem is that there's nothing network specific about interrupt
  coalescing.  I can see other devices wanting exactly the same thing,
  which means we'd deprecate this in the next virtio standard.
  
  I think the right answer is to extend like we did with
  vring_used_event(), eg:
  
  1) Add a new feature VIRTIO_F_RING_COALESCE.
  2) Add another a 32-bit field after vring_used_event(), eg:
  #define vring_used_delay(vr) (*(u32 
  *)((vr)-avail-ring[(vr)-num + 2]))
  
  This loses the ability to coalesce by number of frames, but we can still
  do number of sg entries, as we do now with used_event, and we could
  change virtqueue_enable_cb_delayed() to take a precise number if we
  wanted.
 
  But do we expect delay to be update dynamically?
  If not, why not stick it in config space?
 
 Hmm, we could update it dynamically (and will, in the case of ethtool).
 But it won't be common, so we could append a field to
 virtio_pci_common_cfg for PCI.
 
 I think MMIO and CCW would be easy to extend too, but CC'd to check.

 As far as I understand the virtio_pci_common_cfg principle (just had a
 look, for the first time ;-), it's now an equivalent of the MMIO control
 registers block. I see no major problem with adding another one.

OK, thanks.

 Or were you thinking about introducing some standard for the real
 config space? (fine with me as well - the transport will have nothing to
 do :-)

No, that'd not be possible at this point.  I think it's a per-transport
decision.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/2] virtio_pci_modern: type-safe io accessors

2015-02-15 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 The spec is very clear on this:

 4.1.3.1 Driver Requirements: PCI Device Layout

 The driver MUST access each field using the “natural” access method,
 i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit
 fields and 8-bit accesses for 8-bit fields.

 Add type-safe wrappers to prevent access with incorrect width.

Applied both (for *next* merge window).

Thanks,
Rusty.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/virtio/virtio_pci_modern.c | 37 +
  1 file changed, 37 insertions(+)

 diff --git a/drivers/virtio/virtio_pci_modern.c 
 b/drivers/virtio/virtio_pci_modern.c
 index 2aa38e5..4e73ef7 100644
 --- a/drivers/virtio/virtio_pci_modern.c
 +++ b/drivers/virtio/virtio_pci_modern.c
 @@ -20,6 +20,43 @@
  #define VIRTIO_PCI_NO_LEGACY
  #include virtio_pci_common.h
  
 +/*
 + * Type-safe wrappers for io accesses.
 + * Use these to enforce at compile time the following spec requirement:
 + *
 + * The driver MUST access each field using the “natural” access
 + * method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses
 + * for 16-bit fields and 8-bit accesses for 8-bit fields.
 + */
 +static inline u8 vp_ioread8(u8 __iomem *addr)
 +{
 + return ioread8(addr);
 +}
 +static inline u16 vp_ioread16 (u16 __iomem *addr)
 +{
 + return ioread16(addr);
 +}
 +
 +static inline u32 vp_ioread32(u32 __iomem *addr)
 +{
 + return ioread32(addr);
 +}
 +
 +static inline void vp_iowrite8(u8 value, u8 __iomem *addr)
 +{
 + iowrite8(value, addr);
 +}
 +
 +static inline void vp_iowrite16(u16 value, u16 __iomem *addr)
 +{
 + iowrite16(value, addr);
 +}
 +
 +static inline void vp_iowrite32(u32 value, u32 __iomem *addr)
 +{
 + iowrite16(value, addr);
 +}
 +
  static void __iomem *map_capability(struct pci_dev *dev, int off,
   size_t minlen,
   u32 align,
 -- 
 MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH] virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.

2015-02-12 Thread Rusty Russell
I noticed this with the console device.  It's not *wrong*, just a bit
weird.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index b9f70dfc4751..5ce2aa48fc6e 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -236,7 +236,10 @@ static int virtio_dev_probe(struct device *_d)
if (err)
goto err;
 
-   add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+   /* If probe didn't do it, mark device DRIVER_OK ourselves. */
+   if (!(dev-config-get_status(dev)  VIRTIO_CONFIG_S_DRIVER_OK))
+   virtio_device_ready(dev);
+
if (drv-scan)
drv-scan(dev);
 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support

2015-02-12 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote:
 Jason Wang jasow...@redhat.com writes:
  This patch enables the interrupt coalescing setting through ethtool.
 
 The problem is that there's nothing network specific about interrupt
 coalescing.  I can see other devices wanting exactly the same thing,
 which means we'd deprecate this in the next virtio standard.
 
 I think the right answer is to extend like we did with
 vring_used_event(), eg:
 
 1) Add a new feature VIRTIO_F_RING_COALESCE.
 2) Add another a 32-bit field after vring_used_event(), eg:
 #define vring_used_delay(vr) (*(u32 *)((vr)-avail-ring[(vr)-num + 
 2]))
 
 This loses the ability to coalesce by number of frames, but we can still
 do number of sg entries, as we do now with used_event, and we could
 change virtqueue_enable_cb_delayed() to take a precise number if we
 wanted.

 But do we expect delay to be update dynamically?
 If not, why not stick it in config space?

Hmm, we could update it dynamically (and will, in the case of ethtool).
But it won't be common, so we could append a field to
virtio_pci_common_cfg for PCI.

I think MMIO and CCW would be easy to extend too, but CC'd to check.

 My feeling is that this should be a v1.0-only feature though
 (eg. feature bit 33).

 Yes, e.g. we can't extend config space for legacy virtio pci.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtual: Documentation: simplify and generalize paravirt_ops.txt

2015-02-11 Thread Rusty Russell
Luis R. Rodriguez mcg...@do-not-panic.com writes:
 From: Luis R. Rodriguez mcg...@suse.com

 The general documentation we have for pv_ops is currenty present
 on the IA64 docs, but since this documentation covers IA64 xen
 enablement and IA64 Xen support got ripped out a while ago
 through commit d52eefb47 present since v3.14-rc1 lets just
 simplify, generalize and move the pv_ops documentation to a
 shared place.

OK, I've applied this.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/2] virtio: Introducing virtio_pci.no_mmio, the worst boot option in history.

2015-02-10 Thread Rusty Russell
Sometimes, devices are just too damn fast.  Wouldn't it be nice if we
could (1) have an option to access them in the most baroque way
possible, and (2) ensure that even the normal case caused extra bloat?

Well, wish no longer: This patch does exactly that!  Since every
complaint virtio 1.0 device has to have a capability to allow backdoor
access into the BARs, we can use that:

4.1.4.7.1 Device Requirements: PCI configuration access capability

The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG
capability.

Upon detecting driver write access to pci_cfg_data, the device
MUST execute a write access at offset cap.offset at BAR
selected by cap.bar using the first cap.length bytes from
pci_cfg_data.

Upon detecting driver read access to pci_cfg_data, the device
MUST execute a read access of length cap.length at offset
cap.offset at BAR selected by cap.bar and store the first
cap.length bytes in pci_cfg_data.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 drivers/virtio/virtio_pci_common.c |  31 ++-
 drivers/virtio/virtio_pci_common.h |  21 +
 drivers/virtio/virtio_pci_legacy.c |   1 +
 drivers/virtio/virtio_pci_modern.c | 183 ++---
 4 files changed, 221 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 4e6132dd0ca3..0030180411cc 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -29,45 +29,68 @@ MODULE_PARM_DESC(force_legacy,
 
 u8 vp_read_isr(const struct virtio_pci_device *dev)
 {
+   if (dev-no_mmio)
+   return dev-no_mmio-read8(dev, dev-isr, 0);
return ioread8(dev-isr);
 }
 
 void vp_write_notify(const struct virtqueue *vq, u16 vqindex)
 {
-   iowrite16(vqindex, (void __iomem *)vq-priv);
+   struct virtio_pci_device *vp_dev = to_vp_device(vq-vdev);
+
+   if (vp_dev-no_mmio)
+   vp_dev-no_mmio-write16(vp_dev, (void __iomem *)vq-priv,
+0, vqindex);
+   else
+   iowrite16(vqindex, (void __iomem *)vq-priv);
 }
 
 u32 __vp_read_common32(const struct virtio_pci_device *dev, size_t offset)
 {
+   if (dev-no_mmio)
+   return dev-no_mmio-read32(dev, dev-common, offset);
return ioread32((void __iomem *)dev-common + offset);
 }
 
 u16 __vp_read_common16(const struct virtio_pci_device *dev, size_t offset)
 {
+   if (dev-no_mmio)
+   return dev-no_mmio-read16(dev, dev-common, offset);
return ioread16((void __iomem *)dev-common + offset);
 }
 
 u8 __vp_read_common8(const struct virtio_pci_device *dev, size_t offset)
 {
+   if (dev-no_mmio)
+   return dev-no_mmio-read8(dev, dev-common, offset);
return ioread8((void __iomem *)dev-common + offset);
 }
 
 void __vp_write_common32(const struct virtio_pci_device *dev,
 size_t offset, u32 val)
 {
-   iowrite32(val, (void __iomem *)dev-common + offset);
+   if (dev-no_mmio)
+   dev-no_mmio-write32(dev, dev-common, offset, val);
+   else
+   iowrite32(val, (void __iomem *)dev-common + offset);
 }
 
 void __vp_write_common16(const struct virtio_pci_device *dev,
 size_t offset, u16 val)
 {
-   iowrite16(val, (void __iomem *)dev-common + offset);
+   if (dev-no_mmio)
+   dev-no_mmio-write16(dev, dev-common, offset, val);
+   else
+   iowrite16(val, (void __iomem *)dev-common + offset);
 }
 
 void __vp_write_common8(const struct virtio_pci_device *dev,
size_t offset, u8 val)
 {
-   iowrite8(val, (void __iomem *)dev-common + offset);
+   if (dev-no_mmio)
+   dev-no_mmio-write8(dev, dev-common, offset, val);
+   else
+   iowrite8(val, (void __iomem *)dev-common + offset);
 }
 
 /* wait for pending irq handlers */
diff --git a/drivers/virtio/virtio_pci_common.h 
b/drivers/virtio/virtio_pci_common.h
index 15a20c968ae7..d891e3123cdd 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -53,6 +53,9 @@ struct virtio_pci_device {
struct virtio_device vdev;
struct pci_dev *pci_dev;
 
+   /* This is only valid for modern devices. */
+   const struct virtio_pci_no_mmio_ops *no_mmio;
+
/* In legacy mode, these two point to within -legacy. */
/* Where to read and clear interrupt */
u8 __iomem *isr;
@@ -75,6 +78,9 @@ struct virtio_pci_device {
/* Multiply queue_notify_off by this value. (non-legacy mode). */
u32 notify_offset_multiplier;
 
+   /* PCI config window for BAR access (non-legacy mode). */
+   int window;
+
/* Legacy only field */
/* the IO mapping for the PCI config space */
void __iomem *ioaddr;
@@ -113,6 +119,21 @@ struct virtio_pci_device

[PATCH 1/2] virtio_pci: abstract all MMIO accesses.

2015-02-10 Thread Rusty Russell
This is in preparation for testing the virtio pci config mmio backdoor.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 drivers/virtio/virtio_pci_common.c |  47 -
 drivers/virtio/virtio_pci_common.h |  41 +++
 drivers/virtio/virtio_pci_modern.c | 140 -
 3 files changed, 177 insertions(+), 51 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index e894eb278d83..4e6132dd0ca3 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -27,6 +27,49 @@ MODULE_PARM_DESC(force_legacy,
 Force legacy mode for transitional virtio 1 devices);
 #endif
 
+u8 vp_read_isr(const struct virtio_pci_device *dev)
+{
+   return ioread8(dev-isr);
+}
+
+void vp_write_notify(const struct virtqueue *vq, u16 vqindex)
+{
+   iowrite16(vqindex, (void __iomem *)vq-priv);
+}
+
+u32 __vp_read_common32(const struct virtio_pci_device *dev, size_t offset)
+{
+   return ioread32((void __iomem *)dev-common + offset);
+}
+
+u16 __vp_read_common16(const struct virtio_pci_device *dev, size_t offset)
+{
+   return ioread16((void __iomem *)dev-common + offset);
+}
+
+u8 __vp_read_common8(const struct virtio_pci_device *dev, size_t offset)
+{
+   return ioread8((void __iomem *)dev-common + offset);
+}
+
+void __vp_write_common32(const struct virtio_pci_device *dev,
+size_t offset, u32 val)
+{
+   iowrite32(val, (void __iomem *)dev-common + offset);
+}
+
+void __vp_write_common16(const struct virtio_pci_device *dev,
+size_t offset, u16 val)
+{
+   iowrite16(val, (void __iomem *)dev-common + offset);
+}
+
+void __vp_write_common8(const struct virtio_pci_device *dev,
+   size_t offset, u8 val)
+{
+   iowrite8(val, (void __iomem *)dev-common + offset);
+}
+
 /* wait for pending irq handlers */
 void vp_synchronize_vectors(struct virtio_device *vdev)
 {
@@ -45,7 +88,7 @@ bool vp_notify(struct virtqueue *vq)
 {
/* we write the queue's selector into the notification register to
 * signal the other end */
-   iowrite16(vq-index, (void __iomem *)vq-priv);
+   vp_write_notify(vq, vq-index);
return true;
 }
 
@@ -89,7 +132,7 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 
/* reading the ISR has the effect of also clearing it so it's very
 * important to save off the value. */
-   isr = ioread8(vp_dev-isr);
+   isr = vp_read_isr(vp_dev);
 
/* It's definitely not us if the ISR was not high */
if (!isr)
diff --git a/drivers/virtio/virtio_pci_common.h 
b/drivers/virtio/virtio_pci_common.h
index 28ee4e56badf..15a20c968ae7 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -113,6 +113,47 @@ struct virtio_pci_device {
u16 (*config_vector)(struct virtio_pci_device *vp_dev, u16 vector);
 };
 
+/* Accessor functions. */
+u8 vp_read_isr(const struct virtio_pci_device *dev);
+void vp_write_notify(const struct virtqueue *vq, u16 vqindex);
+u32 __vp_read_common32(const struct virtio_pci_device *dev, size_t offset);
+u16 __vp_read_common16(const struct virtio_pci_device *dev, size_t offset);
+u8 __vp_read_common8(const struct virtio_pci_device *dev, size_t offset);
+void __vp_write_common32(const struct virtio_pci_device *dev,
+size_t offset, u32 val);
+void __vp_write_common16(const struct virtio_pci_device *dev,
+size_t offset, u16 val);
+void __vp_write_common8(const struct virtio_pci_device *dev,
+   size_t offset, u8 val);
+
+#define vp_read_common32(dev, fieldname)   \
+   __vp_read_common32((dev),   \
+  offsetof(struct virtio_pci_common_cfg, fieldname) + \
+  BUILD_BUG_ON_ZERO(sizeof((dev)-common-fieldname) != 4))
+#define vp_read_common16(dev, fieldname)   \
+   __vp_read_common16((dev),   \
+   offsetof(struct virtio_pci_common_cfg, fieldname) + \
+   BUILD_BUG_ON_ZERO(sizeof((dev)-common-fieldname) != 2))
+#define vp_read_common8(dev, fieldname)\
+   __vp_read_common8((dev),\
+   offsetof(struct virtio_pci_common_cfg, fieldname) + \
+   BUILD_BUG_ON_ZERO(sizeof((dev)-common-fieldname) != 1))
+#define vp_write_common32(dev, fieldname, val) \
+   __vp_write_common32((dev),  \
+  offsetof(struct virtio_pci_common_cfg, fieldname) + \
+  BUILD_BUG_ON_ZERO(sizeof((dev)-common-fieldname) != 4), \
+  (val))
+#define vp_write_common16(dev, fieldname, val) \
+   __vp_write_common16

[RFC 0/2] virtio_pci: patches never to apply.

2015-02-10 Thread Rusty Russell
This should allow testing when QEMU gets VIRTIO_PCI_CAP_PCI_CFG support,
but I'm pretty sure we should never allow these patches upstream.

Tested with lguest (in virtio-next), which supports VIRTIO_PCI_CAP_PCI_CFG.

Rusty Russell (2):
  virtio_pci: abstract all MMIO accesses.
  virtio: Introducing virtio_pci.no_mmio, the worst boot option in
history.

 drivers/virtio/virtio_pci_common.c |  70 +++-
 drivers/virtio/virtio_pci_common.h |  62 
 drivers/virtio/virtio_pci_legacy.c |   1 +
 drivers/virtio/virtio_pci_modern.c | 317 ++---
 4 files changed, 391 insertions(+), 59 deletions(-)

-- 
2.1.0
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending

2015-02-10 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Tue, Feb 10, 2015 at 11:33:52AM +1030, Rusty Russell wrote:
 Jason Wang jasow...@redhat.com writes:
  We currently does:
 
  bufs = (avail-idx - last_used_idx) * 3 / 4;
 
  This is ok now since we only try to enable the delayed callbacks when
  the queue is about to be full. This may not work well when there is
  only one pending buffer in the virtqueue (this may be the case after
  tx interrupt was enabled). Since virtqueue_enable_cb() will return
  false which may cause unnecessary triggering of napis. This patch
  correct this by only calculate the four thirds when bufs is not one.
 
 I mildly prefer to avoid the branch, by changing the calculation like
 so:
 
 /* Set bufs = 1, even if there's only one pending buffer */
 bufs = (bufs + 1) * 3 / 4;

 Or bus * 3/4 + 1

 But it's not clear to me how much this happens.  I'm happy with the
 patch though, as currently virtqueue_enable_cb_delayed() is the same
 as virtqueue_enable_cb() if there's only been one buffer added.
 
 Cheers,
 Rusty.

 But isn't this by design?
 The documentation says:

  * This re-enables callbacks but hints to the other side to delay
  * interrupts until most of the available buffers have been processed;

 Clearly, this implies that when there's one buffer it must behave
 exactly the same.

Yes, my mistake.  We could hit the never gets notified case with this
change, and that's a much bigger problem.

So I don't think we can accept this patch...
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb()

2015-02-09 Thread Rusty Russell
Jason Wang jasow...@redhat.com writes:
 Currently, we do nothing to prevent the callbacks in
 virtqueue_disable_cb() when event index is used. This may cause
 spurious interrupts which may damage the performance. This patch tries
 to publish avail event as the used even to prevent the callbacks.

 Signed-off-by: Jason Wang jasow...@redhat.com

Acked-by: Rusty Russell ru...@rustcorp.com.au

 ---
  drivers/virtio/virtio_ring.c | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 545fed5..e9ffbfb 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -539,6 +539,8 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
   struct vring_virtqueue *vq = to_vvq(_vq);
  
   vq-vring.avail-flags |= cpu_to_virtio16(_vq-vdev, 
 VRING_AVAIL_F_NO_INTERRUPT);
 + vring_used_event(vq-vring) = cpu_to_virtio16(_vq-vdev,
 +vq-vring.avail-idx);
  }
  EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
  
 -- 
 1.8.3.1

 ___
 Virtualization mailing list
 Virtualization@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/virtualization
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support

2015-02-09 Thread Rusty Russell
Jason Wang jasow...@redhat.com writes:
 This patch enables the interrupt coalescing setting through ethtool.

The problem is that there's nothing network specific about interrupt
coalescing.  I can see other devices wanting exactly the same thing,
which means we'd deprecate this in the next virtio standard.

I think the right answer is to extend like we did with
vring_used_event(), eg:

1) Add a new feature VIRTIO_F_RING_COALESCE.
2) Add another a 32-bit field after vring_used_event(), eg:
#define vring_used_delay(vr) (*(u32 *)((vr)-avail-ring[(vr)-num + 
2]))

This loses the ability to coalesce by number of frames, but we can still
do number of sg entries, as we do now with used_event, and we could
change virtqueue_enable_cb_delayed() to take a precise number if we
wanted.

My feeling is that this should be a v1.0-only feature though
(eg. feature bit 33).

Cheers,
Rusty.

 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/net/virtio_net.c| 67 
 +
  include/uapi/linux/virtio_net.h | 12 
  2 files changed, 79 insertions(+)

 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index cc5f5de..2b958fb 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -145,6 +145,11 @@ struct virtnet_info {
  
   /* Budget for polling tx completion */
   u32 tx_work_limit;
 +
 + __u32 rx_coalesce_usecs;
 + __u32 rx_max_coalesced_frames;
 + __u32 tx_coalesce_usecs;
 + __u32 tx_max_coalesced_frames;
  };
  
  struct padded_vnet_hdr {
 @@ -1404,12 +1409,73 @@ static void virtnet_get_channels(struct net_device 
 *dev,
   channels-other_count = 0;
  }
  
 +static int virtnet_set_coalesce(struct net_device *dev,
 + struct ethtool_coalesce *ec)
 +{
 + struct virtnet_info *vi = netdev_priv(dev);
 + struct scatterlist sg;
 + struct virtio_net_ctrl_coalesce c;
 +
 + if (!vi-has_cvq ||
 + !virtio_has_feature(vi-vdev, VIRTIO_NET_F_CTRL_COALESCE))
 + return -EOPNOTSUPP;
 + if (vi-rx_coalesce_usecs != ec-rx_coalesce_usecs ||
 + vi-rx_max_coalesced_frames != ec-rx_max_coalesced_frames) {
 + c.coalesce_usecs = ec-rx_coalesce_usecs;
 + c.max_coalesced_frames = ec-rx_max_coalesced_frames;
 + sg_init_one(sg, c, sizeof(c));
 + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
 +   VIRTIO_NET_CTRL_COALESCE_RX_SET,
 +   sg)) {
 + dev_warn(dev-dev, Fail to set rx coalescing\n);
 + return -EINVAL;
 + }
 + vi-rx_coalesce_usecs = ec-rx_coalesce_usecs;
 + vi-rx_max_coalesced_frames = ec-rx_max_coalesced_frames;
 + }
 +
 + if (vi-tx_coalesce_usecs != ec-tx_coalesce_usecs ||
 + vi-tx_max_coalesced_frames != ec-tx_max_coalesced_frames) {
 + c.coalesce_usecs = ec-tx_coalesce_usecs;
 + c.max_coalesced_frames = ec-tx_max_coalesced_frames;
 + sg_init_one(sg, c, sizeof(c));
 + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE,
 +   VIRTIO_NET_CTRL_COALESCE_TX_SET,
 +   sg)) {
 + dev_warn(dev-dev, Fail to set tx coalescing\n);
 + return -EINVAL;
 + }
 + vi-tx_coalesce_usecs = ec-tx_coalesce_usecs;
 + vi-tx_max_coalesced_frames = ec-tx_max_coalesced_frames;
 + }
 +
 + vi-tx_work_limit = ec-tx_max_coalesced_frames_irq;
 +
 + return 0;
 +}
 +
 +static int virtnet_get_coalesce(struct net_device *dev,
 + struct ethtool_coalesce *ec)
 +{
 + struct virtnet_info *vi = netdev_priv(dev);
 +
 + ec-rx_coalesce_usecs = vi-rx_coalesce_usecs;
 + ec-rx_max_coalesced_frames = vi-rx_max_coalesced_frames;
 + ec-tx_coalesce_usecs = vi-tx_coalesce_usecs;
 + ec-tx_max_coalesced_frames = vi-tx_max_coalesced_frames;
 + ec-tx_max_coalesced_frames_irq = vi-tx_work_limit;
 +
 + return 0;
 +}
 +
  static const struct ethtool_ops virtnet_ethtool_ops = {
   .get_drvinfo = virtnet_get_drvinfo,
   .get_link = ethtool_op_get_link,
   .get_ringparam = virtnet_get_ringparam,
   .set_channels = virtnet_set_channels,
   .get_channels = virtnet_get_channels,
 + .set_coalesce = virtnet_set_coalesce,
 + .get_coalesce = virtnet_get_coalesce,
  };
  
  #define MIN_MTU 68
 @@ -2048,6 +2114,7 @@ static unsigned int features[] = {
   VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
   VIRTIO_NET_F_CTRL_MAC_ADDR,
   VIRTIO_F_ANY_LAYOUT,
 + VIRTIO_NET_F_CTRL_COALESCE,
  };
  
  static struct virtio_driver virtio_net_driver = {
 diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux

Re: [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending

2015-02-09 Thread Rusty Russell
Jason Wang jasow...@redhat.com writes:
 We currently does:

 bufs = (avail-idx - last_used_idx) * 3 / 4;

 This is ok now since we only try to enable the delayed callbacks when
 the queue is about to be full. This may not work well when there is
 only one pending buffer in the virtqueue (this may be the case after
 tx interrupt was enabled). Since virtqueue_enable_cb() will return
 false which may cause unnecessary triggering of napis. This patch
 correct this by only calculate the four thirds when bufs is not one.

I mildly prefer to avoid the branch, by changing the calculation like
so:

/* Set bufs = 1, even if there's only one pending buffer */
bufs = (bufs + 1) * 3 / 4;

But it's not clear to me how much this happens.  I'm happy with the
patch though, as currently virtqueue_enable_cb_delayed() is the same
as virtqueue_enable_cb() if there's only been one buffer added.

Cheers,
Rusty.

 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/virtio/virtio_ring.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 00ec6b3..545fed5 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -636,7 +636,10 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
* entry. Always do both to keep code simple. */
   vq-vring.avail-flags = cpu_to_virtio16(_vq-vdev, 
 ~VRING_AVAIL_F_NO_INTERRUPT);
   /* TODO: tune this threshold */
 - bufs = (u16)(virtio16_to_cpu(_vq-vdev, vq-vring.avail-idx) - 
 vq-last_used_idx) * 3 / 4;
 + bufs = (u16)(virtio16_to_cpu(_vq-vdev, vq-vring.avail-idx) -
 +  vq-last_used_idx);
 + if (bufs != 1)
 + bufs = bufs * 3 / 4;
   vring_used_event(vq-vring) = cpu_to_virtio16(_vq-vdev, 
 vq-last_used_idx + bufs);
   virtio_mb(vq-weak_barriers);
   if (unlikely((u16)(virtio16_to_cpu(_vq-vdev, vq-vring.used-idx) - 
 vq-last_used_idx)  bufs)) {
 -- 
 1.8.3.1

 ___
 Virtualization mailing list
 Virtualization@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/virtualization
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] virtio_pci: use 16-bit accessor for queue_enable.

2015-02-09 Thread Rusty Russell
Since PCI is little endian, 8-bit access might work, but the spec section
is very clear on this:

  4.1.3.1 Driver Requirements: PCI Device Layout

  The driver MUST access each field using the “natural” access method,
  i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit
  fields and 8-bit accesses for 8-bit fields.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index f16e462cb4ef..2aa38e59db2e 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -294,7 +294,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device 
*vp_dev,
 
/* Check if queue is either not available or already active. */
num = ioread16(cfg-queue_size);
-   if (!num || ioread8(cfg-queue_enable))
+   if (!num || ioread16(cfg-queue_enable))
return ERR_PTR(-ENOENT);
 
if (num  (num - 1)) {
@@ -394,7 +394,7 @@ static int vp_modern_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
 */
list_for_each_entry(vq, vdev-vqs, list) {
iowrite16(vq-index, vp_dev-common-queue_select);
-   iowrite8(1, vp_dev-common-queue_enable);
+   iowrite16(1, vp_dev-common-queue_enable);
}
 
return 0;
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] virtio: Avoid possible kernel panic if DEBUG is enabled.

2015-02-08 Thread Rusty Russell
Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp writes:

From 11fd997d724f520ca628615e7ffbfd7901c40b62 Mon Sep 17 00:00:00 2001
 From: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
 Date: Fri, 6 Feb 2015 13:28:38 +0900
 Subject: [PATCH] virtio: Avoid possible kernel panic if DEBUG is enabled.

 The virtqueue_add() calls START_USE() upon entry. The virtqueue_kick() is
 called if vq-num_added == (1  16) - 1 before calling END_USE().
 The virtqueue_kick_prepare() called via virtqueue_kick() calls START_USE()
 upon entry, and will call panic() if DEBUG is enabled.
 Move this virtqueue_kick() call to after END_USE() call.

Thanks, applied.

Cheers,
Rusty.


 Signed-off-by: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
 ---
  drivers/virtio/virtio_ring.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 00ec6b3..596735b 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -245,14 +245,14 @@ static inline int virtqueue_add(struct virtqueue *_vq,
   vq-vring.avail-idx = cpu_to_virtio16(_vq-vdev, 
 virtio16_to_cpu(_vq-vdev, vq-vring.avail-idx) + 1);
   vq-num_added++;
  
 + pr_debug(Added buffer head %i to %p\n, head, vq);
 + END_USE(vq);
 +
   /* This is very unlikely, but theoretically possible.  Kick
* just in case. */
   if (unlikely(vq-num_added == (1  16) - 1))
   virtqueue_kick(_vq);
  
 - pr_debug(Added buffer head %i to %p\n, head, vq);
 - END_USE(vq);
 -
   return 0;
  }
  
 -- 
 1.8.3.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio-mmio: Update the device to OASIS spec version

2015-01-21 Thread Rusty Russell
Pawel Moll pawel.m...@arm.com writes:
 This patch add a support for second version of the virtio-mmio device,
 which follows OASIS Virtual I/O Device (VIRTIO) Version 1.0
 specification.

Nice job, that turned out quite neat!

Applied,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] virtio-mmio: Update the device to OASIS spec version

2015-01-21 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Tue, Jan 20, 2015 at 06:12:11PM +, Pawel Moll wrote:
 This patch add a support for second version of the virtio-mmio device,
 which follows OASIS Virtual I/O Device (VIRTIO) Version 1.0
 specification.

OK, applied this instead :)

I'll leave it in my pending queue for a few days in case there's a v3.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio for 3.20

2015-01-21 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Mon, Jan 19, 2015 at 12:56:28PM +1030, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  Hi Rusty, all
 
  I parked outstanding virtio patches here:
  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-next
  this way they are merged in linux-next as well.
 
  Rusty if you'll be doing 3.20 window, would you mind
  checking that tree to verify nothing's lost?
 
 Hi Michael,
 
 Thanks, now I've returned from linux.conf.au, I'm starting to
 organize the 3.20 queue.  I've merged this branch into my virtio-next
 branch.
 
 Cheers,
 Rusty.

 Thanks, good to know.
 I tweaked some patches since due to test failure reports by Gerd.
 Updated patches are tagged post-squash, if you want to see what
 changed, see fixup patches in the series tagged pre-squash.

 Or you can look at my tree:
 git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git 
 virtio-next
 is pre-squash.
 git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-next
 is post-squash.

OK, I merged your vhost-next with my virtio-next, and the merge was
empty.

So we're up-to-date.  Please send patches against my virtio-next from
now on.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 10/16] s390: add pci_iomap_range

2015-01-20 Thread Rusty Russell
Sebastian Ott seb...@linux.vnet.ibm.com writes:
 On Wed, 14 Jan 2015, Michael S. Tsirkin wrote:
  }
 -EXPORT_SYMBOL_GPL(pci_iomap);
 +EXPORT_SYMBOL_GPL(pci_iomap_range);
 +
 +void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
 +{
 +return pci_iomap_range(dev, bar, 0, maxlen);
 +}
 +EXPORT_SYMBOL(pci_iomap);

 As discussed earlier, could you please leave that as EXPORT_SYMBOL_GPL.
 If there's a reason to have these interfaces as EXPORT_SYMBOL, we could
 change all of them in an extra patch.

 With this change integrated you can add
 Acked-by: Sebastian Ott seb...@linux.vnet.ibm.com

OK, I've gone back and rebased virtio-next to fix this (I don't really
want to think too hard about the legal consequences of having a version
which isn't EXPORT_SYMBOL_GPL).

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/6] virtio: graceful failure with get == NULL

2015-01-19 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Tue, Jan 13, 2015 at 04:43:07PM +0200, Michael S. Tsirkin wrote:
 virtio 1.0 says device configuration is optional, but most drivers treat it 
 as
 mandatory.  Even if presented by device, guest bios might disable the BAR
 holding that configuration, so we can't assume it's there, but we also don't
 want to fail if not in case drivers can cope with it's absence - such as caif
 or rng.
 
 Add code to drivers to check presence of get callback and fail gracefully.

 Rusty, in case it's not clear: I'd like to hear
 your opinion on these patches, since virtio pci modern
 driver I'm now preparing for submission, depends on this.

Just to note for anyone following, that these are now in my virtio-next
tree.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: virtio for 3.20

2015-01-18 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 Hi Rusty, all

 I parked outstanding virtio patches here:
 git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-next
 this way they are merged in linux-next as well.

 Rusty if you'll be doing 3.20 window, would you mind
 checking that tree to verify nothing's lost?

Hi Michael,

Thanks, now I've returned from linux.conf.au, I'm starting to
organize the 3.20 queue.  I've merged this branch into my virtio-next
branch.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_pci: add module param to force legacy mode

2015-01-18 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 If set, try legacy interface first, modern one if that fails.  Useful to
 work around device/driver bugs, and for compatibility testing.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Thanks, I've applied all these to virtio-next.

Cheers!
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 0/3] Sharing MSIX irq for tx/rx queue pairs

2015-01-04 Thread Rusty Russell
Jason Wang jasow...@redhat.com writes:
 Hi all:

 This series try to share MSIX irq for each tx/rx queue pair. This is
 done through:

 - introducing virtio pci channel which are group of virtqueues that
   sharing a single MSIX irq (Patch 1)
 - expose channel setting to virtio core api (Patch 2)
 - try to use channel setting in virtio-net (Patch 3)

Hi Jason,

Is channel a term you created yourself, or something I was
just unaware of?  irq_group would seem more obvious, if the former.

 For the transport that does not support channel, channel paramters
 were simply ignored. For devices that does not use channel, it can
 simply pass NULL or zero to virito core.

 With the patch, 1 MSIX irq were saved for each TX/RX queue pair.

It seems fairly straightforward.

Acked-by: Rusty Russell ru...@rustcorp.com.au

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 6/6] virtio: core support for config generation

2014-12-21 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 virtio 1.0 spec says:

 Drivers MUST NOT assume reads from fields greater than 32 bits wide are
 atomic, nor are reads from multiple fields: drivers SHOULD read device
 configuration space fields like so:
   u32 before, after;
   do {
   before = get_config_generation(device);
   // read config entry/entries.
   after = get_config_generation(device);
   } while (after != before);

 Do exactly this, for transports that support it.

  static inline void virtio_cwrite8(struct virtio_device *vdev,
 @@ -352,6 +375,7 @@ static inline u64 virtio_cread64(struct virtio_device 
 *vdev,
  {
   u64 ret;
   vdev-config-get(vdev, offset, ret, sizeof(ret));
 + __virtio_cread_many(vdev, offset, ret, 1, sizeof(ret));
   return virtio64_to_cpu(vdev, (__force __virtio64)ret);
  }

The vdev-config-get(vdev, offset, ret, sizeof(ret)); should
be deleted.  Harmless if not, though.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/3] fix up vringh/mic sparse errors

2014-12-18 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 This fixes remaining sparse warnings in vringh and mic by using
 virtio 1.0 compliant wrappers.

 This also needs by get_user patches to avoid getting warnings
 from these calls.

 Tested by running vringh_test.

 Rusty, I prefer fixing all these warnings for 3.19, any objections?

OK, I've queued these too.

Cheers,
Rusty.

 Michael S. Tsirkin (3):
   vringh: 64 bit features
   vringh: initial virtio 1.0 support
   mic/host: initial virtio 1.0 support

  include/linux/vringh.h  |  37 ++-
  drivers/misc/mic/host/mic_debugfs.c |  18 --
  drivers/vhost/vringh.c  | 125 
 ++--
  3 files changed, 123 insertions(+), 57 deletions(-)

 -- 
 MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/6] tools/virtio: more stubs

2014-12-18 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 As usual, add more stubs to fix test build after main
 codebase changes.

This doesn't apply, since:

 diff --git a/tools/virtio/linux/virtio_config.h 
 b/tools/virtio/linux/virtio_config.h
 index dafe1c9..806d683 100644
 --- a/tools/virtio/linux/virtio_config.h
 +++ b/tools/virtio/linux/virtio_config.h
 @@ -1,8 +1,7 @@
  #include linux/virtio_byteorder.h
  #include linux/virtio.h
 +#include uapi/linux/virtio_config.h
  
 -#define VIRTIO_TRANSPORT_F_START 28
 -#define VIRTIO_TRANSPORT_F_END   32
  /*

Version in master has no headers in this file.

What's this based on?

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/6] virtio 1.0 fixups, tweaks

2014-12-18 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 Fixes a couple of minor compliance issues in new virtio 1.0 code.
 Plus, adds a couple of minor cleanups - not bugfixes,
 but seem safe enough for 3.19.

OK, I've applied these.

Thanks,
Rusty.

 Michael S. Tsirkin (6):
   virtio: set VIRTIO_CONFIG_S_FEATURES_OK on restore
   virtio_config: fix virtio_cread_bytes
   virtio_pci_common.h: drop VIRTIO_PCI_NO_LEGACY
   virtio_pci: move probe to common file
   virtio_pci: add VIRTIO_PCI_NO_LEGACY
   virtio: core support for config generation

  drivers/virtio/virtio_pci_common.h |  7 +++
  include/linux/virtio_config.h  | 29 -
  include/uapi/linux/virtio_pci.h| 15 ++-
  drivers/virtio/virtio.c| 37 +++--
  drivers/virtio/virtio_pci_common.c | 34 +-
  drivers/virtio/virtio_pci_legacy.c | 24 ++--
  6 files changed, 99 insertions(+), 47 deletions(-)

 -- 
 MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: BUG_ON in virtio-ring.c

2014-12-17 Thread Rusty Russell
Alexey Lapitsky lex.pub...@gmail.com writes:
 Hi,

 Sorry for the long delay. It prints exactly the same:

 [3.792033] virtqueue elements = 128, max_segments = 126 (1 queues)
 [3.802191]  vda: vda1 vda2  vda5 

 A little bit more about my setup (if it helps):

OK, I think this is fixed by Ming Lei's recent count updates,
which interestingly did not seem to be CC:stable.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 9/9] virtio_pci: update file descriptions and copyright

2014-12-15 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 There's been a lot of changes since 2007.
 List main authors, add Red Hat copyright.

Acked-by: Rusty Russell ru...@rustcorp.com.au

Cheers,
Rusty.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/virtio/virtio_pci.h| 5 -
  drivers/virtio/virtio_pci.c| 5 -
  drivers/virtio/virtio_pci_legacy.c | 5 -
  3 files changed, 12 insertions(+), 3 deletions(-)

 diff --git a/drivers/virtio/virtio_pci.h b/drivers/virtio/virtio_pci.h
 index a3b1259..fba383c 100644
 --- a/drivers/virtio/virtio_pci.h
 +++ b/drivers/virtio/virtio_pci.h
 @@ -1,15 +1,18 @@
  #ifndef _DRIVERS_VIRTIO_VIRTIO_PCI_H
  #define _DRIVERS_VIRTIO_VIRTIO_PCI_H
  /*
 - * Virtio PCI driver
 + * Virtio PCI driver - APIs for common functionality for all device versions
   *
   * This module allows virtio devices to be used over a virtual PCI device.
   * This can be used with QEMU based VMMs like KVM or Xen.
   *
   * Copyright IBM Corp. 2007
 + * Copyright Red Hat, Inc. 2014
   *
   * Authors:
   *  Anthony Liguori  aligu...@us.ibm.com
 + *  Rusty Russell ru...@rustcorp.com.au
 + *  Michael S. Tsirkin m...@redhat.com
   *
   * This work is licensed under the terms of the GNU GPL, version 2 or later.
   * See the COPYING file in the top-level directory.
 diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
 index 6c7e186..c263f67 100644
 --- a/drivers/virtio/virtio_pci.c
 +++ b/drivers/virtio/virtio_pci.c
 @@ -1,13 +1,16 @@
  /*
 - * Virtio PCI driver
 + * Virtio PCI driver - common functionality for all device versions
   *
   * This module allows virtio devices to be used over a virtual PCI device.
   * This can be used with QEMU based VMMs like KVM or Xen.
   *
   * Copyright IBM Corp. 2007
 + * Copyright Red Hat, Inc. 2014
   *
   * Authors:
   *  Anthony Liguori  aligu...@us.ibm.com
 + *  Rusty Russell ru...@rustcorp.com.au
 + *  Michael S. Tsirkin m...@redhat.com
   *
   * This work is licensed under the terms of the GNU GPL, version 2 or later.
   * See the COPYING file in the top-level directory.
 diff --git a/drivers/virtio/virtio_pci_legacy.c 
 b/drivers/virtio/virtio_pci_legacy.c
 index 9fdec9a..1ca4422 100644
 --- a/drivers/virtio/virtio_pci_legacy.c
 +++ b/drivers/virtio/virtio_pci_legacy.c
 @@ -1,13 +1,16 @@
  /*
 - * Virtio PCI driver
 + * Virtio PCI driver - legacy device support
   *
   * This module allows virtio devices to be used over a virtual PCI device.
   * This can be used with QEMU based VMMs like KVM or Xen.
   *
   * Copyright IBM Corp. 2007
 + * Copyright Red Hat, Inc. 2014
   *
   * Authors:
   *  Anthony Liguori  aligu...@us.ibm.com
 + *  Rusty Russell ru...@rustcorp.com.au
 + *  Michael S. Tsirkin m...@redhat.com
   *
   * This work is licensed under the terms of the GNU GPL, version 2 or later.
   * See the COPYING file in the top-level directory.
 -- 
 MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] virtio_ccw: minor enhancements

2014-12-10 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 Hi Rusty, I guess you are busy.

 I'd like to send all these patch series combined to Linus.  To see what
 I have queued, you can check the vhost-next branch in my tree.

 Will wait until tomorrow to give you a chance to respond.

Hi Michael,

Indeed, this is the last week of my sabbatical.  I am back
full time next week.  See my Ack on your patches.

Thanks.
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/2] virito: introduce methods of fixing device features

2014-11-25 Thread Rusty Russell
Jason Wang jasow...@redhat.com writes:
 Buggy host may advertised buggy host features (a usual case is that host
 advertise a feature whose dependencies were missed). In this case, driver
 should detect and disable the buggy features by itself.

Sorry, I've been focussing elsewhere.

I would really prefer that drivers offer a feature_depends table,
which can indicate that feature A depends on feature B, and have the
core iterate, complain and fixup as necessary.

Is that expressive enough, or do we need more?

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 3/6] hw_random: use reference counts on each struct hwrng.

2014-11-11 Thread Rusty Russell
Amos Kong ak...@redhat.com writes:
 From: Rusty Russell ru...@rustcorp.com.au

 current_rng holds one reference, and we bump it every time we want
 to do a read from it.

 This means we only hold the rng_mutex to grab or drop a reference,
 so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
 block on read of /dev/hwrng.

 Using a kref is overkill (we're always under the rng_mutex), but
 a standard pattern.

 This also solves the problem that the hwrng_fillfn thread was
 accessing current_rng without a lock, which could change (eg. to NULL)
 underneath it.

 v4: decrease last reference for triggering the cleanup

This doesn't make any sense:

 +static void drop_current_rng(void)
 +{
 + struct hwrng *rng = current_rng;
 +
 + BUG_ON(!mutex_is_locked(rng_mutex));
 + if (!current_rng)
 + return;
 +
 + /* release current_rng reference */
 + kref_put(current_rng-ref, cleanup_rng);
 + current_rng = NULL;
 +
 + /* decrease last reference for triggering the cleanup */
 + kref_put(rng-ref, cleanup_rng);
 +}

Why would it drop the refcount twice?  This doesn't make sense.

Hmm, because you added kref_init, which initializes the reference count
to 1, you created this bug.

Leave out the kref_init, and let it naturally be 0 (until, and if, it
becomes current_rng).  Add a comment if you want.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 4/6] hw_random: fix unregister race.

2014-11-11 Thread Rusty Russell
Amos Kong ak...@redhat.com writes:
 From: Rusty Russell ru...@rustcorp.com.au

 The previous patch added one potential problem: we can still be
 reading from a hwrng when it's unregistered.  Add a wait for zero
 in the hwrng_unregister path.

 v4: add cleanup_done flag to insure that cleanup is done

That's a bit weird.  The usual pattern would be to hold a reference
until we're actually finished, but this reference is a bit weird.

We hold the mutex across cleanup, so we could grab that but we have to
take care sleeping inside wait_event, otherwise Peter will have to fix
my code again :)

AFAICT the wake_woken() stuff isn't merged yet, so your patch will
have to do for now.

 @@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref)
  
   if (rng-cleanup)
   rng-cleanup(rng);
 + rng-cleanup_done = true;
 + wake_up_all(rng_done);
  }
  
  static void set_current_rng(struct hwrng *rng)
 @@ -536,6 +539,11 @@ void hwrng_unregister(struct hwrng *rng)
   kthread_stop(hwrng_fill);
   } else
   mutex_unlock(rng_mutex);
 +
 + /* Just in case rng is reading right now, wait. */
 + wait_event(rng_done, rng-cleanup_done 
 +atomic_read(rng-ref.refcount) == 0);
 +

The atomic_read() isn't necessary here.

However, you should probably init cleanup_done in hwrng_register().
(Probably noone does unregister then register, but let's be clear).

Thanks,
Rusty.

  }
  EXPORT_SYMBOL_GPL(hwrng_unregister);
  
 diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
 index c212e71..7832e50 100644
 --- a/include/linux/hw_random.h
 +++ b/include/linux/hw_random.h
 @@ -46,6 +46,7 @@ struct hwrng {
   /* internal. */
   struct list_head list;
   struct kref ref;
 + bool cleanup_done;
  };
  
  /** Register a new Hardware Random Number Generator driver. */
 -- 
 1.9.3
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: kernel BUG at drivers/block/virtio_blk.c:172!

2014-11-10 Thread Rusty Russell
Jeff Layton jlay...@poochiereds.net writes:

 In the latest Fedora rawhide kernel in the repos, I'm seeing the
 following oops when mounting xfs. rc2-ish kernels seem to be fine:

 [   64.669633] [ cut here ]
 [   64.670008] kernel BUG at drivers/block/virtio_blk.c:172!

Hmm, that's:

BUG_ON(req-nr_phys_segments + 2  vblk-sg_elems);

But during our probe routine we said:

/* We can handle whatever the host told us to handle. */
blk_queue_max_segments(q, vblk-sg_elems-2);

Jens?

Thanks,
Rusty.

 [   64.670008] invalid opcode:  [#1] SMP 
 [   64.670008] Modules linked in: xfs libcrc32c snd_hda_codec_generic 
 snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq 
 snd_seq_device snd_pcm ppdev snd_timer snd virtio_net virtio_balloon 
 soundcore serio_raw parport_pc virtio_console pvpanic parport i2c_piix4 nfsd 
 auth_rpcgss nfs_acl lockd grace sunrpc qxl virtio_blk drm_kms_helper ttm drm 
 ata_generic virtio_pci virtio_ring virtio pata_acpi
 [   64.670008] CPU: 1 PID: 705 Comm: mount Not tainted 
 3.18.0-0.rc3.git2.1.fc22.x86_64 #1
 [   64.670008] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 [   64.670008] task: 8800d94a4ec0 ti: 8800d9f38000 task.ti: 
 8800d9f38000
 [   64.670008] RIP: 0010:[a00287c0]  [a00287c0] 
 virtio_queue_rq+0x290/0x2a0 [virtio_blk]
 [   64.670008] RSP: 0018:8800d9f3b778  EFLAGS: 00010202
 [   64.670008] RAX: 0082 RBX: 8800d8375700 RCX: 
 dead00200200
 [   64.670008] RDX: 0001 RSI: 8800d8375700 RDI: 
 8800d82c4c00
 [   64.670008] RBP: 8800d9f3b7b8 R08: 8800d8375700 R09: 
 0001
 [   64.670008] R10: 0001 R11: 0004 R12: 
 8800d9f3b7e0
 [   64.670008] R13: 8800d82c4c00 R14: 880118629200 R15: 
 
 [   64.670008] FS:  7f5c64dfd840() GS:88011b00() 
 knlGS:
 [   64.670008] CS:  0010 DS:  ES:  CR0: 8005003b
 [   64.670008] CR2: 7fffe6458fb8 CR3: d06d3000 CR4: 
 06e0
 [   64.670008] Stack:
 [   64.670008]  8801 8800d8375870 0001 
 8800d82c4c00
 [   64.670008]  8800d9f3b7e0  8800d8375700 
 8800d82c4c48
 [   64.670008]  8800d9f3b828 813ec258 8800d82c8000 
 0001
 [   64.670008] Call Trace:
 [   64.670008]  [813ec258] __blk_mq_run_hw_queue+0x1c8/0x330
 [   64.670008]  [813ecd80] blk_mq_run_hw_queue+0x70/0x90
 [   64.670008]  [813ee0cd] blk_sq_make_request+0x24d/0x5c0
 [   64.670008]  [813dec68] generic_make_request+0xf8/0x150
 [   64.670008]  [813ded38] submit_bio+0x78/0x190
 [   64.670008]  [a02fc27e] _xfs_buf_ioapply+0x2be/0x5f0 [xfs]
 [   64.670008]  [a0333628] ? xlog_bread_noalign+0xa8/0xe0 [xfs]
 [   64.670008]  [a02ffe21] xfs_buf_submit_wait+0x91/0x840 [xfs]
 [   64.670008]  [a0333628] xlog_bread_noalign+0xa8/0xe0 [xfs]
 [   64.670008]  [a0333ea7] xlog_bread+0x27/0x60 [xfs]
 [   64.670008]  [a03357f3] xlog_find_verify_cycle+0xf3/0x1b0 [xfs]
 [   64.670008]  [a0335de5] xlog_find_head+0x2f5/0x3e0 [xfs]
 [   64.670008]  [a0335f0c] xlog_find_tail+0x3c/0x410 [xfs]
 [   64.670008]  [a033b12d] xlog_recover+0x2d/0x120 [xfs]
 [   64.670008]  [a033cfdb] ? xfs_trans_ail_init+0xcb/0x100 [xfs]
 [   64.670008]  [a0329c3d] xfs_log_mount+0xdd/0x2c0 [xfs]
 [   64.670008]  [a031f744] xfs_mountfs+0x514/0x9c0 [xfs]
 [   64.670008]  [a0320c8d] ? xfs_mru_cache_create+0x18d/0x1f0 [xfs]
 [   64.670008]  [a0322ed0] xfs_fs_fill_super+0x330/0x3b0 [xfs]
 [   64.670008]  [8126d4ac] mount_bdev+0x1bc/0x1f0
 [   64.670008]  [a0322ba0] ? xfs_parseargs+0xbe0/0xbe0 [xfs]
 [   64.670008]  [a0320fd5] xfs_fs_mount+0x15/0x20 [xfs]
 [   64.670008]  [8126de58] mount_fs+0x38/0x1c0
 [   64.670008]  [81202c15] ? __alloc_percpu+0x15/0x20
 [   64.670008]  [812908f8] vfs_kern_mount+0x68/0x160
 [   64.670008]  [81293d6c] do_mount+0x22c/0xc20
 [   64.670008]  [8120d92e] ? might_fault+0x5e/0xc0
 [   64.670008]  [811fcf1b] ? memdup_user+0x4b/0x90
 [   64.670008]  [81294a8e] SyS_mount+0x9e/0x100
 [   64.670008]  [8185e169] system_call_fastpath+0x12/0x17
 [   64.670008] Code: 00 00 c7 86 78 01 00 00 02 00 00 00 48 c7 86 80 01 00 00 
 00 00 00 00 89 86 7c 01 00 00 e9 02 fe ff ff 66 0f 1f 84 00 00 00 00 00 0f 
 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 
 [   64.670008] RIP  [a00287c0] virtio_queue_rq+0x290/0x2a0 
 [virtio_blk]
 [   64.670008]  RSP 8800d9f3b778
 [   64.715347] ---[ end trace c0ff4a0f2fb21f7f ]---

 It's reliably reproducible and I don't see this oops when I convert the
 same block device to ext4 and mount it. In this setup, the KVM guest
 has a virtio block device that has a LVM2 PV on it with an 

Re: [PATCH v2 4/6] hw_random: fix unregister race.

2014-10-30 Thread Rusty Russell
Herbert Xu herb...@gondor.apana.org.au writes:
 On Thu, Sep 18, 2014 at 08:37:45PM +0800, Amos Kong wrote:
 From: Rusty Russell ru...@rustcorp.com.au
 
 The previous patch added one potential problem: we can still be
 reading from a hwrng when it's unregistered.  Add a wait for zero
 in the hwrng_unregister path.
 
 Signed-off-by: Rusty Russell ru...@rustcorp.com.au

 You totally corrupted Rusty's patch.  If you're going to repost
 his series you better make sure that you've got the right patches.

 Just as well though as it made me think a little more about this
 patch :)

OK Amos, can you please repost the complete series?

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 1/4] virtio_net: pass vi around

2014-10-27 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 Too many places poke at [rs]q-vq-vdev-priv just to get
 the the vi structure.  Let's just pass the pointer around: seems
 cleaner, and might even be faster.

Agreed, it's neater.

Acked-by: Rusty Russell ru...@rustcorp.com.au

Thanks,
Rusty.


 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/net/virtio_net.c | 36 +++-
  1 file changed, 19 insertions(+), 17 deletions(-)

 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 57cbc7d..36f3dfc 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -241,11 +241,11 @@ static unsigned long mergeable_buf_to_ctx(void *buf, 
 unsigned int truesize)
  }
  
  /* Called from bottom half context */
 -static struct sk_buff *page_to_skb(struct receive_queue *rq,
 +static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 +struct receive_queue *rq,
  struct page *page, unsigned int offset,
  unsigned int len, unsigned int truesize)
  {
 - struct virtnet_info *vi = rq-vq-vdev-priv;
   struct sk_buff *skb;
   struct skb_vnet_hdr *hdr;
   unsigned int copy, hdr_len, hdr_padded_len;
 @@ -328,12 +328,13 @@ static struct sk_buff *receive_small(void *buf, 
 unsigned int len)
  }
  
  static struct sk_buff *receive_big(struct net_device *dev,
 +struct virtnet_info *vi,
  struct receive_queue *rq,
  void *buf,
  unsigned int len)
  {
   struct page *page = buf;
 - struct sk_buff *skb = page_to_skb(rq, page, 0, len, PAGE_SIZE);
 + struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
  
   if (unlikely(!skb))
   goto err;
 @@ -359,7 +360,8 @@ static struct sk_buff *receive_mergeable(struct 
 net_device *dev,
   int offset = buf - page_address(page);
   unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
  
 - struct sk_buff *head_skb = page_to_skb(rq, page, offset, len, truesize);
 + struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
 +truesize);
   struct sk_buff *curr_skb = head_skb;
  
   if (unlikely(!curr_skb))
 @@ -433,9 +435,9 @@ err_buf:
   return NULL;
  }
  
 -static void receive_buf(struct receive_queue *rq, void *buf, unsigned int 
 len)
 +static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 + void *buf, unsigned int len)
  {
 - struct virtnet_info *vi = rq-vq-vdev-priv;
   struct net_device *dev = vi-dev;
   struct virtnet_stats *stats = this_cpu_ptr(vi-stats);
   struct sk_buff *skb;
 @@ -459,9 +461,9 @@ static void receive_buf(struct receive_queue *rq, void 
 *buf, unsigned int len)
   if (vi-mergeable_rx_bufs)
   skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len);
   else if (vi-big_packets)
 - skb = receive_big(dev, rq, buf, len);
 + skb = receive_big(dev, vi, rq, buf, len);
   else
 - skb = receive_small(buf, len);
 + skb = receive_small(vi, buf, len);
  
   if (unlikely(!skb))
   return;
 @@ -530,9 +532,9 @@ frame_err:
   dev_kfree_skb(skb);
  }
  
 -static int add_recvbuf_small(struct receive_queue *rq, gfp_t gfp)
 +static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue 
 *rq,
 +  gfp_t gfp)
  {
 - struct virtnet_info *vi = rq-vq-vdev-priv;
   struct sk_buff *skb;
   struct skb_vnet_hdr *hdr;
   int err;
 @@ -655,9 +657,9 @@ static int add_recvbuf_mergeable(struct receive_queue 
 *rq, gfp_t gfp)
   * before we're receiving packets, or from refill_work which is
   * careful to disable receiving (using napi_disable).
   */
 -static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp)
 +static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 +   gfp_t gfp)
  {
 - struct virtnet_info *vi = rq-vq-vdev-priv;
   int err;
   bool oom;
  
 @@ -668,7 +670,7 @@ static bool try_fill_recv(struct receive_queue *rq, gfp_t 
 gfp)
   else if (vi-big_packets)
   err = add_recvbuf_big(rq, gfp);
   else
 - err = add_recvbuf_small(rq, gfp);
 + err = add_recvbuf_small(vi, rq, gfp);
  
   oom = err == -ENOMEM;
   if (err)
 @@ -717,7 +719,7 @@ static void refill_work(struct work_struct *work)
   struct receive_queue *rq = vi-rq[i];
  
   napi_disable(rq-napi);
 - still_empty = !try_fill_recv(rq, GFP_KERNEL);
 + still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
   virtnet_napi_enable(rq);
  
   /* In theory, this can happen

Re: [PATCH] virtio_blk: fix race at module removal

2014-10-27 Thread Rusty Russell
Ming Lei tom.leim...@gmail.com writes:
 On Fri, Oct 24, 2014 at 12:12 AM, Michael S. Tsirkin m...@redhat.com wrote:
 If a device appears while module is being removed,
 driver will get a callback after we've given up
 on the major number.

 In theory this means this major number can get reused
 by something else, resulting in a conflict.

 Yes, there is a tiny race window.

Applied.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 2/4] virtio_net: get rid of virtio_net_hdr/skb_vnet_hdr

2014-10-27 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 virtio 1.0 doesn't use virtio_net_hdr anymore, and in fact, it's not
 really useful since virtio_net_hdr_mrg_rxbuf includes that as the first
 field anyway.

 Let's drop it, precalculate header len and store within vi instead.

 This way we can also remove struct skb_vnet_hdr.

Yes, this is definitely a win.

Acked-by: Rusty Russell ru...@rustcorp.com.au

Thanks,
Rusty.


 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/net/virtio_net.c | 88 
 ++--
  1 file changed, 40 insertions(+), 48 deletions(-)

 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 36f3dfc..a795a23 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -123,6 +123,9 @@ struct virtnet_info {
   /* Host can handle any s/g split between our header and packet data */
   bool any_header_sg;
  
 + /* Packet virtio header size */
 + u8 hdr_len;
 +
   /* Active statistics */
   struct virtnet_stats __percpu *stats;
  
 @@ -139,21 +142,14 @@ struct virtnet_info {
   struct notifier_block nb;
  };
  
 -struct skb_vnet_hdr {
 - union {
 - struct virtio_net_hdr hdr;
 - struct virtio_net_hdr_mrg_rxbuf mhdr;
 - };
 -};
 -
  struct padded_vnet_hdr {
 - struct virtio_net_hdr hdr;
 + struct virtio_net_hdr_mrg_rxbuf hdr;
   /*
 -  * virtio_net_hdr should be in a separated sg buffer because of a
 -  * QEMU bug, and data sg buffer shares same page with this header sg.
 -  * This padding makes next sg 16 byte aligned after virtio_net_hdr.
 +  * hdr is in a separate sg buffer, and data sg buffer shares same page
 +  * with this header sg. This padding makes next sg 16 byte aligned
 +  * after the header.
*/
 - char padding[6];
 + char padding[4];
  };
  
  /* Converting between virtqueue no. and kernel tx/rx queue no.
 @@ -179,9 +175,9 @@ static int rxq2vq(int rxq)
   return rxq * 2;
  }
  
 -static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
 +static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff 
 *skb)
  {
 - return (struct skb_vnet_hdr *)skb-cb;
 + return (struct virtio_net_hdr_mrg_rxbuf *)skb-cb;
  }
  
  /*
 @@ -247,7 +243,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
 *vi,
  unsigned int len, unsigned int truesize)
  {
   struct sk_buff *skb;
 - struct skb_vnet_hdr *hdr;
 + struct virtio_net_hdr_mrg_rxbuf *hdr;
   unsigned int copy, hdr_len, hdr_padded_len;
   char *p;
  
 @@ -260,13 +256,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
 *vi,
  
   hdr = skb_vnet_hdr(skb);
  
 - if (vi-mergeable_rx_bufs) {
 - hdr_len = sizeof hdr-mhdr;
 - hdr_padded_len = sizeof hdr-mhdr;
 - } else {
 - hdr_len = sizeof hdr-hdr;
 + hdr_len = vi-hdr_len;
 + if (vi-mergeable_rx_bufs)
 + hdr_padded_len = sizeof *hdr;
 + else
   hdr_padded_len = sizeof(struct padded_vnet_hdr);
 - }
  
   memcpy(hdr, p, hdr_len);
  
 @@ -317,11 +311,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
 *vi,
   return skb;
  }
  
 -static struct sk_buff *receive_small(void *buf, unsigned int len)
 +static struct sk_buff *receive_small(struct virtnet_info *vi, void *buf, 
 unsigned int len)
  {
   struct sk_buff * skb = buf;
  
 - len -= sizeof(struct virtio_net_hdr);
 + len -= vi-hdr_len;
   skb_trim(skb, len);
  
   return skb;
 @@ -354,8 +348,8 @@ static struct sk_buff *receive_mergeable(struct 
 net_device *dev,
unsigned int len)
  {
   void *buf = mergeable_ctx_to_buf_address(ctx);
 - struct skb_vnet_hdr *hdr = buf;
 - u16 num_buf = virtio16_to_cpu(rq-vq-vdev, hdr-mhdr.num_buffers);
 + struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
 + u16 num_buf = virtio16_to_cpu(vi-vdev, hdr-num_buffers);
   struct page *page = virt_to_head_page(buf);
   int offset = buf - page_address(page);
   unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
 @@ -373,8 +367,8 @@ static struct sk_buff *receive_mergeable(struct 
 net_device *dev,
   if (unlikely(!ctx)) {
   pr_debug(%s: rx error: %d buffers out of %d missing\n,
dev-name, num_buf,
 -  virtio16_to_cpu(rq-vq-vdev,
 -  hdr-mhdr.num_buffers));
 +  virtio16_to_cpu(vi-vdev,
 +  hdr-num_buffers));
   dev-stats.rx_length_errors++;
   goto err_buf;
   }
 @@ -441,7 +435,7 @@ static void receive_buf(struct virtnet_info *vi, struct 
 receive_queue *rq,
   struct net_device *dev = vi-dev;
   struct

Re: [PATCH RFC 4/4] virtio_net: bigger header when VERSION_1 is set

2014-10-27 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 With VERSION_1 virtio_net uses same header size
 whether mergeable buffers are enabled or not.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

These two are great too, thanks:

Acked-by: Rusty Russell ru...@rustcorp.com.au

Cheers,
Rusty.

 ---
  drivers/net/virtio_net.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 9c6d50f..a2fe340 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -1764,7 +1764,8 @@ static int virtnet_probe(struct virtio_device *vdev)
   if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
   vi-mergeable_rx_bufs = true;
  
 - if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
 + if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
 + virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
   vi-hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
   else
   vi-hdr_len = sizeof(struct virtio_net_hdr);
 -- 
 MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.

2014-10-19 Thread Rusty Russell
Amos Kong ak...@redhat.com writes:
 We got a warning in boot stage when above set_current_rng() is executed,
 it can be fixed by init rng-ref in hwrng_init().


 @@ -166,6 +169,8 @@ static inline int hwrng_init(struct hwrng *rng)
 if (current_quality  0  !hwrng_fill)
 start_khwrngd();
  
 +   kref_init(rng-ref);
 +
 return 0;
  }

OK, I folded this fix on.

Thanks,
Rusty.

hw_random: use reference counts on each struct hwrng.

current_rng holds one reference, and we bump it every time we want
to do a read from it.

This means we only hold the rng_mutex to grab or drop a reference,
so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
block on read of /dev/hwrng.

Using a kref is overkill (we're always under the rng_mutex), but
a standard pattern.

This also solves the problem that the hwrng_fillfn thread was
accessing current_rng without a lock, which could change (eg. to NULL)
underneath it.

v3: initialize kref (thanks Amos Kong)
v2: fix missing put_rng() on exit path (thanks Amos Kong)
Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index a0905c818e13..0ecac38da954 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -42,6 +42,7 @@
 #include linux/delay.h
 #include linux/slab.h
 #include linux/random.h
+#include linux/err.h
 #include asm/uaccess.h
 
 
@@ -91,6 +92,59 @@ static void add_early_randomness(struct hwrng *rng)
add_device_randomness(bytes, bytes_read);
 }
 
+static inline void cleanup_rng(struct kref *kref)
+{
+   struct hwrng *rng = container_of(kref, struct hwrng, ref);
+
+   if (rng-cleanup)
+   rng-cleanup(rng);
+}
+
+static void set_current_rng(struct hwrng *rng)
+{
+   BUG_ON(!mutex_is_locked(rng_mutex));
+   kref_get(rng-ref);
+   current_rng = rng;
+}
+
+static void drop_current_rng(void)
+{
+   BUG_ON(!mutex_is_locked(rng_mutex));
+   if (!current_rng)
+   return;
+
+   kref_put(current_rng-ref, cleanup_rng);
+   current_rng = NULL;
+}
+
+/* Returns ERR_PTR(), NULL or refcounted hwrng */
+static struct hwrng *get_current_rng(void)
+{
+   struct hwrng *rng;
+
+   if (mutex_lock_interruptible(rng_mutex))
+   return ERR_PTR(-ERESTARTSYS);
+
+   rng = current_rng;
+   if (rng)
+   kref_get(rng-ref);
+
+   mutex_unlock(rng_mutex);
+   return rng;
+}
+
+static void put_rng(struct hwrng *rng)
+{
+   /*
+* Hold rng_mutex here so we serialize in case they set_current_rng
+* on rng again immediately.
+*/
+   mutex_lock(rng_mutex);
+   if (rng)
+   kref_put(rng-ref, cleanup_rng);
+   mutex_unlock(rng_mutex);
+}
+
 static inline int hwrng_init(struct hwrng *rng)
 {
if (rng-init) {
@@ -110,13 +164,9 @@ static inline int hwrng_init(struct hwrng *rng)
if (current_quality  0  !hwrng_fill)
start_khwrngd();
 
-   return 0;
-}
+   kref_init(rng-ref);
 
-static inline void hwrng_cleanup(struct hwrng *rng)
-{
-   if (rng  rng-cleanup)
-   rng-cleanup(rng);
+   return 0;
 }
 
 static int rng_dev_open(struct inode *inode, struct file *filp)
@@ -154,21 +204,22 @@ static ssize_t rng_dev_read(struct file *filp, char 
__user *buf,
ssize_t ret = 0;
int err = 0;
int bytes_read, len;
+   struct hwrng *rng;
 
while (size) {
-   if (mutex_lock_interruptible(rng_mutex)) {
-   err = -ERESTARTSYS;
+   rng = get_current_rng();
+   if (IS_ERR(rng)) {
+   err = PTR_ERR(rng);
goto out;
}
-
-   if (!current_rng) {
+   if (!rng) {
err = -ENODEV;
-   goto out_unlock;
+   goto out;
}
 
mutex_lock(reading_mutex);
if (!data_avail) {
-   bytes_read = rng_get_data(current_rng, rng_buffer,
+   bytes_read = rng_get_data(rng, rng_buffer,
rng_buffer_size(),
!(filp-f_flags  O_NONBLOCK));
if (bytes_read  0) {
@@ -200,8 +251,8 @@ static ssize_t rng_dev_read(struct file *filp, char __user 
*buf,
ret += len;
}
 
-   mutex_unlock(rng_mutex);
mutex_unlock(reading_mutex);
+   put_rng(rng);
 
if (need_resched())
schedule_timeout_interruptible(1);
@@ -213,12 +264,11 @@ static ssize_t rng_dev_read(struct file *filp, char 
__user *buf,
}
 out:
return ret ? : err;
-out_unlock:
-   mutex_unlock(rng_mutex);
-   goto out;
+
 out_unlock_reading:
mutex_unlock(reading_mutex);
-   goto out_unlock

Re: [PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.

2014-10-19 Thread Rusty Russell
Amos Kong ak...@redhat.com writes:
 From: Rusty Russell ru...@rustcorp.com.au

 current_rng holds one reference, and we bump it every time we want
 to do a read from it.

 This means we only hold the rng_mutex to grab or drop a reference,
 so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
 block on read of /dev/hwrng.

 Using a kref is overkill (we're always under the rng_mutex), but
 a standard pattern.

 This also solves the problem that the hwrng_fillfn thread was
 accessing current_rng without a lock, which could change (eg. to NULL)
 underneath it.

 V2: reduce reference count in signal_pending path

OK, I changed it to do the put_rng before the check, so instead of:

 @@ -208,17 +256,19 @@ static ssize_t rng_dev_read(struct file *filp, char 
 __user *buf,
  
   if (signal_pending(current)) {
   err = -ERESTARTSYS;
 + put_rng(rng);
   goto out;
   }
 +
 + put_rng(rng);
   }
  out:
   return ret ? : err;
 -out_unlock:
 - mutex_unlock(rng_mutex);
 - goto out;
 +
  out_unlock_reading:
   mutex_unlock(reading_mutex);
 - goto out_unlock;
 + put_rng(rng);
 + goto out;
  }

We have:


mutex_unlock(reading_mutex);
put_rng(rng);

if (need_resched())
schedule_timeout_interruptible(1);

if (signal_pending(current)) {
err = -ERESTARTSYS;
goto out;
}
}
out:
return ret ? : err;

out_unlock_reading:
mutex_unlock(reading_mutex);
put_rng(rng);
goto out;
}


Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio_balloon: free some memory from balloon on OOM

2014-10-19 Thread Rusty Russell
Denis V. Lunev d...@openvz.org writes:

 From: Raushaniya Maksudova rmaksud...@parallels.com

 Excessive virtio_balloon inflation can cause invocation of OOM-killer,
 when Linux is under severe memory pressure. Various mechanisms are
 responsible for correct virtio_balloon memory management. Nevertheless
 it is often the case that these control tools does not have enough time
 to react on fast changing memory load. As a result OS runs out of memory
 and invokes OOM-killer. The balancing of memory by use of the virtio
 balloon should not cause the termination of processes while there are
 pages in the balloon. Now there is no way for virtio balloon driver to
 free some memory at the last moment before some process will be get
 killed by OOM-killer.

 This does not provide a security breach as balloon itself is running
 inside guest OS and is working in the cooperation with the host. Thus
 some improvements from guest side should be considered as normal.

 To solve the problem, introduce a virtio_balloon callback which is
 expected to be called from the oom notifier call chain in out_of_memory()
 function. If virtio balloon could release some memory, it will make
 the system to return and retry the allocation that forced the out of
 memory killer to run.

 Allocate virtio  feature bit for this: it is not set by default,
 the the guest will not deflate virtio balloon on OOM without explicit
 permission from host.

 Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com
 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Rusty Russell ru...@rustcorp.com.au
 CC: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/virtio/virtio_balloon.c | 52 
 +
  include/uapi/linux/virtio_balloon.h |  1 +
  2 files changed, 53 insertions(+)

 diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
 index 66cac10..88d73a0 100644
 --- a/drivers/virtio/virtio_balloon.c
 +++ b/drivers/virtio/virtio_balloon.c
 @@ -28,6 +28,7 @@
  #include linux/slab.h
  #include linux/module.h
  #include linux/balloon_compaction.h
 +#include linux/oom.h
  
  /*
   * Balloon device works in 4K page units.  So each page is pointed to by
 @@ -36,6 +37,12 @@
   */
  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE  
 VIRTIO_BALLOON_PFN_SHIFT)
  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
 +#define OOM_VBALLOON_DEFAULT_PAGES 256
 +#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 +
 +static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 +module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 +MODULE_PARM_DESC(oom_pages, pages to free on OOM);
  
  struct virtio_balloon
  {
 @@ -71,6 +78,9 @@ struct virtio_balloon
   /* Memory statistics */
   int need_stats_update;
   struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
 +
 + /* To register callback in oom notifier call chain */
 + struct notifier_block nb;
  };
  
  static struct virtio_device_id id_table[] = {
 @@ -290,6 +300,38 @@ static void update_balloon_size(struct virtio_balloon 
 *vb)
 actual);
  }
  
 +/*
 + * virtballoon_oom_notify - release pages when system is under severe
 + *   memory pressure (called from out_of_memory())
 + * @self : notifier block struct
 + * @dummy: not used
 + * @parm : returned - number of freed pages
 + *
 + * The balancing of memory by use of the virtio balloon should not cause
 + * the termination of processes while there are pages in the balloon.
 + * If virtio balloon manages to release some memory, it will make the
 + * system return and retry the allocation that forced the OOM killer
 + * to run.
 + */
 +static int virtballoon_oom_notify(struct notifier_block *self,
 +   unsigned long dummy, void *parm)
 +{
 + struct virtio_balloon *vb;
 + unsigned long *freed;
 + unsigned num_freed_pages;
 +
 + vb = container_of(self, struct virtio_balloon, nb);
 + if (!virtio_has_feature(vb-vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 + return NOTIFY_OK;
 +
 + freed = parm;
 + num_freed_pages = leak_balloon(vb, oom_pages);
 + update_balloon_size(vb);
 + *freed += num_freed_pages;
 +
 + return NOTIFY_OK;
 +}
 +
  static int balloon(void *_vballoon)
  {
   struct virtio_balloon *vb = _vballoon;

OK, I applied these.  I thought it a bit weird that you're checking the
feature in the callback, rather than only registering the callback
if the feature exists.  But it's clear enough.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PULL] More virtio fun

2014-10-16 Thread Rusty Russell
The following changes since commit 7ec62d421bdf29cb31101ae2689f7f3a9906289a:

  Merge branch 'for_linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs (2014-09-10 
14:04:17 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git 
tags/virtio-next-for-linus

for you to fetch changes up to 1bbc26062754b012656d34103215f7552e02b999:

  virtio-rng: refactor probe error handling (2014-10-15 10:25:14 +1030)


One cc: stable commit, the rest are a series of minor cleanups which have
been sitting in MST's tree during my vacation.  I changed a function name
and made one trivial change, then they spent two days in linux-next.

Thanks,
Rusty.


Michael S. Tsirkin (24):
  virtio_pci: fix virtio spec compliance on restore
  virtio: unify config_changed handling
  virtio-pci: move freeze/restore to virtio core
  virtio: defer config changed notifications
  virtio_blk: drop config_enable
  virtio-blk: drop config_mutex
  virtio_net: drop config_enable
  virtio-net: drop config_mutex
  virtio_net: minor cleanup
  virtio: add API to enable VQs early
  virtio_net: enable VQs early
  virtio_blk: enable VQs early
  virtio_console: enable VQs early
  9p/trans_virtio: enable VQs early
  virtio_net: fix use after free on allocation failure
  virtio_scsi: move kick event out from virtscsi_init
  virtio_blk: enable VQs early on restore
  virtio_scsi: enable VQs early on restore
  virtio_console: enable VQs early on restore
  virtio_net: enable VQs early on restore
  virtio_scsi: fix race on device removal
  virtio_balloon: enable VQs early on restore
  virtio_scsi: drop scan callback
  virtio-rng: refactor probe error handling

Paolo Bonzini (1):
  virito_scsi: use freezable WQ for events

 drivers/block/virtio_blk.c  |  40 --
 drivers/char/hw_random/virtio-rng.c |  15 +++---
 drivers/char/virtio_console.c   |   4 ++
 drivers/misc/mic/card/mic_virtio.c  |   6 +--
 drivers/net/virtio_net.c|  44 +--
 drivers/s390/kvm/kvm_virtio.c   |   9 +---
 drivers/s390/kvm/virtio_ccw.c   |   6 +--
 drivers/scsi/virtio_scsi.c  |  42 +--
 drivers/virtio/virtio.c | 103 
 drivers/virtio/virtio_balloon.c |   2 +
 drivers/virtio/virtio_mmio.c|   7 +--
 drivers/virtio/virtio_pci.c |  33 ++--
 include/linux/virtio.h  |  14 +
 include/linux/virtio_config.h   |  17 ++
 net/9p/trans_virtio.c   |   2 +
 15 files changed, 207 insertions(+), 137 deletions(-)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors

2014-10-15 Thread Rusty Russell
Jason Wang jasow...@redhat.com writes:
 Below should be useful for some experiments Jason is doing.
 I thought I'd send it out for early review/feedback.

 event idx feature allows us to defer interrupts until
 a specific # of descriptors were used.
 Sometimes it might be useful to get an interrupt after
 a specific descriptor, regardless.
 This adds a descriptor flag for this, and an API
 to create an urgent output descriptor.
 This is still an RFC:
 we'll need a feature bit for drivers to detect this,
 but we've run out of feature bits for virtio 0.X.
 For experimentation purposes, drivers can assume
 this is set, or add a driver-specific feature bit.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com

The new VRING_DESC_F_URGENT bit is theoretically nicer, but for
networking (which tends to take packets in order) couldn't we just set
the event counter to give us a tx interrupt at the packet we want?

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-15 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Tue, Oct 14, 2014 at 10:14:05AM +1030, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote:
  Denis V. Lunev d...@parallels.com writes:
   From: Raushaniya Maksudova rmaksud...@parallels.com
  
   Excessive virtio_balloon inflation can cause invocation of OOM-killer,
   when Linux is under severe memory pressure. Various mechanisms are
   responsible for correct virtio_balloon memory management. Nevertheless
   it is often the case that these control tools does not have enough time
   to react on fast changing memory load. As a result OS runs out of memory
   and invokes OOM-killer. The balancing of memory by use of the virtio
   balloon should not cause the termination of processes while there are
   pages in the balloon. Now there is no way for virtio balloon driver to
   free some memory at the last moment before some process will be get
   killed by OOM-killer.
  
  This makes some amount of sense.
 
  This reminds me of the balloon fs that Google once proposed.
  This really needs to be controlled from host though.
  At the moment host does not expect guest to deflate before
  requests.
  So as a minimum, add a feature bit for this.  what if you want a mix of
  mandatory and optional balooning? I guess we can use multiple balloons,
  is that the idea?
 
 Trying to claw back some pages on OOM is almost certainly correct,
 even if the host doesn't expect it.  It's roughly equivalent to not
 giving up pages in the first place.

 Well the difference is that there are management tools that
 poll balloon in host until they see balloon size reaches
 the expected value.

 They don't expect balloon to shrink below num_pages and will respond in 
 various
 unexpected ways like e.g. killing the VM if it does.
 Killing a userspace process within the guest might be better
 for VM health.

 Besides the fact that we always did it like this, these tools seem to have
 basis in the spec.
 Specifically, this is based on this text from the spec:
   the device asks for a certain amount of memory, and the driver
   supplies it (or withdraws it, if the device has more than it asks for).
   This allows the guest to adapt to changes in allowance of underlying
   physical memory.

 and

   The device is driven by the receipt of a configuration change interrupt.



 Cheers,
 Rusty.
 PS.  Yes, a real guest-driven balloon is preferable, but that's a much
  larger task.


 Any objection to making the feature depend on a feature flag?

If you believe a guest which does this will cause drastic failure on the
host side (ie. killing the VM), then yes, we can do this.

However, I'm not aware of anything that sophisticated...

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_balloon: Convert vballon kthread into a workqueue

2014-10-15 Thread Rusty Russell
Petr Mladek pmla...@suse.cz writes:
 Workqueues have clean and rich API for all basic operations. The code is 
 usually
 easier and better readable. It can be easily tuned for the given purpose.

OK, sure.

 -static void fill_balloon(struct virtio_balloon *vb, size_t num)
 +static void fill_balloon(struct virtio_balloon *vb, size_t diff)
  {
   struct balloon_dev_info *vb_dev_info = vb-vb_dev_info;
 + size_t num;
 + bool done;
  
   /* We can only do one array worth at a time. */
 - num = min(num, ARRAY_SIZE(vb-pfns));
 + num = min(diff, ARRAY_SIZE(vb-pfns));
 + done = (num == diff) ? true : false;
  
   mutex_lock(vb-balloon_lock);
   for (vb-num_pfns = 0; vb-num_pfns  num;
 @@ -143,6 +143,7 @@ static void fill_balloon(struct virtio_balloon *vb, 
 size_t num)
VIRTIO_BALLOON_PAGES_PER_PAGE);
   /* Sleep for at least 1/5 of a second before retry. */
   msleep(200);
 + done = false;
   break;
   }
   set_page_pfns(vb-pfns + vb-num_pfns, page);
 @@ -154,6 +155,9 @@ static void fill_balloon(struct virtio_balloon *vb, 
 size_t num)
   if (vb-num_pfns != 0)
   tell_host(vb, vb-inflate_vq);
   mutex_unlock(vb-balloon_lock);
 +
 + if (!done)
 + queue_work(vb-wq, vb-wq_work);
  }

Hmm, this is a bit convoluted.  I would spell it out by keeping a
num_done counter:

if (num_done  diff)
queue_work(vb-wq, vb-wq_work);

  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 @@ -168,20 +172,25 @@ static void release_pages_by_pfn(const u32 pfns[], 
 unsigned int num)
   }
  }
  
 -static void leak_balloon(struct virtio_balloon *vb, size_t num)
 +static void leak_balloon(struct virtio_balloon *vb, size_t diff)
  {
   struct page *page;
   struct balloon_dev_info *vb_dev_info = vb-vb_dev_info;
 + size_t num;
 + bool done;
  
   /* We can only do one array worth at a time. */
 - num = min(num, ARRAY_SIZE(vb-pfns));
 + num = min(diff, ARRAY_SIZE(vb-pfns));
 + done = (num == diff) ? true : false;
  
   mutex_lock(vb-balloon_lock);
   for (vb-num_pfns = 0; vb-num_pfns  num;
vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
   page = balloon_page_dequeue(vb_dev_info);
 - if (!page)
 + if (!page) {
 + done = false;
   break;
 + }
   set_page_pfns(vb-pfns + vb-num_pfns, page);
   vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
   }
 @@ -195,6 +204,9 @@ static void leak_balloon(struct virtio_balloon *vb, 
 size_t num)
   tell_host(vb, vb-deflate_vq);
   mutex_unlock(vb-balloon_lock);
   release_pages_by_pfn(vb-pfns, vb-num_pfns);
 +
 + if (!done)
 + queue_work(vb-wq, vb-wq_work);
  }

Here too.

 @@ -471,11 +469,13 @@ static int virtballoon_probe(struct virtio_device *vdev)
   if (err)
   goto out_free_vb_mapping;
  
 - vb-thread = kthread_run(balloon, vb, vballoon);
 - if (IS_ERR(vb-thread)) {
 - err = PTR_ERR(vb-thread);
 + vb-wq = alloc_workqueue(vballoon_wq,
 +  WQ_FREEZABLE | WQ_MEM_RECLAIM, 0);
 + if (!vb-wq) {
 + err = -ENOMEM;
   goto out_del_vqs;
   }
 + INIT_WORK(vb-wq_work, balloon);

That looks racy to me; shouldn't we init vq-work before allocating wq?

Otherwise, looks good!

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 04/25] virtio: defer config changed notifications

2014-10-14 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 Defer config changed notifications that arrive during
 probe/scan/freeze/restore.

 This will allow drivers to set DRIVER_OK earlier, without worrying about
 racing with config change interrupts.

 This change will also benefit old hypervisors (before 2009)
 that send interrupts without checking DRIVER_OK: previously,
 the callback could race with driver-specific initialization.

 This will also help simplify drivers.

But AFAICT you never *read* dev-config_changed.

You unconditionally trigger a config_changed event in
virtio_config_enable().  That's a bit weird, but probably OK.

How about the following change (on top of your patch).  I
think the renaming is clearer, and note the added if() test in
virtio_config_enable().

If you approve, I'll fold it in.

Cheers,
Rusty.

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 2536701b098b..df598dd8c5c8 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -122,7 +122,7 @@ static void __virtio_config_changed(struct virtio_device 
*dev)
struct virtio_driver *drv = drv_to_virtio(dev-dev.driver);
 
if (!dev-config_enabled)
-   dev-config_changed = true;
+   dev-config_change_pending = true;
else if (drv  drv-config_changed)
drv-config_changed(dev);
 }
@@ -148,8 +148,9 @@ static void virtio_config_enable(struct virtio_device *dev)
 {
spin_lock_irq(dev-config_lock);
dev-config_enabled = true;
-   __virtio_config_changed(dev);
-   dev-config_changed = false;
+   if (dev-config_change_pending)
+   __virtio_config_changed(dev);
+   dev-config_change_pending = false;
spin_unlock_irq(dev-config_lock);
 }
 
@@ -253,7 +254,7 @@ int register_virtio_device(struct virtio_device *dev)
 
spin_lock_init(dev-config_lock);
dev-config_enabled = false;
-   dev-config_changed = false;
+   dev-config_change_pending = false;
 
/* We always start by resetting the device, in case a previous
 * driver messed it up.  This also tests that code path a little. */
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 5636b119dc25..65261a7244fc 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -80,7 +80,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
  * @index: unique position on the virtio bus
  * @failed: saved value for CONFIG_S_FAILED bit (for restore)
  * @config_enabled: configuration change reporting enabled
- * @config_changed: configuration change reported while disabled
+ * @config_change_pending: configuration change reported while disabled
  * @config_lock: protects configuration change reporting
  * @dev: underlying device.
  * @id: the device type identification (used to match it with a driver).
@@ -94,7 +94,7 @@ struct virtio_device {
int index;
bool failed;
bool config_enabled;
-   bool config_changed;
+   bool config_change_pending;
spinlock_t config_lock;
struct device dev;
struct virtio_device_id id;
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/3] virtio_net: pass well-formed sgs to virtqueue_add_*()

2014-10-14 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Fri, Sep 05, 2014 at 12:40:50PM +0200, Paolo Bonzini wrote:
 Il 03/09/2014 06:29, Rusty Russell ha scritto:
  +  sg_init_table(rq-sg, MAX_SKB_FRAGS + 2);
 
 I think 2 is enough here.  That said...
 
 sg_set_buf(rq-sg, hdr-hdr, sizeof hdr-hdr);
  -
 skb_to_sgvec(skb, rq-sg + 1, 0, skb-len);
   
 err = virtqueue_add_inbuf(rq-vq, rq-sg, 2, skb, gfp);
 
 ... skb_to_sgvec will already make the sg well formed, so the
 sg_init_table is _almost_ redundant; it is only there to remove
 intermediate end marks.  The block layer takes care to remove
 them, but skb_to_sgvec doesn't.

sg_init_table is still needed if CONFIG_DEBUG_SG, so I don't
think it's worth it.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-14 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:

 On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote:
 Denis V. Lunev d...@parallels.com writes:
  From: Raushaniya Maksudova rmaksud...@parallels.com
 
  Excessive virtio_balloon inflation can cause invocation of OOM-killer,
  when Linux is under severe memory pressure. Various mechanisms are
  responsible for correct virtio_balloon memory management. Nevertheless
  it is often the case that these control tools does not have enough time
  to react on fast changing memory load. As a result OS runs out of memory
  and invokes OOM-killer. The balancing of memory by use of the virtio
  balloon should not cause the termination of processes while there are
  pages in the balloon. Now there is no way for virtio balloon driver to
  free some memory at the last moment before some process will be get
  killed by OOM-killer.
 
 This makes some amount of sense.

 This reminds me of the balloon fs that Google once proposed.
 This really needs to be controlled from host though.
 At the moment host does not expect guest to deflate before
 requests.
 So as a minimum, add a feature bit for this.  what if you want a mix of
 mandatory and optional balooning? I guess we can use multiple balloons,
 is that the idea?

Trying to claw back some pages on OOM is almost certainly correct,
even if the host doesn't expect it.  It's roughly equivalent to not
giving up pages in the first place.

Cheers,
Rusty.
PS.  Yes, a real guest-driven balloon is preferable, but that's a much
 larger task.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 08/11] virtio_blk: use virtio v1.0 endian

2014-10-13 Thread Rusty Russell
Cornelia Huck cornelia.h...@de.ibm.com writes:
 Note that we care only about the fields still in use for virtio v1.0.

 Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
 Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com

Hi Cornelia,

These patches all look good; I'm a bit nervous about our testing
missing some conversion, so we'll need qemu patches for PCI so we can
test on other platforms too.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 03/11] virtio: support more feature bits

2014-10-13 Thread Rusty Russell
Cornelia Huck cornelia.h...@de.ibm.com writes:
 With virtio-1, we support more than 32 feature bits. Let's make
 vdev-guest_features depend on the number of supported feature bits,
 allowing us to grow the feature bits automatically.

It's a judgement call, but I would say that simply using uint64_t
will be sufficient for quite a while.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM

2014-10-13 Thread Rusty Russell
Denis V. Lunev d...@parallels.com writes:
 From: Raushaniya Maksudova rmaksud...@parallels.com

 Excessive virtio_balloon inflation can cause invocation of OOM-killer,
 when Linux is under severe memory pressure. Various mechanisms are
 responsible for correct virtio_balloon memory management. Nevertheless
 it is often the case that these control tools does not have enough time
 to react on fast changing memory load. As a result OS runs out of memory
 and invokes OOM-killer. The balancing of memory by use of the virtio
 balloon should not cause the termination of processes while there are
 pages in the balloon. Now there is no way for virtio balloon driver to
 free some memory at the last moment before some process will be get
 killed by OOM-killer.

This makes some amount of sense.

But I suggest a few minor changes:

 +static int oom_vballoon_pages = OOM_VBALLOON_DEFAULT_PAGES;
 +module_param(oom_vballoon_pages, int, S_IRUSR | S_IWUSR);
 +MODULE_PARM_DESC(oom_vballoon_pages, pages to free on OOM);

Since this is already prefixed with virtio_balloon. I suggest just
calling it oom_pages.

 +static int virtballoon_oom_notify(struct notifier_block *self,
 +   unsigned long dummy, void *parm)
 +{
 + unsigned int num_freed_pages;
 + unsigned long *freed = (unsigned long *)parm;
 + struct virtio_balloon *vb = container_of((struct notifier_block *)self,
 +  struct virtio_balloon, nb);

Why cast self here?

 + num_freed_pages = leak_balloon(vb, oom_vballoon_pages);
 + update_balloon_size(vb);
 + *freed += num_freed_pages;
 +
 + return NOTIFY_OK;
 +}

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: BUG_ON in virtio-ring.c

2014-10-13 Thread Rusty Russell
Alexey Lapitsky lex.pub...@gmail.com writes:
 Hi,

 I'm hitting this bug with both ext4 and btrfs.

 Here's an example of the backtrace:
 https://gist.github.com/vzctl/e888a821333979120932

 I tried raising this BUG only for direct ring and it solved the problem:

  -   BUG_ON(total_sg  vq-vring.num);
 +   BUG_ON(total_sg  vq-vring.num  !vq-indirect);

 Shall I submit the patch or is a more elaborate fix required?

This is wrong.  It looks like the block layer is spending down
more sg elements than we have ring entries, yet we tell it not to:

blk_queue_max_segments(q, vblk-sg_elems-2);

If you apply this patch, what happens?  Here it prints:

[0.616564] virtqueue elements = 128, max_segments = 126 (1 queues)
[0.621244]  vda: vda1 vda2  vda5 
[0.632290] virtqueue elements = 128, max_segments = 126 (1 queues)
[0.683526]  vdb: vdb1 vdb2  vdb5 

Cheers,
Rusty.

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0a581400de0f..aa9d4d313587 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -683,6 +683,13 @@ static int virtblk_probe(struct virtio_device *vdev)
/* We can handle whatever the host told us to handle. */
blk_queue_max_segments(q, vblk-sg_elems-2);
 
+   printk(virtqueue elements = %u, max_segments = %u (%u queues), 
+  virtqueue_get_vring_size(vblk-vqs[0].vq),
+  vblk-sg_elems-2,
+  vblk-num_vqs);
+
+   BUG_ON(vblk-sg_elems-2  virtqueue_get_vring_size(vblk-vqs[0].vq));
+
/* No need to bounce any requests */
blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);
 


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 10/25] virtio: add API to enable VQs early

2014-10-13 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 virtio spec 0.9.X requires DRIVER_OK to be set before
 VQs are used, but some drivers use VQs before probe
 function returns.
 Since DRIVER_OK is set after probe, this violates the spec.

 Even though under virtio 1.0 transitional devices support this
 behaviour, we want to make it possible for those early callers to become
 spec compliant and eventually support non-transitional devices.

 Add API for drivers to call before using VQs.

 Sets DRIVER_OK internally.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com

OK, this all looks good, but I think we should do two things:

1) rename virtio_enable_vqs_early() to virtio_device_ready().
2) Add a BUG_ON in the virtio_ring code to make sure device is ready
   before any ops are called.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio: pci: Use SIMPLE_DEV_PM_OPS macro

2014-09-23 Thread Rusty Russell
Jingoo Han jg1@samsung.com writes:
 On Tuesday, September 09, 2014 9:14 AM, Rusty Russell wrote:
 Jingoo Han jg1@samsung.com writes:
  Use SIMPLE_DEV_PM_OPS macro in order to make the code simpler.
 
  Signed-off-by: Jingoo Han jg1@samsung.com
 
 This patch is obviously wrong.  It won't compile without
 CONFIG_PM_SLEEP.

 No, there is no compile issue.
 When, CONFIG_PM_SLEEP=n, there is no build error.

My mistake.  Thanks, I've applied it.  It probably won't go in until the
next merge window, however, since I'm travelling for this one.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes

2014-09-17 Thread Rusty Russell
Amos Kong ak...@redhat.com writes:

 I started a QEMU (non-smp) guest with one virtio-rng device, and read
 random data from /dev/hwrng by dd:

  # dd if=/dev/hwrng of=/dev/null 

 In the same time, if I check hwrng attributes from sysfs by cat:

  # cat /sys/class/misc/hw_random/rng_*

 The cat process always gets stuck with slow backend (5 k/s), if we
 use a quick backend (1.2 M/s), the cat process will cost 1 to 2
 minutes. The stuck doesn't exist for smp guest.

 Reading syscall enters kernel and call rng_dev_read(), it's user
 context. We used need_resched() to check if other tasks need to
 be run, but it almost always return false, and re-hold the mutex
 lock. The attributes accessing process always fails to hold the
 lock, so the cat gets stuck.

 User context doesn't allow other user contexts run on that CPU,
 unless the kernel code sleeps for some reason. This is why the
 need_reshed() always return false here.

 This patch removed need_resched() and always schedule other tasks
 then other tasks can have chance to hold the lock and execute
 protected code.

OK, this is going to be a rant.

Your explanation doesn't make sense at all.  Worse, your solution breaks
the advice of Kernighan  Plaugher: Don't patch bad code - rewrite
it..

But worst of all, this detailed explanation might have convinced me you
understood the problem better than I did, and applied your patch.

I did some tests.  For me, as expected, the process spends its time
inside the virtio rng read function, holding the mutex and thus blocking
sysfs access; it's not a failure of this code at all.

Your schedule_timeout() fix probably just helps by letting the host
refresh entropy, so we spend less time waiting in the read fn.

I will post a series, which unfortunately is only lightly tested, then
I'm going to have some beer to begin my holiday.  That may help me
forget my disappointment at seeing respected fellow developers
monkey-patching random code they don't understand.

Grrr
Rusty.

 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  drivers/char/hw_random/core.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

 diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
 index c591d7e..263a370 100644
 --- a/drivers/char/hw_random/core.c
 +++ b/drivers/char/hw_random/core.c
 @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char 
 __user *buf,
  
   mutex_unlock(rng_mutex);
  
 - if (need_resched())
 - schedule_timeout_interruptible(1);
 + schedule_timeout_interruptible(1);
  
   if (signal_pending(current)) {
   err = -ERESTARTSYS;
 -- 
 1.9.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 5/5] hw_random: don't init list element we're about to add to list.

2014-09-17 Thread Rusty Russell
Another interesting anti-pattern.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 drivers/char/hw_random/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 6a34feca6b43..96fa06716e95 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -485,7 +485,6 @@ int hwrng_register(struct hwrng *rng)
goto out_unlock;
}
}
-   INIT_LIST_HEAD(rng-list);
list_add_tail(rng-list, rng_list);
 
if (old_rng  !rng-init) {
-- 
1.9.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 3/5] hw_random: fix unregister race.

2014-09-17 Thread Rusty Russell
The previous patch added one potential problem: we can still be
reading from a hwrng when it's unregistered.  Add a wait for zero
in the hwrng_unregister path.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 drivers/char/hw_random/core.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index dc9092a1075d..b4a21e9521cf 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex);
 static DEFINE_MUTEX(reading_mutex);
 static int data_avail;
 static u8 *rng_buffer, *rng_fillbuf;
+static DECLARE_WAIT_QUEUE_HEAD(rng_done);
 static unsigned short current_quality;
 static unsigned short default_quality; /* = 0; default to off */
 
@@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref)
 
if (rng-cleanup)
rng-cleanup(rng);
+   wake_up_all(rng_done);
 }
 
 static void set_current_rng(struct hwrng *rng)
@@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng)
}
 
mutex_unlock(rng_mutex);
+
+   /* Just in case rng is reading right now, wait. */
+   wait_event(rng_done, atomic_read(rng-ref.refcount) == 0);
 }
 EXPORT_SYMBOL_GPL(hwrng_unregister);
 
-- 
1.9.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


  1   2   3   4   5   6   7   8   9   10   >