Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors

2014-10-14 Thread Rusty Russell
Jason Wang  writes:
> Below should be useful for some experiments Jason is doing.
> I thought I'd send it out for early review/feedback.
>
> event idx feature allows us to defer interrupts until
> a specific # of descriptors were used.
> Sometimes it might be useful to get an interrupt after
> a specific descriptor, regardless.
> This adds a descriptor flag for this, and an API
> to create an urgent output descriptor.
> This is still an RFC:
> we'll need a feature bit for drivers to detect this,
> but we've run out of feature bits for virtio 0.X.
> For experimentation purposes, drivers can assume
> this is set, or add a driver-specific feature bit.
>
> Signed-off-by: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 

The new VRING_DESC_F_URGENT bit is theoretically nicer, but for
networking (which tends to take packets in order) couldn't we just set
the event counter to give us a tx interrupt at the packet we want?

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt

2014-10-14 Thread Jason Wang
On 10/15/2014 05:51 AM, Michael S. Tsirkin wrote:
> On Sat, Oct 11, 2014 at 03:16:46PM +0800, Jason Wang wrote:
>> > We free transmitted packets in ndo_start_xmit() in the past to get better
>> > performance in the past. One side effect is that skb_orphan() needs to be
>> > called in ndo_start_xmit() which makes sk_wmem_alloc not accurate in
>> > fact. For TCP protocol, this means several optimization could not work well
>> > such as TCP small queue and auto corking. This can lead extra low
>> > throughput of small packets stream.
>> > 
>> > Thanks to the urgent descriptor support. This patch tries to solve this
>> > issue by enable the tx interrupt selectively for stream packets. This means
>> > we don't need to orphan TCP stream packets in ndo_start_xmit() but enable
>> > tx interrupt for those packets. After we get tx interrupt, a tx napi was
>> > scheduled to free those packets.
>> > 
>> > With this method, sk_wmem_alloc of TCP socket were more accurate than in
>> > the past which let TCP can batch more through TSQ and auto corking.
>> > 
>> > Signed-off-by: Jason Wang 
>> > ---
>> >  drivers/net/virtio_net.c | 164 
>> > ---
>> >  1 file changed, 128 insertions(+), 36 deletions(-)
>> > 
>> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> > index 5810841..b450fc4 100644
>> > --- a/drivers/net/virtio_net.c
>> > +++ b/drivers/net/virtio_net.c
>> > @@ -72,6 +72,8 @@ struct send_queue {
>> >  
>> >/* Name of the send queue: output.$index */
>> >char name[40];
>> > +
>> > +  struct napi_struct napi;
>> >  };
>> >  
>> >  /* Internal representation of a receive virtqueue */
>> > @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue 
>> > *rq, gfp_t gfp_mask)
>> >return p;
>> >  }
>> >  
>> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>> > +{
>> > +  struct sk_buff *skb;
>> > +  unsigned int len;
>> > +  struct virtnet_info *vi = sq->vq->vdev->priv;
>> > +  struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> > +  int sent = 0;
>> > +
>> > +  while (sent < budget &&
>> > + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> > +  pr_debug("Sent skb %p\n", skb);
>> > +
>> > +  u64_stats_update_begin(&stats->tx_syncp);
>> > +  stats->tx_bytes += skb->len;
>> > +  stats->tx_packets++;
>> > +  u64_stats_update_end(&stats->tx_syncp);
>> > +
>> > +  dev_kfree_skb_any(skb);
>> > +  sent++;
>> > +  }
>> > +
>> > +  return sent;
>> > +}
>> > +
>> >  static void skb_xmit_done(struct virtqueue *vq)
>> >  {
>> >struct virtnet_info *vi = vq->vdev->priv;
>> > +  struct send_queue *sq = &vi->sq[vq2txq(vq)];
>> >  
>> > -  /* Suppress further interrupts. */
>> > -  virtqueue_disable_cb(vq);
>> > -
>> > -  /* We were probably waiting for more output buffers. */
>> > -  netif_wake_subqueue(vi->dev, vq2txq(vq));
>> > +  if (napi_schedule_prep(&sq->napi)) {
>> > +  virtqueue_disable_cb(vq);
>> > +  virtqueue_disable_cb_urgent(vq);
> This disable_cb is no longer safe in xmit_done callback,
> since queue can be running at the same time.
>
> You must do it under tx lock. And yes, this likely will not work
> work well without event_idx. We'll probably need extra
> synchronization for such old hosts.
>
>
>

Yes, and the virtqueue_enable_cb_prepare() in virtnet_poll_tx() needs to
be synced with virtqueue_enabled_cb_dealyed(). Otherwise an old idx will
be published and we may still see tx interrupt storm.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-14 Thread Jason Wang
On 10/15/2014 05:51 AM, Michael S. Tsirkin wrote:
> On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
>> From: Jason Wang 
>> Date: Sat, 11 Oct 2014 15:16:43 +0800
>>
>>> We free old transmitted packets in ndo_start_xmit() currently, so any
>>> packet must be orphaned also there. This was used to reduce the overhead of
>>> tx interrupt to achieve better performance. But this may not work for some
>>> protocols such as TCP stream. TCP depends on the value of sk_wmem_alloc to
>>> implement various optimization for small packets stream such as TCP small
>>> queue and auto corking. But orphaning packets early in ndo_start_xmit()
>>> disable such things more or less since sk_wmem_alloc was not accurate. This
>>> lead extra low throughput for TCP stream of small writes.
>>>
>>> This series tries to solve this issue by enable tx interrupts for all TCP
>>> packets other than the ones with push bit or pure ACK. This is done through
>>> the support of urgent descriptor which can force an interrupt for a
>>> specified packet. If tx interrupt was enabled for a packet, there's no need
>>> to orphan it in ndo_start_xmit(), we can free it tx napi which is scheduled
>>> by tx interrupt. Then sk_wmem_alloc was more accurate than before and TCP
>>> can batch more for small write. More larger skb was produced by TCP in this
>>> case to improve both throughput and cpu utilization.
>>>
>>> Test shows great improvements on small write tcp streams. For most of the
>>> other cases, the throughput and cpu utilization are the same in the
>>> past. Only few cases, more cpu utilization was noticed which needs more
>>> investigation.
>>>
>>> Review and comments are welcomed.
>> I think proper accounting and queueing (at all levels, not just TCP
>> sockets) is more important than trying to skim a bunch of cycles by
>> avoiding TX interrupts.
>>
>> Having an event to free the SKB is absolutely essential for the stack
>> to operate correctly.
>>
>> And with virtio-net you don't even have the excuse of "the HW
>> unfortunately doesn't have an appropriate TX event."
>>
>> So please don't play games, and instead use TX interrupts all the
>> time.  You can mitigate them in various ways, but don't turn them on
>> selectively based upon traffic type, that's terrible.
> This got me thinking: how about using virtqueue_enable_cb_delayed
> for this mitigation?

Should work, another possible solution is interrupt coalescing, which
can speed up also the case without event index.
> It's pretty easy to implement - I'll send a proof of concept patch
> separately.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-14 Thread Michael S. Tsirkin
On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
> From: Jason Wang 
> Date: Sat, 11 Oct 2014 15:16:43 +0800
> 
> > We free old transmitted packets in ndo_start_xmit() currently, so any
> > packet must be orphaned also there. This was used to reduce the overhead of
> > tx interrupt to achieve better performance. But this may not work for some
> > protocols such as TCP stream. TCP depends on the value of sk_wmem_alloc to
> > implement various optimization for small packets stream such as TCP small
> > queue and auto corking. But orphaning packets early in ndo_start_xmit()
> > disable such things more or less since sk_wmem_alloc was not accurate. This
> > lead extra low throughput for TCP stream of small writes.
> > 
> > This series tries to solve this issue by enable tx interrupts for all TCP
> > packets other than the ones with push bit or pure ACK. This is done through
> > the support of urgent descriptor which can force an interrupt for a
> > specified packet. If tx interrupt was enabled for a packet, there's no need
> > to orphan it in ndo_start_xmit(), we can free it tx napi which is scheduled
> > by tx interrupt. Then sk_wmem_alloc was more accurate than before and TCP
> > can batch more for small write. More larger skb was produced by TCP in this
> > case to improve both throughput and cpu utilization.
> > 
> > Test shows great improvements on small write tcp streams. For most of the
> > other cases, the throughput and cpu utilization are the same in the
> > past. Only few cases, more cpu utilization was noticed which needs more
> > investigation.
> > 
> > Review and comments are welcomed.
> 
> I think proper accounting and queueing (at all levels, not just TCP
> sockets) is more important than trying to skim a bunch of cycles by
> avoiding TX interrupts.
> 
> Having an event to free the SKB is absolutely essential for the stack
> to operate correctly.
> 
> And with virtio-net you don't even have the excuse of "the HW
> unfortunately doesn't have an appropriate TX event."
> 
> So please don't play games, and instead use TX interrupts all the
> time.  You can mitigate them in various ways, but don't turn them on
> selectively based upon traffic type, that's terrible.
> 
> You can even use ->xmit_more to defer the TX interrupt indication to
> the final TX packet in the chain.

I guess we can just defer the kick, interrupt will naturally be
deferred as well.
This should solve the problem for old hosts as well.

We'll also need to implement bql for this.
Something like the below?
Completely untested - posting here to see if I figured the
API out correctly. Has to be applied on top of the previous patch.

---

virtio_net: bql + xmit_more

Signed-off-by: Michael S. Tsirkin 

---

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 62c059d..c245047 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -213,13 +213,15 @@ static struct page *get_a_page(struct receive_queue *rq, 
gfp_t gfp_mask)
return p;
 }
 
-static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+static int free_old_xmit_skbs(struct netdev_queue *txq,
+ struct send_queue *sq, int budget)
 {
struct sk_buff *skb;
unsigned int len;
struct virtnet_info *vi = sq->vq->vdev->priv;
struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
int sent = 0;
+   unsigned int bytes = 0;
 
while (sent < budget &&
   (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
@@ -227,6 +229,7 @@ static int free_old_xmit_skbs(struct send_queue *sq, int 
budget)
 
u64_stats_update_begin(&stats->tx_syncp);
stats->tx_bytes += skb->len;
+   bytes += skb->len;
stats->tx_packets++;
u64_stats_update_end(&stats->tx_syncp);
 
@@ -234,6 +237,8 @@ static int free_old_xmit_skbs(struct send_queue *sq, int 
budget)
sent++;
}
 
+   netdev_tx_completed_queue(txq, sent, bytes);
+
return sent;
 }
 
@@ -802,7 +807,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int 
budget)
 again:
__netif_tx_lock(txq, smp_processor_id());
virtqueue_disable_cb(sq->vq);
-   sent += free_old_xmit_skbs(sq, budget - sent);
+   sent += free_old_xmit_skbs(txq, sq, budget - sent);
 
if (sent < budget) {
r = virtqueue_enable_cb_prepare(sq->vq);
@@ -951,6 +956,9 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
int qnum = skb_get_queue_mapping(skb);
struct send_queue *sq = &vi->sq[qnum];
int err, qsize = virtqueue_get_vring_size(sq->vq);
+   struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
+   bool kick = !skb->xmit_more || netif_xmit_stopped(txq);
+   unsigned int bytes = skb->len;
 
virtqueue_disable_cb(sq->vq);
 
@@ -967,7 +975,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *de

Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-14 Thread Michael S. Tsirkin
On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
> From: Jason Wang 
> Date: Sat, 11 Oct 2014 15:16:43 +0800
> 
> > We free old transmitted packets in ndo_start_xmit() currently, so any
> > packet must be orphaned also there. This was used to reduce the overhead of
> > tx interrupt to achieve better performance. But this may not work for some
> > protocols such as TCP stream. TCP depends on the value of sk_wmem_alloc to
> > implement various optimization for small packets stream such as TCP small
> > queue and auto corking. But orphaning packets early in ndo_start_xmit()
> > disable such things more or less since sk_wmem_alloc was not accurate. This
> > lead extra low throughput for TCP stream of small writes.
> > 
> > This series tries to solve this issue by enable tx interrupts for all TCP
> > packets other than the ones with push bit or pure ACK. This is done through
> > the support of urgent descriptor which can force an interrupt for a
> > specified packet. If tx interrupt was enabled for a packet, there's no need
> > to orphan it in ndo_start_xmit(), we can free it tx napi which is scheduled
> > by tx interrupt. Then sk_wmem_alloc was more accurate than before and TCP
> > can batch more for small write. More larger skb was produced by TCP in this
> > case to improve both throughput and cpu utilization.
> > 
> > Test shows great improvements on small write tcp streams. For most of the
> > other cases, the throughput and cpu utilization are the same in the
> > past. Only few cases, more cpu utilization was noticed which needs more
> > investigation.
> > 
> > Review and comments are welcomed.
> 
> I think proper accounting and queueing (at all levels, not just TCP
> sockets) is more important than trying to skim a bunch of cycles by
> avoiding TX interrupts.
> 
> Having an event to free the SKB is absolutely essential for the stack
> to operate correctly.
> 
> And with virtio-net you don't even have the excuse of "the HW
> unfortunately doesn't have an appropriate TX event."
> 
> So please don't play games, and instead use TX interrupts all the
> time.  You can mitigate them in various ways, but don't turn them on
> selectively based upon traffic type, that's terrible.

This got me thinking: how about using virtqueue_enable_cb_delayed
for this mitigation?

It's pretty easy to implement - I'll send a proof of concept patch
separately.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next RFC 3/3] virtio-net: conditionally enable tx interrupt

2014-10-14 Thread Michael S. Tsirkin
On Sat, Oct 11, 2014 at 03:16:46PM +0800, Jason Wang wrote:
> We free transmitted packets in ndo_start_xmit() in the past to get better
> performance in the past. One side effect is that skb_orphan() needs to be
> called in ndo_start_xmit() which makes sk_wmem_alloc not accurate in
> fact. For TCP protocol, this means several optimization could not work well
> such as TCP small queue and auto corking. This can lead extra low
> throughput of small packets stream.
> 
> Thanks to the urgent descriptor support. This patch tries to solve this
> issue by enable the tx interrupt selectively for stream packets. This means
> we don't need to orphan TCP stream packets in ndo_start_xmit() but enable
> tx interrupt for those packets. After we get tx interrupt, a tx napi was
> scheduled to free those packets.
> 
> With this method, sk_wmem_alloc of TCP socket were more accurate than in
> the past which let TCP can batch more through TSQ and auto corking.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/net/virtio_net.c | 164 
> ---
>  1 file changed, 128 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5810841..b450fc4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -72,6 +72,8 @@ struct send_queue {
>  
>   /* Name of the send queue: output.$index */
>   char name[40];
> +
> + struct napi_struct napi;
>  };
>  
>  /* Internal representation of a receive virtqueue */
> @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue 
> *rq, gfp_t gfp_mask)
>   return p;
>  }
>  
> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> +{
> + struct sk_buff *skb;
> + unsigned int len;
> + struct virtnet_info *vi = sq->vq->vdev->priv;
> + struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> + int sent = 0;
> +
> + while (sent < budget &&
> +(skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> + pr_debug("Sent skb %p\n", skb);
> +
> + u64_stats_update_begin(&stats->tx_syncp);
> + stats->tx_bytes += skb->len;
> + stats->tx_packets++;
> + u64_stats_update_end(&stats->tx_syncp);
> +
> + dev_kfree_skb_any(skb);
> + sent++;
> + }
> +
> + return sent;
> +}
> +
>  static void skb_xmit_done(struct virtqueue *vq)
>  {
>   struct virtnet_info *vi = vq->vdev->priv;
> + struct send_queue *sq = &vi->sq[vq2txq(vq)];
>  
> - /* Suppress further interrupts. */
> - virtqueue_disable_cb(vq);
> -
> - /* We were probably waiting for more output buffers. */
> - netif_wake_subqueue(vi->dev, vq2txq(vq));
> + if (napi_schedule_prep(&sq->napi)) {
> + virtqueue_disable_cb(vq);
> + virtqueue_disable_cb_urgent(vq);

This disable_cb is no longer safe in xmit_done callback,
since queue can be running at the same time.

You must do it under tx lock. And yes, this likely will not work
work well without event_idx. We'll probably need extra
synchronization for such old hosts.



> + __napi_schedule(&sq->napi);
> + }
>  }
>  
>  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
> @@ -772,7 +799,38 @@ again:
>   return received;
>  }
>  
> +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> +{
> + struct send_queue *sq =
> + container_of(napi, struct send_queue, napi);
> + struct virtnet_info *vi = sq->vq->vdev->priv;
> + struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> + unsigned int r, sent = 0;
> +
> +again:
> + __netif_tx_lock(txq, smp_processor_id());
> + sent += free_old_xmit_skbs(sq, budget - sent);
> +
> + if (sent < budget) {
> + r = virtqueue_enable_cb_prepare_urgent(sq->vq);
> + napi_complete(napi);
> + __netif_tx_unlock(txq);
> + if (unlikely(virtqueue_poll(sq->vq, r)) &&
> + napi_schedule_prep(napi)) {
> + virtqueue_disable_cb_urgent(sq->vq);
> + __napi_schedule(napi);
> + goto again;
> + }
> + } else {
> + __netif_tx_unlock(txq);
> + }
> +
> + netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> + return sent;
> +}
> +
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> +
>  /* must be called with local_bh_disable()d */
>  static int virtnet_busy_poll(struct napi_struct *napi)
>  {
> @@ -820,31 +878,13 @@ static int virtnet_open(struct net_device *dev)
>   if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>   schedule_delayed_work(&vi->refill, 0);
>   virtnet_napi_enable(&vi->rq[i]);
> + napi_enable(&vi->sq[i].napi);
>   }
>  
>   return 0;
>  }
>  
> -static void free_old_xmit_skbs(struct send_queue *sq)
> -{
> - struct sk_buff *skb;
> - unsigned int

Re: [Qemu-devel] QEMU with KVM does not start Win8 on kernel 3.4.67 and core2duo

2014-10-14 Thread Erik Rull

Hi Jan,

I still need assistance on my problem. Please have a look at the trace 
files and send me some hints to look for in the next step.


Thanks.

Best regards,

Erik


Erik Rull wrote:


Hi all,

I'm still stuck at the same point - I would like to proceed my work but I
need assistance by getting an evaluation result on the trace files I posted.
It's getting a bit more time critical now on my side, because updates for ~
100 systems worldwide have to be rolled out within the next weeks...

Thanks a lot.

Best regards,

Erik


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-14 Thread David Miller
From: Jason Wang 
Date: Sat, 11 Oct 2014 15:16:43 +0800

> We free old transmitted packets in ndo_start_xmit() currently, so any
> packet must be orphaned also there. This was used to reduce the overhead of
> tx interrupt to achieve better performance. But this may not work for some
> protocols such as TCP stream. TCP depends on the value of sk_wmem_alloc to
> implement various optimization for small packets stream such as TCP small
> queue and auto corking. But orphaning packets early in ndo_start_xmit()
> disable such things more or less since sk_wmem_alloc was not accurate. This
> lead extra low throughput for TCP stream of small writes.
> 
> This series tries to solve this issue by enable tx interrupts for all TCP
> packets other than the ones with push bit or pure ACK. This is done through
> the support of urgent descriptor which can force an interrupt for a
> specified packet. If tx interrupt was enabled for a packet, there's no need
> to orphan it in ndo_start_xmit(), we can free it tx napi which is scheduled
> by tx interrupt. Then sk_wmem_alloc was more accurate than before and TCP
> can batch more for small write. More larger skb was produced by TCP in this
> case to improve both throughput and cpu utilization.
> 
> Test shows great improvements on small write tcp streams. For most of the
> other cases, the throughput and cpu utilization are the same in the
> past. Only few cases, more cpu utilization was noticed which needs more
> investigation.
> 
> Review and comments are welcomed.

I think proper accounting and queueing (at all levels, not just TCP
sockets) is more important than trying to skim a bunch of cycles by
avoiding TX interrupts.

Having an event to free the SKB is absolutely essential for the stack
to operate correctly.

And with virtio-net you don't even have the excuse of "the HW
unfortunately doesn't have an appropriate TX event."

So please don't play games, and instead use TX interrupts all the
time.  You can mitigate them in various ways, but don't turn them on
selectively based upon traffic type, that's terrible.

You can even use ->xmit_more to defer the TX interrupt indication to
the final TX packet in the chain.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Nested VMX guest tsc_deadline OOPS

2014-10-14 Thread Andy Lutomirski
On Oct 14, 2014 3:46 AM, "Bandan Das"  wrote:
>
> Andy Lutomirski  writes:
>
> > I don't know what's going on here, but a nested VMX boot using -cpu
> > host on SNB on L0 and L1 fails with the OOPS below.  I kind of suspect
> > that there's both a KVM bug and an APIC driver bug here.
>
> What kernel version is this ? What are you running for L1 and L2 ?

L0 is 3.16.3-200.fc20.x86_64.  L1 and L2 are both 3.17.

Here's a set of steps that will reproduce the issue:

0. Enable kvm_intel.nested=Y.  Make sure you have Python 3.3 or newer
and QEMU 1.6 or newer installed.

1. Clone virtme from git://git.kernel.org/pub/scm/utils/kernel/virtme/virtme.git

2. cd to the kernel tree that you want to test.  Build with the attached config

3. path/to/virtme/virtme-run --pwd --kimg arch/x86/boot/bzImage
--qemu-opts -m 1024

4. (inside virtme) path/to/virtme/virtme-run --pwd --kimg
arch/x86/boot/bzImage --qemu-opts -m 512

5. Read the OOPS :)

7. Press Ctrl-a x to exit.

Note that, when typing that fourth command, you won't have HOME set,
so you might need to use an absolute path.

If you're testing on Fedora, you can replace step 1 with 'dnf install
virtme' :)  (But, if you do that, you'll have to drop the '--pwd' and
fix the path to the nested --kimg, since the version of virtme in
Fedora is a little bit behind.)

--Andy

>
> > -cpu host,-tsc-deadline on L1 works around the OOPS.
> >
> > [0.184000] Freeing SMP alternatives memory: 32K (81ff
> > - 81ff8000)
> > [0.188000] [ cut here ]
> > [0.189000] WARNING: CPU: 0 PID: 1 at
> > arch/x86/kernel/apic/apic.c:1393 setup_local_APIC+0x26c/0x320()
> > [0.19] Modules linked in:
> > [0.192000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0+ #459
> > [0.193000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [0.194000]  0009 8800070a3e40 818588a7
> > 
> > [0.199000]  8800070a3e78 8108db68 
> > 00f0
> > [0.205000]    0001
> > 8800070a3e88
> > [0.21] Call Trace:
> > [0.211000]  [] dump_stack+0x45/0x56
> > [0.212000]  [] warn_slowpath_common+0x78/0xa0
> > [0.213000]  [] warn_slowpath_null+0x15/0x20
> > [0.214000]  [] setup_local_APIC+0x26c/0x320
> > [0.215000]  [] native_smp_prepare_cpus+0x255/0x312
> > [0.216000]  [] kernel_init_freeable+0x94/0x1d1
> > [0.217000]  [] ? rest_init+0x80/0x80
> > [0.218000]  [] kernel_init+0x9/0xf0
> > [0.219000]  [] ret_from_fork+0x7c/0xb0
> > [0.22]  [] ? rest_init+0x80/0x80
> > [0.221000] ---[ end trace f5f19be5716e445c ]---
> > [0.234000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> > [0.244000] smpboot: CPU0: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
> > (fam: 06, model: 2d, stepping: 06)
> > [0.248000] divide error:  [#1] SMP
> > [0.248000] Modules linked in:
> > [0.248000] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW
> > 3.17.0+ #459
> > [0.248000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [0.248000] task: 880007098000 ti: 8800070a task.ti:
> > 8800070a
> > [0.248000] RIP: 0010:[]  []
> > clockevents_config.part.3+0x1f/0xa0
> > [0.248000] RSP: :8800070a3e58  EFLAGS: 00010246
> > [0.248000] RAX:  RBX: 880007c0cd00 RCX: 
> > 
> > [0.248000] RDX:  RSI:  RDI: 
> > 
> > [0.248000] RBP: 8800070a3e70 R08: 0001 R09: 
> > 82036b90
> > [0.248000] R10: 00a7 R11: 00a6 R12: 
> > b018
> > [0.248000] R13: 0040 R14: b020 R15: 
> > 
> > [0.248000] FS:  () GS:880007c0()
> > knlGS:
> > [0.248000] CS:  0010 DS:  ES:  CR0: 80050033
> > [0.248000] CR2: 8800020dd000 CR3: 01e11000 CR4: 
> > 000406f0
> > [0.248000] Stack:
> > [0.248000]  880007c0cd00 b018 0040
> > 8800070a3e88
> > [0.248000]  810ebbcb 880007c0cd00 8800070a3e98
> > 8107646f
> > [0.248000]  8800070a3ed8 81f086d0 0053
> > 
> > [0.248000] Call Trace:
> > [0.248000]  [] 
> > clockevents_config_and_register+0x1b/0x30
> > [0.248000]  [] setup_APIC_timer+0x10f/0x170
> > [0.248000]  [] setup_boot_APIC_clock+0x4ae/0x4ba
> > [0.248000]  [] native_smp_prepare_cpus+0x2ef/0x312
> > [0.248000]  [] kernel_init_freeable+0x94/0x1d1
> > [0.248000]  [] ? rest_init+0x80/0x80
> > [0.248000]  [] kernel_init+0x9/0xf0
> > [0.248000]  [] ret_from_fork+0x7c/0xb0
> > [0.248000]  [] ? rest_init+0x80/0x80
> > [0.248000] Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 31 d2 89
> > f1 89 f6 41 b8 01 00 00 00 48 89 e5 41 55 41 54

Re: [RFC PATCH] arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs

2014-10-14 Thread Victor Kamensky
On 14 October 2014 02:47, Marc Zyngier  wrote:
> On Sun, Sep 28 2014 at 03:04:26 PM, Christoffer Dall 
>  wrote:
>> The EIRSR and ELRSR registers are 32-bit registers on GICv2, and we
>> store these as an array of two such registers on the vgic vcpu struct.
>> However, we access them as a single 64-bit value or as a bitmap pointer
>> in the generic vgic code, which breaks BE support.
>>
>> Instead, store them as u64 values on the vgic structure and do the
>> word-swapping in the assembly code, which already handles the byte order
>> for BE systems.
>>
>> Signed-off-by: Christoffer Dall 
>
> (still going through my email backlog, hence the delay).
>
> This looks like a valuable fix. Haven't had a chance to try it (no BE
> setup at hand) but maybe Victor can help reproducing this?.

I'll give it a spin.

Thanks,
Victor

> Acked-by: Marc Zyngier 
>
> M.
> --
> Jazz is not dead. It just smells funny.
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Nested VMX guest tsc_deadline OOPS

2014-10-14 Thread Bandan Das
Andy Lutomirski  writes:

> I don't know what's going on here, but a nested VMX boot using -cpu
> host on SNB on L0 and L1 fails with the OOPS below.  I kind of suspect
> that there's both a KVM bug and an APIC driver bug here.

What kernel version is this ? What are you running for L1 and L2 ?

> -cpu host,-tsc-deadline on L1 works around the OOPS.
>
> [0.184000] Freeing SMP alternatives memory: 32K (81ff
> - 81ff8000)
> [0.188000] [ cut here ]
> [0.189000] WARNING: CPU: 0 PID: 1 at
> arch/x86/kernel/apic/apic.c:1393 setup_local_APIC+0x26c/0x320()
> [0.19] Modules linked in:
> [0.192000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0+ #459
> [0.193000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [0.194000]  0009 8800070a3e40 818588a7
> 
> [0.199000]  8800070a3e78 8108db68 
> 00f0
> [0.205000]    0001
> 8800070a3e88
> [0.21] Call Trace:
> [0.211000]  [] dump_stack+0x45/0x56
> [0.212000]  [] warn_slowpath_common+0x78/0xa0
> [0.213000]  [] warn_slowpath_null+0x15/0x20
> [0.214000]  [] setup_local_APIC+0x26c/0x320
> [0.215000]  [] native_smp_prepare_cpus+0x255/0x312
> [0.216000]  [] kernel_init_freeable+0x94/0x1d1
> [0.217000]  [] ? rest_init+0x80/0x80
> [0.218000]  [] kernel_init+0x9/0xf0
> [0.219000]  [] ret_from_fork+0x7c/0xb0
> [0.22]  [] ? rest_init+0x80/0x80
> [0.221000] ---[ end trace f5f19be5716e445c ]---
> [0.234000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [0.244000] smpboot: CPU0: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
> (fam: 06, model: 2d, stepping: 06)
> [0.248000] divide error:  [#1] SMP
> [0.248000] Modules linked in:
> [0.248000] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW
> 3.17.0+ #459
> [0.248000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [0.248000] task: 880007098000 ti: 8800070a task.ti:
> 8800070a
> [0.248000] RIP: 0010:[]  []
> clockevents_config.part.3+0x1f/0xa0
> [0.248000] RSP: :8800070a3e58  EFLAGS: 00010246
> [0.248000] RAX:  RBX: 880007c0cd00 RCX: 
> 
> [0.248000] RDX:  RSI:  RDI: 
> 
> [0.248000] RBP: 8800070a3e70 R08: 0001 R09: 
> 82036b90
> [0.248000] R10: 00a7 R11: 00a6 R12: 
> b018
> [0.248000] R13: 0040 R14: b020 R15: 
> 
> [0.248000] FS:  () GS:880007c0()
> knlGS:
> [0.248000] CS:  0010 DS:  ES:  CR0: 80050033
> [0.248000] CR2: 8800020dd000 CR3: 01e11000 CR4: 
> 000406f0
> [0.248000] Stack:
> [0.248000]  880007c0cd00 b018 0040
> 8800070a3e88
> [0.248000]  810ebbcb 880007c0cd00 8800070a3e98
> 8107646f
> [0.248000]  8800070a3ed8 81f086d0 0053
> 
> [0.248000] Call Trace:
> [0.248000]  [] clockevents_config_and_register+0x1b/0x30
> [0.248000]  [] setup_APIC_timer+0x10f/0x170
> [0.248000]  [] setup_boot_APIC_clock+0x4ae/0x4ba
> [0.248000]  [] native_smp_prepare_cpus+0x2ef/0x312
> [0.248000]  [] kernel_init_freeable+0x94/0x1d1
> [0.248000]  [] ? rest_init+0x80/0x80
> [0.248000]  [] kernel_init+0x9/0xf0
> [0.248000]  [] ret_from_fork+0x7c/0xb0
> [0.248000]  [] ? rest_init+0x80/0x80
> [0.248000] Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 31 d2 89
> f1 89 f6 41 b8 01 00 00 00 48 89 e5 41 55 41 54 53 48 89 fb 48 8b 7f
> 70 48 89 f8 <48> f7 f6 48 85 c0 74 0b 48 3d 58 02 00 00 41 89 c0 77 4e
> 4c 8d
> [0.248000] RIP  [] clockevents_config.part.3+0x1f/0xa0
> [0.248000]  RSP 
> [0.249000] ---[ end trace f5f19be5716e445d ]---
> [0.25] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x000b
> [0.25]
> [0.25] ---[ end Kernel panic - not syncing: Attempted to kill
> init! exitcode=0x000b
> [0.25]
>
>
> --Andy
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs

2014-10-14 Thread Marc Zyngier
On Sun, Sep 28 2014 at 03:04:26 PM, Christoffer Dall 
 wrote:
> The EIRSR and ELRSR registers are 32-bit registers on GICv2, and we
> store these as an array of two such registers on the vgic vcpu struct.
> However, we access them as a single 64-bit value or as a bitmap pointer
> in the generic vgic code, which breaks BE support.
>
> Instead, store them as u64 values on the vgic structure and do the
> word-swapping in the assembly code, which already handles the byte order
> for BE systems.
>
> Signed-off-by: Christoffer Dall 

(still going through my email backlog, hence the delay).

This looks like a valuable fix. Haven't had a chance to try it (no BE
setup at hand) but maybe Victor can help reproducing this?.

Acked-by: Marc Zyngier 

M.
-- 
Jazz is not dead. It just smells funny.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 04/25] virtio: defer config changed notifications

2014-10-14 Thread Michael S. Tsirkin
On Tue, Oct 14, 2014 at 11:01:12AM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> > Defer config changed notifications that arrive during
> > probe/scan/freeze/restore.
> >
> > This will allow drivers to set DRIVER_OK earlier, without worrying about
> > racing with config change interrupts.
> >
> > This change will also benefit old hypervisors (before 2009)
> > that send interrupts without checking DRIVER_OK: previously,
> > the callback could race with driver-specific initialization.
> >
> > This will also help simplify drivers.
> 
> But AFAICT you never *read* dev->config_changed.
> 
> You unconditionally trigger a config_changed event in
> virtio_config_enable().  That's a bit weird, but probably OK.
> 
> How about the following change (on top of your patch).  I
> think the renaming is clearer, and note the added if() test in
> virtio_config_enable().
> 
> If you approve, I'll fold it in.
> 
> Cheers,
> Rusty.

Hi Rusty,
I'm okay with both changes.
You can fold it in if you prefer, or just make it a patch on top.

> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 2536701b098b..df598dd8c5c8 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -122,7 +122,7 @@ static void __virtio_config_changed(struct virtio_device 
> *dev)
>   struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
>  
>   if (!dev->config_enabled)
> - dev->config_changed = true;
> + dev->config_change_pending = true;
>   else if (drv && drv->config_changed)
>   drv->config_changed(dev);
>  }
> @@ -148,8 +148,9 @@ static void virtio_config_enable(struct virtio_device 
> *dev)
>  {
>   spin_lock_irq(&dev->config_lock);
>   dev->config_enabled = true;
> - __virtio_config_changed(dev);
> - dev->config_changed = false;
> + if (dev->config_change_pending)
> + __virtio_config_changed(dev);
> + dev->config_change_pending = false;
>   spin_unlock_irq(&dev->config_lock);
>  }
>  
> @@ -253,7 +254,7 @@ int register_virtio_device(struct virtio_device *dev)
>  
>   spin_lock_init(&dev->config_lock);
>   dev->config_enabled = false;
> - dev->config_changed = false;
> + dev->config_change_pending = false;
>  
>   /* We always start by resetting the device, in case a previous
>* driver messed it up.  This also tests that code path a little. */
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 5636b119dc25..65261a7244fc 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -80,7 +80,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
>   * @index: unique position on the virtio bus
>   * @failed: saved value for CONFIG_S_FAILED bit (for restore)
>   * @config_enabled: configuration change reporting enabled
> - * @config_changed: configuration change reported while disabled
> + * @config_change_pending: configuration change reported while disabled
>   * @config_lock: protects configuration change reporting
>   * @dev: underlying device.
>   * @id: the device type identification (used to match it with a driver).
> @@ -94,7 +94,7 @@ struct virtio_device {
>   int index;
>   bool failed;
>   bool config_enabled;
> - bool config_changed;
> + bool config_change_pending;
>   spinlock_t config_lock;
>   struct device dev;
>   struct virtio_device_id id;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html