Re: [PATCH v3 2/2] virtio_balloon: replace oom notifier with shrinker

2018-08-05 Thread Wei Wang

On 08/04/2018 03:15 AM, Michael S. Tsirkin wrote:

On Fri, Aug 03, 2018 at 04:32:26PM +0800, Wei Wang wrote:

The OOM notifier is getting deprecated for the following reasons:
- As a callout from the oom context, it is too subtle and easy to
   generate bugs and corner cases which are hard to track;
- It is called too late (after the reclaiming has been performed).
   Drivers with a large amount of reclaimable memory are expected to
   release it at an early stage of memory pressure;
- The notifier callback isn't aware of oom constraints;
Link: https://lkml.org/lkml/2018/7/12/314

This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure. The balloon pages are
given back to mm adaptively by returning the number of pages that the
reclaimer is asking for (i.e. sc->nr_to_scan).

Currently the max possible value of sc->nr_to_scan passed to the balloon
shrinker is SHRINK_BATCH, which is 128. This is smaller than the
limitation that only VIRTIO_BALLOON_ARRAY_PFNS_MAX (256) pages can be
returned via one invocation of leak_balloon. But this patch still
handles the case where SHRINK_BATCH or shrinker->batch is changed to a
value larger than VIRTIO_BALLOON_ARRAY_PFNS_MAX, which requires
multiple invocations of leak_balloon.
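
A rough sketch of the scan path just described (the helper name is
illustrative rather than lifted verbatim from the patch, and the
conversion between 4KB balloon pages and guest pages via
VIRTIO_BALLOON_PAGES_PER_PAGE is omitted for brevity):

static unsigned long shrink_balloon_pages(struct virtio_balloon *vb,
					  unsigned long pages_to_free)
{
	unsigned long pages_freed = 0;

	/*
	 * leak_balloon can release at most VIRTIO_BALLOON_ARRAY_PFNS_MAX
	 * pages per invocation, so keep calling it until the reclaimer's
	 * request (sc->nr_to_scan) is satisfied or the balloon is empty.
	 */
	while (vb->num_pages && pages_freed < pages_to_free)
		pages_freed += leak_balloon(vb, pages_to_free - pages_freed);

	return pages_freed;
}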

Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been used
to release balloon pages on OOM. We continue to use this feature bit for
the shrinker, so the shrinker is only registered when this feature bit
has been negotiated with the host.
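
Registration at probe time is thus gated on the feature bit; a rough
sketch, where virtio_balloon_register_shrinker() is an assumed helper
wrapping register_shrinker():

	/* Only register the shrinker if deflate-on-oom was negotiated. */
	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) {
		err = virtio_balloon_register_shrinker(vb);
		if (err)
			goto out_del_vqs;
	}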

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 


Could you add data on how this was tested and how guest
behaviour changed? Which configurations see an improvement?



Yes. Please see the differences between the "*1" and "*2" cases below.

Taking this chance, I use "*2" and "*3" to show Michal and others the
difference between applying and not applying the shrinker fix patch here:
https://lkml.org/lkml/2018/8/3/384



*1. V3 patches
1) After inflating some amount of memory, actual=101536 Bytes
free -m
              total        used        free      shared  buff/cache   available
Mem:           7975        7289         514          10         171         447
Swap:         10236           0       10236

2) dd if=478MB_file of=/dev/null, actual=1058721792 Bytes
free -m
              total        used        free      shared  buff/cache   available
Mem:           7975        7233         102          10         639         475
Swap:         10236           0       10236

The advantage is that the inflated pages are given back to mm based on
the number of pages the reclaimer is asking for, i.e. ~56MB (the diff
of "actual" above). This is more adaptive.




*2. V2 patches, balloon_pages_to_shrink=100 pages (around 4GB), with
the shrinker fix patches applied.

1) After inflating some amount of memory, actual=101536 Bytes
free -m
              total        used        free      shared  buff/cache   available
Mem:           7975        7288         530          10         157         455
Swap:         10236           0       10236

2) dd if=478MB_file of=/dev/null, actual=5096001536 Bytes
free -m
              total        used        free      shared  buff/cache   available
Mem:           7975        3381        3953          10         640        4327
Swap:         10236           0       10236

In the above example, we set 4GB to shrink to make the difference
obvious. Though the reclaimer only needs to reclaim ~56MB of memory, 4GB
of inflated pages are given back to mm, which is unnecessary. From the
user's perspective, there is no way to know how many pages to give back
at the time of setting the module parameter (balloon_pages_to_shrink).
So I think the above "*1" is better.




*3. V2 patches, balloon_pages_to_shrink=100 pages (around 4GB),
without the shrinker fix patches applied.

1) After inflating some amount of memory, actual=101536 Bytes
free -m
              total        used        free      shared  buff/cache   available
Mem:           7975        7292         524          10         158         450
Swap:         10236           0       10236

2) dd if=478MB_file of=/dev/null, actual=8589934592 Bytes
free -m
              total        used        free      shared  buff/cache   available
Mem:           7975          53        7281          10         640        7656
Swap:         10236           0       10236

Compared to *2, all the balloon pages are shrunk, while users expect
only 4GB to shrink. The reason is that do_shrink_slab has a mistake in
calculating shrinkctl->nr_scanned, which should be the actual number of
pages that the shrinker has freed, but do_shrink_slab still treats that
value as 128 (even though 4GB has actually been freed).
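
To illustrate the mistake, here is a paraphrase of the do_shrink_slab
loop (not the exact upstream code):

	while (total_scan >= batch_size) {
		shrinkctl->nr_to_scan = batch_size;
		shrinkctl->nr_scanned = batch_size;	/* assumed, not measured */
		freed += shrinker->scan_objects(shrinker, shrinkctl);

		/*
		 * Only nr_scanned == 128 is subtracted here, even when
		 * scan_objects actually freed ~4GB, so the loop keeps
		 * invoking the balloon shrinker until it is empty.
		 */
		total_scan -= shrinkctl->nr_scanned;
		cond_resched();
	}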



Best,
Wei


[PATCH net-next V2] vhost: switch to use new message format

2018-08-05 Thread Jason Wang
We used to have a message like:

struct vhost_msg {
int type;
union {
struct vhost_iotlb_msg iotlb;
__u8 padding[64];
};
};

Unfortunately, there is a 32-bit hole on 64-bit machines because of
the alignment. This leads to different formats between the 32-bit API
and the 64-bit API. What's more, it breaks 32-bit programs running on
64-bit machines.
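
A small userspace demonstration of the hole (struct layouts paraphrased
from include/uapi/linux/vhost.h, not quoted verbatim):

#include <stdio.h>
#include <stddef.h>

struct vhost_iotlb_msg {
	unsigned long long iova, size, uaddr;	/* 64-bit fields force */
	unsigned char perm, type;		/* 8-byte union alignment */
};

struct vhost_msg {
	int type;
	union {
		struct vhost_iotlb_msg iotlb;
		unsigned char padding[64];
	};
};

int main(void)
{
	/* Prints 8 on x86_64 (a 4-byte hole after type), but 4 on a
	 * typical 32-bit build, where 64-bit fields are 4-byte aligned. */
	printf("offsetof(iotlb) = %zu\n", offsetof(struct vhost_msg, iotlb));
	return 0;
}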

So fix this by introducing a new message type with an explicit
32-bit reserved field after type, like:

struct vhost_msg_v2 {
__u32 type;
__u32 reserved;
union {
struct vhost_iotlb_msg iotlb;
__u8 padding[64];
};
};

We will have a consistent ABI after switching to this. To enable
this capability, introduce a new ioctl (VHOST_SET_BACKEND_FEATURES) for
userspace to enable this feature (VHOST_BACKEND_F_IOTLB_MSG_V2).
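
From userspace the negotiation would look roughly like this (a sketch:
vhost_fd is an assumed, already-open vhost device fd, and error
handling is trimmed):

	__u64 features;

	if (ioctl(vhost_fd, VHOST_GET_BACKEND_FEATURES, &features))
		return -1;
	if (features & (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)) {
		features = 1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2;
		if (ioctl(vhost_fd, VHOST_SET_BACKEND_FEATURES, &features))
			return -1;
		/* From now on, write struct vhost_msg_v2 to the fd. */
	}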

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang 
---
Changes from V1:
- use __u32 instead of int for type
---
 drivers/vhost/net.c        | 30
 drivers/vhost/vhost.c      | 71
 drivers/vhost/vhost.h      | 11
 include/uapi/linux/vhost.h | 18
 4 files changed, 111 insertions(+), 19 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 367d802..4e656f8 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -78,6 +78,10 @@ enum {
 };
 
 enum {
+   VHOST_NET_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
+};
+
+enum {
VHOST_NET_VQ_RX = 0,
VHOST_NET_VQ_TX = 1,
VHOST_NET_VQ_MAX = 2,
@@ -1399,6 +1403,21 @@ static long vhost_net_reset_owner(struct vhost_net *n)
return err;
 }
 
+static int vhost_net_set_backend_features(struct vhost_net *n, u64 features)
+{
+   int i;
+
+	mutex_lock(&n->dev.mutex);
+	for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
+		mutex_lock(&n->vqs[i].vq.mutex);
+		n->vqs[i].vq.acked_backend_features = features;
+		mutex_unlock(&n->vqs[i].vq.mutex);
+	}
+	mutex_unlock(&n->dev.mutex);
+
+   return 0;
+}
+
 static int vhost_net_set_features(struct vhost_net *n, u64 features)
 {
size_t vhost_hlen, sock_hlen, hdr_len;
@@ -1489,6 +1508,17 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
if (features & ~VHOST_NET_FEATURES)
return -EOPNOTSUPP;
return vhost_net_set_features(n, features);
+   case VHOST_GET_BACKEND_FEATURES:
+   features = VHOST_NET_BACKEND_FEATURES;
+	if (copy_to_user(featurep, &features, sizeof(features)))
+   return -EFAULT;
+   return 0;
+   case VHOST_SET_BACKEND_FEATURES:
+	if (copy_from_user(&features, featurep, sizeof(features)))
+   return -EFAULT;
+   if (features & ~VHOST_NET_BACKEND_FEATURES)
+   return -EOPNOTSUPP;
+   return vhost_net_set_backend_features(n, features);
case VHOST_RESET_OWNER:
return vhost_net_reset_owner(n);
case VHOST_SET_OWNER:
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a502f1a..6f6c42d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -315,6 +315,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->log_addr = -1ull;
vq->private_data = NULL;
vq->acked_features = 0;
+   vq->acked_backend_features = 0;
vq->log_base = NULL;
vq->error_ctx = NULL;
vq->kick = NULL;
@@ -1027,28 +1028,40 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
 struct iov_iter *from)
 {
-   struct vhost_msg_node node;
-   unsigned size = sizeof(struct vhost_msg);
-   size_t ret;
-   int err;
+   struct vhost_iotlb_msg msg;
+   size_t offset;
+   int type, ret;
 
-   if (iov_iter_count(from) < size)
-   return 0;
-	ret = copy_from_iter(&node.msg, size, from);
-	if (ret != size)
+	ret = copy_from_iter(&type, sizeof(type), from);
+	if (ret != sizeof(type))
goto done;
 
-   switch (node.msg.type) {
+   switch (type) {
case VHOST_IOTLB_MSG:
-	err = vhost_process_iotlb_msg(dev, &node.msg.iotlb);
-   if (err)
-   ret = err;
+   /* There maybe a hole after type for V1 message type,
+* so skip it here.
+*/
+   offset = offsetof(struct vhost_msg, iotlb) - sizeof(int);
+   break;
+   case VHOST_IOTLB_MSG_V2:
+   offset = sizeof(__u32);
break;
default:
ret = -EINVAL;
-   break;
+   goto done;
+   }
+
+   iov_iter_advance(from, offset);
+	ret = copy_from_iter(&msg, sizeof(msg), from);
+	if (ret != sizeof(msg))
+		goto done;
+	if (vhost_process_iotlb_msg(dev, &msg)) {
+		ret = -EFAULT;
+		goto done;
 	}
 
+	ret = (type == VHOST_IOTLB_MSG) ? sizeof(struct vhost_msg) :
+	      sizeof(struct vhost_msg_v2);
 done:
 	return ret;
 }

Re: [PATCH net-next] vhost: switch to use new message format

2018-08-05 Thread Jason Wang



On 2018年08月03日 15:59, Michael S. Tsirkin wrote:

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a502f1a..6f6c42d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -315,6 +315,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->log_addr = -1ull;
vq->private_data = NULL;
vq->acked_features = 0;
+   vq->acked_backend_features = 0;
vq->log_base = NULL;
vq->error_ctx = NULL;
vq->kick = NULL;
@@ -1027,28 +1028,40 @@ static int vhost_process_iotlb_msg(struct vhost_dev 
*dev,
  ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
 struct iov_iter *from)
  {
-   struct vhost_msg_node node;
-   unsigned size = sizeof(struct vhost_msg);
-   size_t ret;
-   int err;
+   struct vhost_iotlb_msg msg;
+   size_t offset;
+   int type, ret;
  
-	if (iov_iter_count(from) < size)
-		return 0;
-	ret = copy_from_iter(&node.msg, size, from);
-	if (ret != size)
+	ret = copy_from_iter(&type, sizeof(type), from);
+	if (ret != sizeof(type))
goto done;
  
-	switch (node.msg.type) {
+	switch (type) {
case VHOST_IOTLB_MSG:
-	err = vhost_process_iotlb_msg(dev, &node.msg.iotlb);
-   if (err)
-   ret = err;
+   /* There maybe a hole after type for V1 message type,
+* so skip it here.
+*/
+   offset = offsetof(struct vhost_msg, iotlb) - sizeof(int);
+   break;
+   case VHOST_IOTLB_MSG_V2:
+   offset = sizeof(__u32);
break;
default:
ret = -EINVAL;
-   break;
+   goto done;
+   }
+
+   iov_iter_advance(from, offset);
+	ret = copy_from_iter(&msg, sizeof(msg), from);
+   if (ret != sizeof(msg))
+   goto done;
+	if (vhost_process_iotlb_msg(dev, &msg)) {
+   ret = -EFAULT;
+   goto done;
}
  
+	ret = (type == VHOST_IOTLB_MSG) ? sizeof(struct vhost_msg) :
+	      sizeof(struct vhost_msg_v2);
  done:
return ret;
  }

We can actually fix 32 bit apps too, checking the mode for v1.
But that can wait for another patch.



Yes, let me do it on top.

Thanks


Re: [PATCH net-next] vhost: switch to use new message format

2018-08-05 Thread Jason Wang



On 2018年08月05日 04:21, David Miller wrote:

From: Jason Wang 
Date: Fri,  3 Aug 2018 15:04:51 +0800


So fixing this by introducing a new message type with an explicit
32bit reserved field after type like:

struct vhost_msg_v2 {
int type;
__u32 reserved;

Please use fixed-size types consistently.  Use 's32' instead of 'int'
here.

Thanks!


Ok, V2 will be posted soon.

And it looks to me that u32 is sufficient.

Thanks

Re: [PATCH net-next] virtio-net: mark expected switch fall-throughs

2018-08-05 Thread David Miller
From: "Gustavo A. R. Silva" 
Date: Sat, 4 Aug 2018 21:42:05 -0500

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Addresses-Coverity-ID: 1402059 ("Missing break in switch")
> Addresses-Coverity-ID: 1402060 ("Missing break in switch")
> Addresses-Coverity-ID: 1402061 ("Missing break in switch")
> Signed-off-by: Gustavo A. R. Silva 

Applied, thank you.
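
For context, the annotation such patches add is a plain comment that
both -Wimplicit-fallthrough and Coverity recognize; a generic
illustration (the CMD_* names are hypothetical, not a hunk from this
series):

	switch (cmd) {
	case CMD_PREPARE:
		setup();
		/* fall through */
	case CMD_RUN:
		run();
		break;
	default:
		break;
	}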


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-05 Thread Benjamin Herrenschmidt
On Mon, 2018-08-06 at 07:16 +1000, Benjamin Herrenschmidt wrote:
> I'm trying to understand because the limitation is not a device side
> limitation, it's not a qemu limitation, it's actually more of a VM
> limitation. It has most of its memory pages made inaccessible for
> security reasons. The platform from a qemu/KVM perspective is almost
> entirely normal.

In fact this is probably the best image of what's going on:

It's a normal VM from a KVM/qemu perspective (and thus virtio). It
boots normally, can run firmware, linux, etc... normally, it's not
created with any different XML or qemu command line definition etc...

It's just that once it reaches the kernel with the secure stuff enabled
(could be via kexec from a normal kernel), that kernel will "stash
away" most of the VM's memory into some secure space that nothing else
(not even the hypervisor) can access.

It can keep around a pool or two of normal memory for bounce buffering
IOs, but that's about it.

I think that's the clearest way I could find to explain what's going
on, and why I'm so resistant on adding things on qemu side.

That said, we *can* (and will) notify KVM and qemu of the transition,
and we can/will do so after virtio has been instantiated and used by
the bootloader, but before it will be used (or even probed) by the
secure VM itself, so there's an opportunity to poke at things, either
from the VM itself (a quirk poking at virtio config space for example)
or from qemu (though I find the idea of iterating all virtio devices
from qemu to change a setting rather gross).

Cheers,
Ben.




Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-05 Thread Benjamin Herrenschmidt
On Sun, 2018-08-05 at 00:29 -0700, Christoph Hellwig wrote:
> On Sun, Aug 05, 2018 at 11:10:15AM +1000, Benjamin Herrenschmidt wrote:
> >  - One you have rejected, which is to have a way for "no-iommu" virtio
> > (which still doesn't use an iommu on the qemu side and doesn't need
> > to), to be forced to use some custom DMA ops on the VM side.
> > 
> >  - One, which sadly has more overhead and will require modifying more
> > pieces of the puzzle, which is to make qemu uses an emulated iommu.
> > Once we make qemu do that, we can then layer swiotlb on top of the
> > emulated iommu on the guest side, and pass that as dma_ops to virtio.
> 
> Or number three:  have a virtio feature bit that tells the VM
> to use whatever dma ops the platform thinks are appropriate for
> the bus it pretends to be on.  Then set a dma-range that is limited
> to your secure memory range (if you really need it to be runtime
> enabled only after a device reset that rescans) and use the normal
> dma mapping code to bounce buffer.

Who would set this bit? qemu? Under what circumstances?

What would be the effect of this bit while VIRTIO_F_IOMMU is NOT set,
i.e., what would qemu do and what would Linux do? I'm not sure I fully
understand your idea.

I'm trying to understand because the limitation is not a device side
limitation, it's not a qemu limitation, it's actually more of a VM
limitation. It has most of its memory pages made inaccessible for
security reasons. The platform from a qemu/KVM perspective is almost
entirely normal.

So I don't understand when qemu would set this bit, or whether it
should be set by the VM at runtime.

Cheers,
Ben.







Re: [PATCH net-next 0/6] virtio_net: Add ethtool stat items

2018-08-05 Thread Tariq Toukan




On 05/08/2018 4:45 PM, Tariq Toukan wrote:



On 05/08/2018 4:15 PM, Tariq Toukan wrote:



On 25/07/2018 10:59 PM, David Miller wrote:

From: "Michael S. Tsirkin" 
Date: Wed, 25 Jul 2018 12:40:12 +0300


On Mon, Jul 23, 2018 at 11:36:03PM +0900, Toshiaki Makita wrote:

From: Toshiaki Makita 

Add some ethtool stat items useful for performance analysis.

Signed-off-by: Toshiaki Makita 


Series:

Acked-by: Michael S. Tsirkin 


Series applied.


Patch 1 seems appropriate for stable, even though it's minor.


Ok, I'll put patch #1 also into 'net' and queue it up for -stable.

Thanks.



Hi,
Our verification team encountered the following degradation,
introduced by this series. Please check the KASAN bug report below.


We tested before and after the series, but didn't bisect within it. We
can do so if necessary.


Thanks,
Tariq



I see this might already be fixed, here:
https://marc.info/?l=linux-netdev&m=153335713407532&w=2

Verifying...



No repro when applying this fix.

Thanks,
Tariq



Re: [PATCH net-next 0/6] virtio_net: Add ethtool stat items

2018-08-05 Thread Tariq Toukan



On 05/08/2018 4:15 PM, Tariq Toukan wrote:



On 25/07/2018 10:59 PM, David Miller wrote:

From: "Michael S. Tsirkin" 
Date: Wed, 25 Jul 2018 12:40:12 +0300


On Mon, Jul 23, 2018 at 11:36:03PM +0900, Toshiaki Makita wrote:

From: Toshiaki Makita 

Add some ethtool stat items useful for performance analysis.

Signed-off-by: Toshiaki Makita 


Series:

Acked-by: Michael S. Tsirkin 


Series applied.


Patch 1 seems appropriate for stable, even though it's minor.


Ok, I'll put patch #1 also into 'net' and queue it up for -stable.

Thanks.



Hi,
Our verification team encountered the following degradation, introduced
by this series. Please check the KASAN bug report below.


We tested before and after the series, but didn't bisect within it. We
can do so if necessary.


Thanks,
Tariq



I see this might already be fixed, here:
https://marc.info/?l=linux-netdev&m=153335713407532&w=2

Verifying...



[   46.166544] BUG: KASAN: use-after-free in virtnet_poll+0xd9c/0xfd1 
[virtio_net]

[   46.166593] Read of size 8 at addr 8803400da608 by task ip/1013

[   46.166603] CPU: 3 PID: 1013 Comm: ip Not tainted 
4.18.0-rc6-for-upstream-dbg-2018-07-26_19-45-52-64 #1
[   46.166606] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.10.2-1ubuntu1 04/01/2014

[   46.166609] Call Trace:
[   46.166613]  <IRQ>
[   46.166624]  dump_stack+0xf0/0x17b
[   46.166647]  ? show_regs_print_info+0x5/0x5
[   46.166654]  ? printk+0x9c/0xc3
[   46.11]  ? kmsg_dump_rewind_nolock+0xf5/0xf5
[   46.166677]  ? virtnet_poll+0xd9c/0xfd1 [virtio_net]
[   46.166685]  print_address_description+0xea/0x380
[   46.166701]  ? virtnet_poll+0xd9c/0xfd1 [virtio_net]
[   46.166711]  kasan_report+0x1dd/0x460
[   46.166727]  ? virtnet_poll+0xd9c/0xfd1 [virtio_net]
[   46.166743]  virtnet_poll+0xd9c/0xfd1 [virtio_net]
[   46.166767]  ? receive_buf+0x2e30/0x2e30 [virtio_net]
[   46.166796]  ? sched_clock_cpu+0x18/0x2b0
[   46.166809]  ? print_irqtrace_events+0x280/0x280
[   46.166817]  ? print_irqtrace_events+0x280/0x280
[   46.166830]  ? rcu_process_callbacks+0xc5e/0x12d0
[   46.166838]  ? kvm_clock_read+0x1f/0x30
[   46.166857]  ? mark_held_locks+0xd5/0x170
[   46.166867]  ? net_rx_action+0x2aa/0x10e0
[   46.166882]  net_rx_action+0x4bc/0x10e0
[   46.166906]  ? napi_complete_done+0x480/0x480
[   46.166925]  ? print_irqtrace_events+0x280/0x280
[   46.166935]  ? sched_clock+0x5/0x10
[   46.166952]  ? __lock_is_held+0xcb/0x1a0
[   46.166982]  __do_softirq+0x2c4/0xdf4
[   46.167010]  do_softirq_own_stack+0x2a/0x40
[   46.167031]  </IRQ>
[   46.167038]  ? virtnet_napi_enable+0x37/0xa0 [virtio_net]
[   46.167044]  do_softirq.part.11+0x69/0x70
[   46.167065]  __local_bh_enable_ip+0x1d9/0x250
[   46.167076]  virtnet_open+0x13c/0x440 [virtio_net]
[   46.167099]  __dev_open+0x1cf/0x390
[   46.167108]  ? dev_set_rx_mode+0x30/0x30
[   46.167113]  ? __local_bh_enable_ip+0xf7/0x250
[   46.167124]  ? trace_hardirqs_on_caller+0x38d/0x6c0
[   46.167130]  ? __dev_change_flags+0x18d/0x630
[   46.167142]  __dev_change_flags+0x469/0x630
[   46.167152]  ? dev_set_allmulti+0x10/0x10
[   46.167157]  ? __lock_acquire+0x9c1/0x4b50
[   46.167166]  ? print_irqtrace_events+0x280/0x280
[   46.167174]  ? kvm_clock_read+0x1f/0x30
[   46.167191]  ? rtnetlink_put_metrics+0x530/0x530
[   46.167205]  dev_change_flags+0x8b/0x160
[   46.167236]  do_setlink+0xa17/0x39f0
[   46.167248]  ? sched_clock_cpu+0x18/0x2b0
[   46.167267]  ? rtnl_dump_ifinfo+0xd20/0xd20
[   46.167277]  ? print_irqtrace_events+0x280/0x280
[   46.167285]  ? kvm_clock_read+0x1f/0x30
[   46.167316]  ? debug_show_all_locks+0x3b0/0x3b0
[   46.167321]  ? kvm_sched_clock_read+0x5/0x10
[   46.167327]  ? sched_clock+0x5/0x10
[   46.167333]  ? sched_clock_cpu+0x18/0x2b0
[   46.167382]  ? memset+0x1f/0x40
[   46.167408]  ? nla_parse+0x8c/0x3f0
[   46.167419]  ? rtnetlink_put_metrics+0x530/0x530
[   46.167427]  ? kvm_sched_clock_read+0x5/0x10
[   46.167433]  ? sched_clock+0x5/0x10
[   46.167439]  ? sched_clock_cpu+0x18/0x2b0
[   46.167457]  rtnl_newlink+0x9dc/0x13a0
[   46.167484]  ? rtnl_link_unregister+0x2b0/0x2b0
[   46.167513]  ? kvm_clock_read+0x1f/0x30
[   46.167538]  ? print_irqtrace_events+0x280/0x280
[   46.167546]  ? kvm_clock_read+0x1f/0x30
[   46.167551]  ? kvm_sched_clock_read+0x5/0x10
[   46.167557]  ? sched_clock+0x5/0x10
[   46.167562]  ? sched_clock_cpu+0x18/0x2b0
[   46.167567]  ? kvm_clock_read+0x1f/0x30
[   46.167598]  ? __lock_acquire+0x9c1/0x4b50
[   46.167640]  ? debug_show_all_locks+0x3b0/0x3b0
[   46.167646]  ? kvm_clock_read+0x1f/0x30
[   46.167651]  ? kvm_sched_clock_read+0x5/0x10
[   46.167673]  ? debug_show_all_locks+0x3b0/0x3b0
[   46.167698]  ? is_bpf_text_address+0x87/0x130
[   46.167707]  ? print_irqtrace_events+0x280/0x280
[   46.167714]  ? kvm_clock_read+0x1f/0x30
[   46.167718]  ? kvm_sched_clock_read+0x5/0x10
[   46.167723]  ? sched_clock+0x5/0x10
[   46.167728]  ? sched_clock_cpu+0x18/0x2b0
[   46.167753]  ? get_lock_stats+0x18/0x160
[   46.167877]  ? __lock_is_held+0xcb/0x1a0
[   46.167908]  rtnetlink_rcv_msg+0x575/0xaa0

Re: [PATCH net-next 0/6] virtio_net: Add ethtool stat items

2018-08-05 Thread Tariq Toukan




On 25/07/2018 10:59 PM, David Miller wrote:

From: "Michael S. Tsirkin" 
Date: Wed, 25 Jul 2018 12:40:12 +0300


On Mon, Jul 23, 2018 at 11:36:03PM +0900, Toshiaki Makita wrote:

From: Toshiaki Makita 

Add some ethtool stat items useful for performance analysis.

Signed-off-by: Toshiaki Makita 


Series:

Acked-by: Michael S. Tsirkin 


Series applied.


Patch 1 seems appropriate for stable, even though it's minor.


Ok, I'll put patch #1 also into 'net' and queue it up for -stable.

Thanks.



Hi,
Our verification team encountered the following degradation, introduced
by this series. Please check the KASAN bug report below.


We tested before and after the series, but didn't bisect within it. We
can do so if necessary.


Thanks,
Tariq


[   46.166544] BUG: KASAN: use-after-free in virtnet_poll+0xd9c/0xfd1 
[virtio_net]

[   46.166593] Read of size 8 at addr 8803400da608 by task ip/1013

[   46.166603] CPU: 3 PID: 1013 Comm: ip Not tainted 
4.18.0-rc6-for-upstream-dbg-2018-07-26_19-45-52-64 #1
[   46.166606] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.10.2-1ubuntu1 04/01/2014

[   46.166609] Call Trace:
[   46.166613]  <IRQ>
[   46.166624]  dump_stack+0xf0/0x17b
[   46.166647]  ? show_regs_print_info+0x5/0x5
[   46.166654]  ? printk+0x9c/0xc3
[   46.11]  ? kmsg_dump_rewind_nolock+0xf5/0xf5
[   46.166677]  ? virtnet_poll+0xd9c/0xfd1 [virtio_net]
[   46.166685]  print_address_description+0xea/0x380
[   46.166701]  ? virtnet_poll+0xd9c/0xfd1 [virtio_net]
[   46.166711]  kasan_report+0x1dd/0x460
[   46.166727]  ? virtnet_poll+0xd9c/0xfd1 [virtio_net]
[   46.166743]  virtnet_poll+0xd9c/0xfd1 [virtio_net]
[   46.166767]  ? receive_buf+0x2e30/0x2e30 [virtio_net]
[   46.166796]  ? sched_clock_cpu+0x18/0x2b0
[   46.166809]  ? print_irqtrace_events+0x280/0x280
[   46.166817]  ? print_irqtrace_events+0x280/0x280
[   46.166830]  ? rcu_process_callbacks+0xc5e/0x12d0
[   46.166838]  ? kvm_clock_read+0x1f/0x30
[   46.166857]  ? mark_held_locks+0xd5/0x170
[   46.166867]  ? net_rx_action+0x2aa/0x10e0
[   46.166882]  net_rx_action+0x4bc/0x10e0
[   46.166906]  ? napi_complete_done+0x480/0x480
[   46.166925]  ? print_irqtrace_events+0x280/0x280
[   46.166935]  ? sched_clock+0x5/0x10
[   46.166952]  ? __lock_is_held+0xcb/0x1a0
[   46.166982]  __do_softirq+0x2c4/0xdf4
[   46.167010]  do_softirq_own_stack+0x2a/0x40
[   46.167031]  </IRQ>
[   46.167038]  ? virtnet_napi_enable+0x37/0xa0 [virtio_net]
[   46.167044]  do_softirq.part.11+0x69/0x70
[   46.167065]  __local_bh_enable_ip+0x1d9/0x250
[   46.167076]  virtnet_open+0x13c/0x440 [virtio_net]
[   46.167099]  __dev_open+0x1cf/0x390
[   46.167108]  ? dev_set_rx_mode+0x30/0x30
[   46.167113]  ? __local_bh_enable_ip+0xf7/0x250
[   46.167124]  ? trace_hardirqs_on_caller+0x38d/0x6c0
[   46.167130]  ? __dev_change_flags+0x18d/0x630
[   46.167142]  __dev_change_flags+0x469/0x630
[   46.167152]  ? dev_set_allmulti+0x10/0x10
[   46.167157]  ? __lock_acquire+0x9c1/0x4b50
[   46.167166]  ? print_irqtrace_events+0x280/0x280
[   46.167174]  ? kvm_clock_read+0x1f/0x30
[   46.167191]  ? rtnetlink_put_metrics+0x530/0x530
[   46.167205]  dev_change_flags+0x8b/0x160
[   46.167236]  do_setlink+0xa17/0x39f0
[   46.167248]  ? sched_clock_cpu+0x18/0x2b0
[   46.167267]  ? rtnl_dump_ifinfo+0xd20/0xd20
[   46.167277]  ? print_irqtrace_events+0x280/0x280
[   46.167285]  ? kvm_clock_read+0x1f/0x30
[   46.167316]  ? debug_show_all_locks+0x3b0/0x3b0
[   46.167321]  ? kvm_sched_clock_read+0x5/0x10
[   46.167327]  ? sched_clock+0x5/0x10
[   46.167333]  ? sched_clock_cpu+0x18/0x2b0
[   46.167382]  ? memset+0x1f/0x40
[   46.167408]  ? nla_parse+0x8c/0x3f0
[   46.167419]  ? rtnetlink_put_metrics+0x530/0x530
[   46.167427]  ? kvm_sched_clock_read+0x5/0x10
[   46.167433]  ? sched_clock+0x5/0x10
[   46.167439]  ? sched_clock_cpu+0x18/0x2b0
[   46.167457]  rtnl_newlink+0x9dc/0x13a0
[   46.167484]  ? rtnl_link_unregister+0x2b0/0x2b0
[   46.167513]  ? kvm_clock_read+0x1f/0x30
[   46.167538]  ? print_irqtrace_events+0x280/0x280
[   46.167546]  ? kvm_clock_read+0x1f/0x30
[   46.167551]  ? kvm_sched_clock_read+0x5/0x10
[   46.167557]  ? sched_clock+0x5/0x10
[   46.167562]  ? sched_clock_cpu+0x18/0x2b0
[   46.167567]  ? kvm_clock_read+0x1f/0x30
[   46.167598]  ? __lock_acquire+0x9c1/0x4b50
[   46.167640]  ? debug_show_all_locks+0x3b0/0x3b0
[   46.167646]  ? kvm_clock_read+0x1f/0x30
[   46.167651]  ? kvm_sched_clock_read+0x5/0x10
[   46.167673]  ? debug_show_all_locks+0x3b0/0x3b0
[   46.167698]  ? is_bpf_text_address+0x87/0x130
[   46.167707]  ? print_irqtrace_events+0x280/0x280
[   46.167714]  ? kvm_clock_read+0x1f/0x30
[   46.167718]  ? kvm_sched_clock_read+0x5/0x10
[   46.167723]  ? sched_clock+0x5/0x10
[   46.167728]  ? sched_clock_cpu+0x18/0x2b0
[   46.167753]  ? get_lock_stats+0x18/0x160
[   46.167877]  ? __lock_is_held+0xcb/0x1a0
[   46.167908]  rtnetlink_rcv_msg+0x575/0xaa0
[   46.167913]  ? kvm_clock_read+0x1f/0x30
[   46.167925]  ? validate_linkmsg+0x900/0x900
[   46.167945]  ? 

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-05 Thread Christoph Hellwig
On Sun, Aug 05, 2018 at 11:10:15AM +1000, Benjamin Herrenschmidt wrote:
>  - One you have rejected, which is to have a way for "no-iommu" virtio
> (which still doesn't use an iommu on the qemu side and doesn't need
> to), to be forced to use some custom DMA ops on the VM side.
> 
>  - One, which sadly has more overhead and will require modifying more
> pieces of the puzzle, which is to make qemu uses an emulated iommu.
> Once we make qemu do that, we can then layer swiotlb on top of the
> emulated iommu on the guest side, and pass that as dma_ops to virtio.

Or number three:  have a virtio feature bit that tells the VM
to use whatever dma ops the platform thinks are appropriate for
the bus it pretends to be on.  Then set a dma-range that is limited
to your secure memory range (if you really need it to be runtime
enabled only after a device reset that rescans) and use the normal
dma mapping code to bounce buffer.
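
A hedged sketch of what that could look like on the guest side, hooking
the existing decision point in drivers/virtio/virtio_ring.c (the new
bit's name below is invented for illustration; only VIRTIO_F_IOMMU_PLATFORM
exists today):

static bool vring_use_dma_api(struct virtio_device *vdev)
{
	/* Existing rule: use the DMA API when the device claims to sit
	 * behind a (possibly emulated) IOMMU. */
	if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		return true;

	/* Hypothetical new bit: the platform asks the guest to apply
	 * whatever dma_ops fit the bus, so a restricted dma-range plus
	 * swiotlb yields bounce buffering automatically. */
	if (virtio_has_feature(vdev, VIRTIO_F_PLATFORM_DMA))
		return true;

	return false;
}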


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-08-05 Thread Christoph Hellwig
On Sun, Aug 05, 2018 at 03:09:55AM +0300, Michael S. Tsirkin wrote:
> So in this case however I'm not sure what exactly do we want to add. It
> seems that from point of view of the device, there is nothing special -
> it just gets a PA and writes there.  It also seems that guest does not
> need to get any info from the device either. Instead guest itself needs
> device to DMA into specific addresses, for its own reasons.
> 
> It seems that the fact that within guest it's implemented using a bounce
> buffer and that it's easiest to do by switching virtio to use the DMA API
> isn't something virtio spec concerns itself with.

And that is exactly what we added bus_dma_mask for - the case where
the device itself has no limitation (or a bigger one), but
the platform limits the accessible dma ranges.  One typical case is
a PCIe root port that is only connected to the CPU through an
interconnect that is limited to 32 address bits, for example.
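
A rough sketch of that split, assuming the driver asks for the full
device capability while bus/firmware code (not the driver) caps the
reachable range via the bus_dma_mask field mentioned above:

	/* Driver side: request everything the device can address. */
	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
		dev_warn(dev, "no suitable DMA mask available\n");

	/* Platform/bus side: cap what is actually reachable; dma-direct
	 * then bounce-buffers (swiotlb) anything above this limit. */
	dev->bus_dma_mask = DMA_BIT_MASK(32);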