Re: INFO: task hung in vhost_init_device_iotlb

2019-01-29 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 01:22:02AM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    983542434e6b Merge tag 'edac_fix_for_5.0' of git://git.ker..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=17476498c0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=505743eba4e4f68
> dashboard link: https://syzkaller.appspot.com/bug?extid=40e28a8bd59d10ed0c42
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.

Hmm nothing obvious below. Generic corruption elsewhere?

> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+40e28a8bd59d10ed0...@syzkaller.appspotmail.com
> 
> protocol 88fb is buggy, dev hsr_slave_1
> protocol 88fb is buggy, dev hsr_slave_0
> protocol 88fb is buggy, dev hsr_slave_1
> protocol 88fb is buggy, dev hsr_slave_0
> protocol 88fb is buggy, dev hsr_slave_1
> INFO: task syz-executor5:9417 blocked for more than 140 seconds.
>   Not tainted 5.0.0-rc3+ #48
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor5   D27576  9417   8469 0x0004
> Call Trace:
>  context_switch kernel/sched/core.c:2831 [inline]
>  __schedule+0x897/0x1e60 kernel/sched/core.c:3472
>  schedule+0xfe/0x350 kernel/sched/core.c:3516
> protocol 88fb is buggy, dev hsr_slave_0
> protocol 88fb is buggy, dev hsr_slave_1
>  schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:3574
>  __mutex_lock_common kernel/locking/mutex.c:1002 [inline]
>  __mutex_lock+0xa3b/0x1670 kernel/locking/mutex.c:1072
>  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1087
>  vhost_init_device_iotlb+0x124/0x280 drivers/vhost/vhost.c:1606
>  vhost_net_set_features drivers/vhost/net.c:1674 [inline]
>  vhost_net_ioctl+0x1282/0x1c00 drivers/vhost/net.c:1739
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  file_ioctl fs/ioctl.c:509 [inline]
>  do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
>  ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
>  __do_sys_ioctl fs/ioctl.c:720 [inline]
>  __se_sys_ioctl fs/ioctl.c:718 [inline]
>  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
>  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
> protocol 88fb is buggy, dev hsr_slave_0
> protocol 88fb is buggy, dev hsr_slave_1
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x458099
> Code: Bad RIP value.
> RSP: 002b:7efd7ca9bc78 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX: 0003 RCX: 00458099
> RDX: 2080 RSI: 4008af00 RDI: 0003
> RBP: 0073bfa0 R08:  R09: 
> R10:  R11: 0246 R12: 7efd7ca9c6d4
> R13: 004c295b R14: 004d5280 R15: 
> INFO: task syz-executor5:9418 blocked for more than 140 seconds.
>   Not tainted 5.0.0-rc3+ #48
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor5   D27800  9418   8469 0x0004
> Call Trace:
>  context_switch kernel/sched/core.c:2831 [inline]
>  __schedule+0x897/0x1e60 kernel/sched/core.c:3472
>  schedule+0xfe/0x350 kernel/sched/core.c:3516
>  schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:3574
>  __mutex_lock_common kernel/locking/mutex.c:1002 [inline]
>  __mutex_lock+0xa3b/0x1670 kernel/locking/mutex.c:1072
>  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1087
>  vhost_net_set_owner drivers/vhost/net.c:1697 [inline]
>  vhost_net_ioctl+0x426/0x1c00 drivers/vhost/net.c:1754
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  file_ioctl fs/ioctl.c:509 [inline]
>  do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
>  ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
>  __do_sys_ioctl fs/ioctl.c:720 [inline]
>  __se_sys_ioctl fs/ioctl.c:718 [inline]
>  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
>  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x458099
> Code: Bad RIP value.
> RSP: 002b:7efd7ca7ac78 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX: 0003 RCX: 00458099
> RDX:  RSI: 4001af01 RDI: 0003
> RBP: 0073c040 R08:  R09: 
> R10:  R11: 0246 R12: 7efd7ca7b6d4
> R13: 004c33a4 R14: 004d5e80 R15: 
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/1040:
>  #0: b7479fbe (rcu_read_lock){....}, at:
> debug_show_all_locks+0xc6/0x41d kernel/locking/lockdep.c:4389
> 1 lock held by rsyslogd/8285:
>  #0: 6d9ccf7d (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x1b3/0x1f0
> fs/file.c:795
> 2 locks held by getty/8406:
>  #0: 052e805b (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40
> drivers/tty/tty_ldsem.c:341
>  #1: b90dc267 (&ldata->atomic_read_lock){+.+.}, at:
> n_tty_read+0x30a/0x1eb0 drivers/tty/n_tty.c:2154
> 2 locks held by getty/8407:
>  

[PATCH 1/3] drm/irq: Don't check for DRIVER_HAVE_IRQ in drm_irq_(un)install

2019-01-29 Thread Daniel Vetter
If a non-legacy driver calls these, it's valid to assume there is
interrupt support. The flag is really only needed for legacy drivers,
which control IRQ enabling/disabling through the DRM_IOCTL_CONTROL
legacy IOCTL.

Also remove all the flag usage from non-legacy drivers.
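
(Illustrative sketch, not part of the patch: after this change a non-legacy
KMS driver can call drm_irq_install()/drm_irq_uninstall() without setting
DRIVER_HAVE_IRQ. The driver name, handler and load helper below are
hypothetical.)

    static irqreturn_t example_irq_handler(int irq, void *arg)
    {
        /* hypothetical handler; dispatch to hw-specific code here */
        return IRQ_HANDLED;
    }

    static struct drm_driver example_driver = {
        /* note: no DRIVER_HAVE_IRQ in the feature mask */
        .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC,
        .irq_handler = example_irq_handler,
    };

    static int example_load(struct drm_device *dev, struct pci_dev *pdev)
    {
        /* drm_irq_install() now only rejects irq == 0, not a
         * missing DRIVER_HAVE_IRQ flag */
        return drm_irq_install(dev, pdev->irq);
    }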

v2: Review from Emil:
- improve commit message
- I forgot hibmc, fix that

Cc: linux-arm-ker...@lists.infradead.org
Cc: intel-...@lists.freedesktop.org
Cc: linux-amlo...@lists.infradead.org
Cc: linux-arm-...@vger.kernel.org
Cc: freedr...@lists.freedesktop.org
Cc: virtualization@lists.linux-foundation.org
Cc: spice-de...@lists.freedesktop.org
Cc: amd-...@lists.freedesktop.org
Cc: linux-renesas-...@vger.kernel.org
Reviewed-by: Emil Velikov 
Reviewed-by: Sam Ravnborg 
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +-
 drivers/gpu/drm/arm/hdlcd_drv.c | 2 +-
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c| 2 +-
 drivers/gpu/drm/drm_irq.c   | 6 --
 drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c   | 2 +-
 drivers/gpu/drm/gma500/psb_drv.c| 2 +-
 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c | 3 +--
 drivers/gpu/drm/i915/i915_drv.c | 2 +-
 drivers/gpu/drm/meson/meson_drv.c   | 2 +-
 drivers/gpu/drm/msm/msm_drv.c   | 3 +--
 drivers/gpu/drm/mxsfb/mxsfb_drv.c   | 3 +--
 drivers/gpu/drm/qxl/qxl_drv.c   | 2 +-
 drivers/gpu/drm/radeon/radeon_drv.c | 2 +-
 drivers/gpu/drm/shmobile/shmob_drm_drv.c| 2 +-
 drivers/gpu/drm/tilcdc/tilcdc_drv.c | 2 +-
 drivers/gpu/drm/vc4/vc4_drv.c   | 1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +-
 drivers/staging/vboxvideo/vbox_drv.c| 2 +-
 18 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0c22bae0c736..22502417c18c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1189,7 +1189,7 @@ amdgpu_get_crtc_scanout_position(struct drm_device *dev, unsigned int pipe,
 static struct drm_driver kms_driver = {
.driver_features =
DRIVER_USE_AGP | DRIVER_ATOMIC |
-   DRIVER_HAVE_IRQ | DRIVER_IRQ_SHARED | DRIVER_GEM |
+   DRIVER_IRQ_SHARED | DRIVER_GEM |
DRIVER_PRIME | DRIVER_RENDER | DRIVER_MODESET | DRIVER_SYNCOBJ,
.load = amdgpu_driver_load_kms,
.open = amdgpu_driver_open_kms,
diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c
index e68935b80917..8fc0b884c428 100644
--- a/drivers/gpu/drm/arm/hdlcd_drv.c
+++ b/drivers/gpu/drm/arm/hdlcd_drv.c
@@ -229,7 +229,7 @@ static int hdlcd_debugfs_init(struct drm_minor *minor)
 DEFINE_DRM_GEM_CMA_FOPS(fops);
 
 static struct drm_driver hdlcd_driver = {
-   .driver_features = DRIVER_HAVE_IRQ | DRIVER_GEM |
+   .driver_features = DRIVER_GEM |
   DRIVER_MODESET | DRIVER_PRIME |
   DRIVER_ATOMIC,
.irq_handler = hdlcd_irq,
diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
index 034a91112098..0be13eceedba 100644
--- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
+++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
@@ -720,7 +720,7 @@ static void atmel_hlcdc_dc_irq_uninstall(struct drm_device *dev)
 DEFINE_DRM_GEM_CMA_FOPS(fops);
 
 static struct drm_driver atmel_hlcdc_dc_driver = {
-   .driver_features = DRIVER_HAVE_IRQ | DRIVER_GEM |
+   .driver_features = DRIVER_GEM |
   DRIVER_MODESET | DRIVER_PRIME |
   DRIVER_ATOMIC,
.irq_handler = atmel_hlcdc_dc_irq_handler,
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 45a07652fa00..c5babb3e4752 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -103,9 +103,6 @@ int drm_irq_install(struct drm_device *dev, int irq)
int ret;
unsigned long sh_flags = 0;
 
-   if (!drm_core_check_feature(dev, DRIVER_HAVE_IRQ))
-   return -EOPNOTSUPP;
-
if (irq == 0)
return -EINVAL;
 
@@ -174,9 +171,6 @@ int drm_irq_uninstall(struct drm_device *dev)
bool irq_enabled;
int i;
 
-   if (!drm_core_check_feature(dev, DRIVER_HAVE_IRQ))
-   return -EOPNOTSUPP;
-
irq_enabled = dev->irq_enabled;
dev->irq_enabled = false;
 
diff --git a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c
index 54ace3436605..dfc73aade325 100644
--- a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c
+++ b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c
@@ -137,7 +137,7 @@ static irqreturn_t fsl_dcu_drm_irq(int irq, void *arg)
 DEFINE_DRM_GEM_CMA_FOPS(fsl_dcu_drm_fops);
 
 static struct drm_driver fsl_dcu_drm_driver = {
-   .driver_features= DRIVER_HAVE_IRQ | 

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-01-29 Thread Michael S. Tsirkin
On Wed, Jan 30, 2019 at 11:05:42AM +0800, Jason Wang wrote:
> 
> > On 2019/1/30 10:36 AM, Michael S. Tsirkin wrote:
> > On Wed, Jan 30, 2019 at 10:24:01AM +0800, Jason Wang wrote:
> > > > On 2019/1/30 3:02 AM, Michael S. Tsirkin wrote:
> > > > On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
> > > > > Fixing address of powerpc mailing list.
> > > > > 
> > > > > Thiago Jung Bauermann  writes:
> > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > With Christoph's rework of the DMA API that recently landed, the 
> > > > > > patch
> > > > > > below is the only change needed in virtio to make it work in a POWER
> > > > > > secure guest under the ultravisor.
> > > > > > 
> > > > > > The other change we need (making sure the device's dma_map_ops is 
> > > > > > NULL
> > > > > > so that the dma-direct/swiotlb code is used) can be made in
> > > > > > powerpc-specific code.
> > > > > > 
> > > > > > Of course, I also have patches (soon to be posted as RFC) which 
> > > > > > hook up
> > > > > >  to the powerpc secure guest support code.
> > > > > > 
> > > > > > What do you think?
> > > > > > 
> > > > > >   From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 
> > > > > > 2001
> > > > > > From: Thiago Jung Bauermann 
> > > > > > Date: Thu, 24 Jan 2019 22:08:02 -0200
> > > > > > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is 
> > > > > > encrypted
> > > > > > 
> > > > > > The host can't access the guest memory when it's encrypted, so using
> > > > > > regular memory pages for the ring isn't an option. Go through the 
> > > > > > DMA API.
> > > > > > 
> > > > > > Signed-off-by: Thiago Jung Bauermann 
> > > > Well I think this will come back to bite us (witness xen which is now
> > > > reworking precisely this path - but at least they aren't to blame, xen
> > > > came before ACCESS_PLATFORM).
> > > > 
> > > > I also still think the right thing would have been to set
> > > > ACCESS_PLATFORM for all systems where device can't access all memory.
> > > > 
> > > > But I also think I don't have the energy to argue about power secure
> > > > guest anymore.  So be it for power secure guest since the involved
> > > > engineers disagree with me.  Hey I've been wrong in the past ;).
> > > > 
> > > > But the name "sev_active" makes me scared because at least AMD guys who
> > > > were doing the sensible thing and setting ACCESS_PLATFORM (unless I'm
> > > > wrong? I remember distinctly that's so) will likely be affected too.
> > > > We don't want that.
> > > > 
> > > > So let's find a way to make sure it's just power secure guest for now
> > > > pls.
> > > > 
> > > > I also think we should add a dma_api near features under virtio_device
> > > > such that these hacks can move off data path.
> > > 
> > > Anyway the current Xen code is conflict with spec which said:
> > > 
> > > "If this feature bit is set to 0, then the device has same access to 
> > > memory
> > > addresses supplied to it as the driver has. In particular, the device will
> > > always use physical addresses matching addresses used by the driver
> > > (typically meaning physical addresses used by the CPU) and not translated
> > > further, and can access any address supplied to it by the driver. When
> > > clear, this overrides any platform-specific description of whether device
> > > access is limited or translated in any way, e.g. whether an IOMMU may be
> > > present. "
> > > 
> > > I wonder how much value the above description can give us. It's kind of
> > > odd that the behavior of "when the feature is not negotiated" is described
> > > in the spec.
Hmm what's odd about it? We need to describe the behaviour in all cases.
> 
> 
> Well, trying to limit the behavior of a 'legacy' driver is tricky or even
> impossible. Xen is an exact example of this.

So don't. Xen got grandfathered in because when that came
along we thought it was a one-off thing. It was easier to just
add that as a special case. But really >99% of people
have a hypervisor device with direct guest memory access.
All else is esoterica.

> 
> > 
> > > Personally I think we can remove the above, and then we can
> > > switch to using the DMA API unconditionally in the guest driver. It may
> > > cause a single-digit regression, but we can probably overcome it.
> > > 
> > > Thanks
> > This has been discussed ad nauseam. virtio is all about compatibility.
> > Losing a couple of lines of code isn't worth breaking working setups.
> > People that want "just use DMA API no tricks" now have the option.
> > Setting a flag in a feature bit map is literally a single line
> > of code in the hypervisor. So stop pushing for breaking working
> > legacy setups and just fix it in the right place.
> 
> 
> I may be missing something: which kind of legacy setup is broken? Do you mean
> using virtio without IOMMU_PLATFORM on a platform with an IOMMU? We actually
> unbreak this setup.
> 
> Thanks


Legacy setups by definition are working setups.  The rules are pretty
simple. By default virtio 

[PATCH 0/5 v4] Fix virtio-blk issue with SWIOTLB

2019-01-29 Thread Joerg Roedel


Hi,

here is the fourth version of this patch-set. Previous
versions can be found here:

V1: https://lore.kernel.org/lkml/20190110134433.15672-1-j...@8bytes.org/

V2: https://lore.kernel.org/lkml/20190115132257.6426-1-j...@8bytes.org/

V3: https://lore.kernel.org/lkml/20190123163049.24863-1-j...@8bytes.org/

The problem solved here is a limitation of the SWIOTLB implementation,
which does not support allocations larger than 256kb.  When the
virtio-blk driver tries to read/write a block larger than that, the
allocation of the dma-handle fails and an IO error is reported.
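
(For reference, a sketch of where the 256kb figure comes from, using the
IO_TLB_* constants from include/linux/swiotlb.h; this is the same value
that swiotlb_max_mapping_size() returns in patch 1:)

    /* One SWIOTLB slot is 1 << IO_TLB_SHIFT = 1 << 11 = 2048 bytes,
     * and at most IO_TLB_SEGSIZE = 128 slots can be contiguous, so:
     * 128 * 2048 = 262144 bytes = 256kb maximum per mapping. */
    size_t max = ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;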

For all changes to v3, a diff to v3 of the patch-set is at
the end of this email.

Please review.

Thanks,

Joerg

Joerg Roedel (5):
  swiotlb: Introduce swiotlb_max_mapping_size()
  swiotlb: Add is_swiotlb_active() function
  dma: Introduce dma_max_mapping_size()
  virtio: Introduce virtio_max_dma_size()
  virtio-blk: Consider virtio_max_dma_size() for maximum segment size

 Documentation/DMA-API.txt|  8 
 drivers/block/virtio_blk.c   | 10 ++
 drivers/virtio/virtio_ring.c | 10 ++
 include/linux/dma-mapping.h  | 16 
 include/linux/swiotlb.h  | 11 +++
 include/linux/virtio.h   |  2 ++
 kernel/dma/direct.c  | 11 +++
 kernel/dma/swiotlb.c | 14 ++
 8 files changed, 78 insertions(+), 4 deletions(-)

-- 
2.17.1

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index e133ccd60228..acfe3d0f78d1 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -195,6 +195,14 @@ Requesting the required mask does not alter the current mask.  If you
 wish to take advantage of it, you should issue a dma_set_mask()
 call to set the mask to the value returned.
 
+::
+
+   size_t
+   dma_direct_max_mapping_size(struct device *dev);
+
+Returns the maximum size of a mapping for the device. The size parameter
+of the mapping functions like dma_map_single(), dma_map_page() and
+others should not be larger than the returned value.
 
 Part Id - Streaming DMA mappings
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 5c087d330b4b..e9e786b4b598 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,7 +62,7 @@ extern void swiotlb_tbl_sync_single(struct device *hwdev,
 
 extern int
 swiotlb_dma_supported(struct device *hwdev, u64 mask);
-extern size_t swiotlb_max_mapping_size(struct device *dev);
+size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(void);
 
 #ifdef CONFIG_SWIOTLB
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9fbd075081d9..c873f9cc2146 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -670,5 +670,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
 
 bool is_swiotlb_active(void)
 {
-   return !no_iotlb_memory;
+   /*
+* When SWIOTLB is initialized, even if io_tlb_start points to physical
+* address zero, io_tlb_end surely doesn't.
+*/
+   return io_tlb_end != 0;
 }
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 4/5] virtio: Introduce virtio_max_dma_size()

2019-01-29 Thread Joerg Roedel
From: Joerg Roedel 

This function returns the maximum segment size for a single
dma transaction of a virtio device. The possible limit comes
from the SWIOTLB implementation in the Linux kernel, which
has an upper limit of (currently) 256kb of contiguous
memory it can map. Other DMA-API implementations might also
have limits.

Use the new dma_max_mapping_size() function to determine the
maximum mapping size when DMA-API is in use for virtio.
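
(A minimal usage sketch, not taken from this patch: a virtio driver capping
a segment size it would otherwise pick freely. Only virtio_max_dma_size()
is from the series; the surrounding names are hypothetical.)

    size_t wanted = SZ_4M;  /* hypothetical driver preference */
    size_t seg = min_t(size_t, wanted, virtio_max_dma_size(vdev));
    /* seg is now safe to hand to the block/DMA layer as a segment limit */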

Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Joerg Roedel 
---
 drivers/virtio/virtio_ring.c | 10 ++
 include/linux/virtio.h   |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cd7e755484e3..9ca3fe6af9fa 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -266,6 +266,16 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
return false;
 }
 
+size_t virtio_max_dma_size(struct virtio_device *vdev)
+{
+   size_t max_segment_size = SIZE_MAX;
+
+   if (vring_use_dma_api(vdev))
+   max_segment_size = dma_max_mapping_size(&vdev->dev);
+
+   return max_segment_size;
+}
+
 static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
  dma_addr_t *dma_handle, gfp_t flag)
 {
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index fa1b5da2804e..673fe3ef3607 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -157,6 +157,8 @@ int virtio_device_freeze(struct virtio_device *dev);
 int virtio_device_restore(struct virtio_device *dev);
 #endif
 
+size_t virtio_max_dma_size(struct virtio_device *vdev);
+
 #define virtio_device_for_each_vq(vdev, vq) \
list_for_each_entry(vq, &vdev->vqs, list)
 
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] drm/qxl: use ttm_tt

2019-01-29 Thread Noralf Trønnes


On 29.01.2019 09.25, Gerd Hoffmann wrote:
> The qxl device will not do DMA, so we don't need ttm_dma_tt.  Use ttm_tt
> instead, to avoid wasting resources (swiotlb bounce buffers, for
> example).
> 
> Signed-off-by: Gerd Hoffmann 
> ---

Acked-by: Noralf Trønnes 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 5/5] virtio-blk: Consider virtio_max_dma_size() for maximum segment size

2019-01-29 Thread Joerg Roedel
From: Joerg Roedel 

Segments can't be larger than the maximum DMA mapping size
supported on the platform. Take that into account when
setting the maximum segment size for a block device.
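
(A worked example with hypothetical numbers: if SWIOTLB is active,
virtio_max_dma_size() returns 256kb; if the host also advertises
size_max = 4MB via VIRTIO_BLK_F_SIZE_MAX, the resulting queue limit is
min(4MB, 256kb) = 256kb. Without the host-provided limit, the SWIOTLB
value alone now applies, instead of the previous unlimited -1U.)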

Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Joerg Roedel 
---
 drivers/block/virtio_blk.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index b16a887bbd02..4bc083b7c9b5 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -723,7 +723,7 @@ static int virtblk_probe(struct virtio_device *vdev)
struct request_queue *q;
int err, index;
 
-   u32 v, blk_size, sg_elems, opt_io_size;
+   u32 v, blk_size, max_size, sg_elems, opt_io_size;
u16 min_io_size;
u8 physical_block_exp, alignment_offset;
 
@@ -826,14 +826,16 @@ static int virtblk_probe(struct virtio_device *vdev)
/* No real sector limit. */
blk_queue_max_hw_sectors(q, -1U);
 
+   max_size = virtio_max_dma_size(vdev);
+
/* Host can optionally specify maximum segment size and number of
 * segments. */
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
   struct virtio_blk_config, size_max, &v);
if (!err)
-   blk_queue_max_segment_size(q, v);
-   else
-   blk_queue_max_segment_size(q, -1U);
+   max_size = min(max_size, v);
+
+   blk_queue_max_segment_size(q, max_size);
 
/* Host can optionally specify the block size of the device */
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/5] swiotlb: Introduce swiotlb_max_mapping_size()

2019-01-29 Thread Joerg Roedel
From: Joerg Roedel 

The function returns the maximum size that can be remapped
by the SWIOTLB implementation. This function will later be
exposed to users through the DMA-API.

Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Joerg Roedel 
---
 include/linux/swiotlb.h | 5 +
 kernel/dma/swiotlb.c| 5 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7c007ed7505f..1c22d96e1742 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,6 +62,7 @@ extern void swiotlb_tbl_sync_single(struct device *hwdev,
 
 extern int
 swiotlb_dma_supported(struct device *hwdev, u64 mask);
+size_t swiotlb_max_mapping_size(struct device *dev);
 
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
@@ -95,6 +96,10 @@ static inline unsigned int swiotlb_max_segment(void)
 {
return 0;
 }
+static inline size_t swiotlb_max_mapping_size(struct device *dev)
+{
+   return SIZE_MAX;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fb6fd68b9c7..9cb21259cb0b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -662,3 +662,8 @@ swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
return __phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
 }
+
+size_t swiotlb_max_mapping_size(struct device *dev)
+{
+   return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
+}
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2] drm/qxl: use ttm_tt

2019-01-29 Thread Gerd Hoffmann
The qxl device will not do DMA, so we don't need ttm_dma_tt.  Use ttm_tt
instead, to avoid wasting resources (swiotlb bounce buffers, for
example).

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_ttm.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index 36ea993aac..92f5db5b29 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -204,7 +204,7 @@ static void qxl_ttm_io_mem_free(struct ttm_bo_device *bdev,
  * TTM backend functions.
  */
 struct qxl_ttm_tt {
-   struct ttm_dma_tt   ttm;
+   struct ttm_tt   ttm;
struct qxl_device   *qdev;
u64 offset;
 };
@@ -233,7 +233,7 @@ static void qxl_ttm_backend_destroy(struct ttm_tt *ttm)
 {
struct qxl_ttm_tt *gtt = (void *)ttm;
 
-   ttm_dma_tt_fini(&gtt->ttm);
+   ttm_tt_fini(&gtt->ttm);
kfree(gtt);
 }
 
@@ -253,13 +253,13 @@ static struct ttm_tt *qxl_ttm_tt_create(struct ttm_buffer_object *bo,
gtt = kzalloc(sizeof(struct qxl_ttm_tt), GFP_KERNEL);
if (gtt == NULL)
return NULL;
-   gtt->ttm.ttm.func = &qxl_backend_func;
+   gtt->ttm.func = &qxl_backend_func;
gtt->qdev = qdev;
-   if (ttm_dma_tt_init(&gtt->ttm, bo, page_flags)) {
+   if (ttm_tt_init(&gtt->ttm, bo, page_flags)) {
kfree(gtt);
return NULL;
}
-   return &gtt->ttm.ttm;
+   return &gtt->ttm;
 }
 
 static void qxl_move_null(struct ttm_buffer_object *bo,
-- 
2.9.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 5/5] virtio-blk: Consider virtio_max_dma_size() for maximum segment size

2019-01-29 Thread Christoph Hellwig
On Mon, Jan 28, 2019 at 12:14:33PM -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 28, 2019 at 09:05:26AM +0100, Christoph Hellwig wrote:
> > On Thu, Jan 24, 2019 at 10:51:51AM +0100, Joerg Roedel wrote:
> > > On Thu, Jan 24, 2019 at 09:42:21AM +0100, Christoph Hellwig wrote:
> > > > Yes.  But more importantly it would fix the limit for all other block
> > > > drivers that set large segment sizes when running over swiotlb.
> > > 
> > > True, so it would be something like the diff below? I haven't worked on
> > > the block layer, so I don't know if that needs additional checks for
> > > ->dev or anything.
> > 
> > Looks sensible.  Maybe for now we'll just do the virtio-blk case
> > that triggered it, and we'll do something like this patch for the
> > next merge window.  We'll also need to apply the same magic to the
> > DMA boundary.
> 
> So do I get an ack for this patch then?

I'll wait for a resend of the series to review the whole thing.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/5] swiotlb: Add is_swiotlb_active() function

2019-01-29 Thread Joerg Roedel
From: Joerg Roedel 

This function will be used from dma_direct code to determine
the maximum segment size of a dma mapping.

Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Joerg Roedel 
---
 include/linux/swiotlb.h | 6 ++
 kernel/dma/swiotlb.c| 9 +
 2 files changed, 15 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 1c22d96e1742..e9e786b4b598 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -63,6 +63,7 @@ extern void swiotlb_tbl_sync_single(struct device *hwdev,
 extern int
 swiotlb_dma_supported(struct device *hwdev, u64 mask);
 size_t swiotlb_max_mapping_size(struct device *dev);
+bool is_swiotlb_active(void);
 
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
@@ -100,6 +101,11 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev)
 {
return SIZE_MAX;
 }
+
+static inline bool is_swiotlb_active(void)
+{
+   return false;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9cb21259cb0b..c873f9cc2146 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -667,3 +667,12 @@ size_t swiotlb_max_mapping_size(struct device *dev)
 {
return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
 }
+
+bool is_swiotlb_active(void)
+{
+   /*
+* When SWIOTLB is initialized, even if io_tlb_start points to physical
+* address zero, io_tlb_end surely doesn't.
+*/
+   return io_tlb_end != 0;
+}
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 3/5] dma: Introduce dma_max_mapping_size()

2019-01-29 Thread Joerg Roedel
From: Joerg Roedel 

The function returns the maximum size that can be mapped
using DMA-API functions. The patch also adds the
implementation for direct DMA and a new dma_map_ops pointer
so that other implementations can expose their limit.
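
(A sketch, not part of this series, of how a non-direct DMA implementation
could expose a limit through the new callback; the ops structure, the
function and the SZ_1M value are hypothetical.)

    static size_t example_iommu_max_mapping_size(struct device *dev)
    {
        /* hypothetical IOMMU that can't map more than 1MB contiguously */
        return SZ_1M;
    }

    static const struct dma_map_ops example_iommu_dma_ops = {
        /* ... map_page, unmap_page, etc. elided ... */
        .max_mapping_size = example_iommu_max_mapping_size,
    };

    /* dma_max_mapping_size() then returns SZ_1M for devices bound to
     * these ops, via the dispatch added below. */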

Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Joerg Roedel 
---
 Documentation/DMA-API.txt   |  8 
 include/linux/dma-mapping.h | 16 
 kernel/dma/direct.c | 11 +++
 3 files changed, 35 insertions(+)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index e133ccd60228..acfe3d0f78d1 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -195,6 +195,14 @@ Requesting the required mask does not alter the current mask.  If you
 wish to take advantage of it, you should issue a dma_set_mask()
 call to set the mask to the value returned.
 
+::
+
+   size_t
+   dma_direct_max_mapping_size(struct device *dev);
+
+Returns the maximum size of a mapping for the device. The size parameter
+of the mapping functions like dma_map_single(), dma_map_page() and
+others should not be larger than the returned value.
 
 Part Id - Streaming DMA mappings
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f6ded992c183..a3ca8a71a704 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -130,6 +130,7 @@ struct dma_map_ops {
enum dma_data_direction direction);
int (*dma_supported)(struct device *dev, u64 mask);
u64 (*get_required_mask)(struct device *dev);
+   size_t (*max_mapping_size)(struct device *dev);
 };
 
 #define DMA_MAPPING_ERROR  (~(dma_addr_t)0)
@@ -257,6 +258,8 @@ static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
 }
 #endif
 
+size_t dma_direct_max_mapping_size(struct device *dev);
+
 #ifdef CONFIG_HAS_DMA
 #include 
 
@@ -440,6 +443,19 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
return 0;
 }
 
+static inline size_t dma_max_mapping_size(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+   size_t size = SIZE_MAX;
+
+   if (dma_is_direct(ops))
+   size = dma_direct_max_mapping_size(dev);
+   else if (ops && ops->max_mapping_size)
+   size = ops->max_mapping_size(dev);
+
+   return size;
+}
+
 void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t flag, unsigned long attrs);
 void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 355d16acee6d..6310ad01f915 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -380,3 +380,14 @@ int dma_direct_supported(struct device *dev, u64 mask)
 */
return mask >= __phys_to_dma(dev, min_mask);
 }
+
+size_t dma_direct_max_mapping_size(struct device *dev)
+{
+   size_t size = SIZE_MAX;
+
+   /* If SWIOTLB is active, use its maximum mapping size */
+   if (is_swiotlb_active())
+   size = swiotlb_max_mapping_size(dev);
+
+   return size;
+}
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] vhost: fix OOB in get_rx_bufs()

2019-01-29 Thread David Miller
From: Jason Wang 
Date: Mon, 28 Jan 2019 15:05:05 +0800

> After batched used ring updating was introduced in commit e2b3b35eb989
> ("vhost_net: batch used ring update in rx"), we tend to batch heads in
> vq->heads for more than one packet. But the quota passed to
> get_rx_bufs() was not correctly limited, which can result in an OOB write
> in vq->heads.
> 
> headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
> vhost_len, &in, vq_log, &log,
> likely(mergeable) ? UIO_MAXIOV : 1);
> 
> UIO_MAXIOV was still used, which is wrong since we could have batched
> used entries in vq->heads; this will cause an OOB write if the next buffer
> needs more than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after
> we've batched 64 (VHOST_NET_BATCH) heads:
 ...
> Fix this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
> vhost-net. This is done by setting the limit through
> vhost_dev_init(), so that set_owner can allocate the number of iovs in a
> per-device manner.
> 
> This fixes CVE-2018-16880.
> 
> Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
> Signed-off-by: Jason Wang 

Applied and queued up for -stable, thanks!
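
(For context, a sketch of the overflow arithmetic and the shape of the fix
described above, assuming UIO_MAXIOV = 1024 and VHOST_NET_BATCH = 64; the
vhost_dev_init() call shown is illustrative, not a verbatim quote of the
patch.)

    /* Worst case: 64 heads are already batched in vq->heads and the next
     * mergeable buffer legitimately needs up to 1024 heads.  With only
     * UIO_MAXIOV (1024) entries allocated, entries 1024..1087 would land
     * out of bounds.  Sizing per device as
     * UIO_MAXIOV + VHOST_NET_BATCH = 1088 removes the overflow: */
    vhost_dev_init(&n->dev, vqs, VHOST_NET_VQ_MAX,
                   UIO_MAXIOV + VHOST_NET_BATCH);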
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-01-29 Thread Michael S. Tsirkin
On Wed, Jan 30, 2019 at 10:24:01AM +0800, Jason Wang wrote:
> 
> > On 2019/1/30 3:02 AM, Michael S. Tsirkin wrote:
> > On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
> > > Fixing address of powerpc mailing list.
> > > 
> > > Thiago Jung Bauermann  writes:
> > > 
> > > > Hello,
> > > > 
> > > > With Christoph's rework of the DMA API that recently landed, the patch
> > > > below is the only change needed in virtio to make it work in a POWER
> > > > secure guest under the ultravisor.
> > > > 
> > > > The other change we need (making sure the device's dma_map_ops is NULL
> > > > so that the dma-direct/swiotlb code is used) can be made in
> > > > powerpc-specific code.
> > > > 
> > > > Of course, I also have patches (soon to be posted as RFC) which hook up
> > > >  to the powerpc secure guest support code.
> > > > 
> > > > What do you think?
> > > > 
> > > >  From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> > > > From: Thiago Jung Bauermann 
> > > > Date: Thu, 24 Jan 2019 22:08:02 -0200
> > > > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is 
> > > > encrypted
> > > > 
> > > > The host can't access the guest memory when it's encrypted, so using
> > > > regular memory pages for the ring isn't an option. Go through the DMA 
> > > > API.
> > > > 
> > > > Signed-off-by: Thiago Jung Bauermann 
> > Well I think this will come back to bite us (witness xen which is now
> > reworking precisely this path - but at least they aren't to blame, xen
> > came before ACCESS_PLATFORM).
> > 
> > I also still think the right thing would have been to set
> > ACCESS_PLATFORM for all systems where device can't access all memory.
> > 
> > But I also think I don't have the energy to argue about power secure
> > guest anymore.  So be it for power secure guest since the involved
> > engineers disagree with me.  Hey I've been wrong in the past ;).
> > 
> > But the name "sev_active" makes me scared because at least AMD guys who
> > were doing the sensible thing and setting ACCESS_PLATFORM (unless I'm
> > wrong? I remember distinctly that's so) will likely be affected too.
> > We don't want that.
> > 
> > So let's find a way to make sure it's just power secure guest for now
> > pls.
> > 
> > I also think we should add a dma_api near features under virtio_device
> > such that these hacks can move off data path.
> 
> 
> Anyway the current Xen code is conflict with spec which said:
> 
> "If this feature bit is set to 0, then the device has same access to memory
> addresses supplied to it as the driver has. In particular, the device will
> always use physical addresses matching addresses used by the driver
> (typically meaning physical addresses used by the CPU) and not translated
> further, and can access any address supplied to it by the driver. When
> clear, this overrides any platform-specific description of whether device
> access is limited or translated in any way, e.g. whether an IOMMU may be
> present. "
> 
> I wonder how much value the above description can give us. It's kind of
> odd that the behavior of "when the feature is not negotiated" is described
> in the spec.

Hmm what's odd about it? We need to describe the behaviour in all cases.

> Personally I think we can remove the above, and then we can
> switch to using the DMA API unconditionally in the guest driver. It may
> cause a single-digit regression, but we can probably overcome it.
> 
> Thanks

This has been discussed ad nauseam. virtio is all about compatibility.
Losing a couple of lines of code isn't worth breaking working setups.
People that want "just use DMA API no tricks" now have the option.
Setting a flag in a feature bit map is literally a single line
of code in the hypervisor. So stop pushing for breaking working
legacy setups and just fix it in the right place.
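
(For example, with QEMU this is the per-device iommu_platform property, as
in "-device virtio-net-pci,iommu_platform=on"; mentioned here purely as an
illustration, assuming a QEMU new enough to expose that property.)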

> 
> > 
> > By the way could you please respond about virtio-iommu and
> > why there's no support for ACCESS_PLATFORM on power?
> > 
> > I have Cc'd you on these discussions.
> > 
> > 
> > Thanks!
> > 
> > 
> > > > ---
> > > >   drivers/virtio/virtio_ring.c | 5 -
> > > >   1 file changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index cd7e755484e3..321a27075380 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -259,8 +259,11 @@ static bool vring_use_dma_api(struct virtio_device 
> > > > *vdev)
> > > >  * not work without an even larger kludge.  Instead, enable
> > > >  * the DMA API if we're a Xen guest, which at least allows
> > > >  * all of the sensible Xen configurations to work correctly.
> > > > +*
> > > > +* Also, if guest memory is encrypted the host can't access
> > > > +* it directly. In this case, we'll need to use the DMA API.
> > > >  */
> > > > -   if (xen_domain())
> > > > +   if (xen_domain() || sev_active())
> > > >   

Re: [PATCH net] vhost: fix OOB in get_rx_bufs()

2019-01-29 Thread Stefan Hajnoczi
On Mon, Jan 28, 2019 at 03:05:05PM +0800, Jason Wang wrote:
> After batched used ring updating was introduced in commit e2b3b35eb989
> ("vhost_net: batch used ring update in rx"), we tend to batch heads in
> vq->heads for more than one packet. But the quota passed to
> get_rx_bufs() was not correctly limited, which can result in an OOB write
> in vq->heads.
> 
> headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
> vhost_len, &in, vq_log, &log,
> likely(mergeable) ? UIO_MAXIOV : 1);
> 
> UIO_MAXIOV was still used, which is wrong since we could have batched
> used entries in vq->heads; this will cause an OOB write if the next buffer
> needs more than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after
> we've batched 64 (VHOST_NET_BATCH) heads:
> 
> =
> BUG kmalloc-8k (Tainted: GB): Redzone overwritten
> -
> 
> INFO: 0xfd93b7a2-0xf0713384. First byte 0xa9 instead of 0xcc
> INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
> kmem_cache_alloc_trace+0xbb/0x140
> alloc_pd+0x22/0x60
> gen8_ppgtt_create+0x11d/0x5f0
> i915_ppgtt_create+0x16/0x80
> i915_gem_create_context+0x248/0x390
> i915_gem_context_create_ioctl+0x4b/0xe0
> drm_ioctl_kernel+0xa5/0xf0
> drm_ioctl+0x2ed/0x3a0
> do_vfs_ioctl+0x9f/0x620
> ksys_ioctl+0x6b/0x80
> __x64_sys_ioctl+0x11/0x20
> do_syscall_64+0x43/0xf0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> INFO: Slab 0xd13e87af objects=3 used=3 fp=0x  (null) 
> flags=0x2010201
> INFO: Object 0x03278802 @offset=17064 fp=0xe2e6652b
> 
> Fix this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
> vhost-net. This is done by setting the limit through
> vhost_dev_init(), so that set_owner can allocate the number of iovs in a
> per-device manner.
> 
> This fixes CVE-2018-16880.
> 
> Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/net.c   | 3 ++-
>  drivers/vhost/scsi.c  | 2 +-
>  drivers/vhost/vhost.c | 7 ---
>  drivers/vhost/vhost.h | 4 +++-
>  drivers/vhost/vsock.c | 2 +-
>  5 files changed, 11 insertions(+), 7 deletions(-)

No change in the scsi and vsock cases.  I haven't reviewed the net case.

Acked-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-01-29 Thread Jason Wang


On 2019/1/30 10:36 AM, Michael S. Tsirkin wrote:

On Wed, Jan 30, 2019 at 10:24:01AM +0800, Jason Wang wrote:

On 2019/1/30 3:02 AM, Michael S. Tsirkin wrote:

On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:

Fixing address of powerpc mailing list.

Thiago Jung Bauermann  writes:


Hello,

With Christoph's rework of the DMA API that recently landed, the patch
below is the only change needed in virtio to make it work in a POWER
secure guest under the ultravisor.

The other change we need (making sure the device's dma_map_ops is NULL
so that the dma-direct/swiotlb code is used) can be made in
powerpc-specific code.

Of course, I also have patches (soon to be posted as RFC) which hook up
 to the powerpc secure guest support code.

What do you think?

  From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
From: Thiago Jung Bauermann 
Date: Thu, 24 Jan 2019 22:08:02 -0200
Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

The host can't access the guest memory when it's encrypted, so using
regular memory pages for the ring isn't an option. Go through the DMA API.

Signed-off-by: Thiago Jung Bauermann 

Well I think this will come back to bite us (witness xen which is now
reworking precisely this path - but at least they aren't to blame, xen
came before ACCESS_PLATFORM).

I also still think the right thing would have been to set
ACCESS_PLATFORM for all systems where device can't access all memory.

But I also think I don't have the energy to argue about power secure
guest anymore.  So be it for power secure guest since the involved
engineers disagree with me.  Hey I've been wrong in the past ;).

But the name "sev_active" makes me scared because at least AMD guys who
were doing the sensible thing and setting ACCESS_PLATFORM (unless I'm
wrong? I remember distinctly that's so) will likely be affected too.
We don't want that.

So let's find a way to make sure it's just power secure guest for now
pls.

I also think we should add a dma_api near features under virtio_device
such that these hacks can move off data path.


Anyway the current Xen code is conflict with spec which said:

"If this feature bit is set to 0, then the device has same access to memory
addresses supplied to it as the driver has. In particular, the device will
always use physical addresses matching addresses used by the driver
(typically meaning physical addresses used by the CPU) and not translated
further, and can access any address supplied to it by the driver. When
clear, this overrides any platform-specific description of whether device
access is limited or translated in any way, e.g. whether an IOMMU may be
present. "

I wonder how much value the above description can give us. It's kind of
odd that the behavior of "when the feature is not negotiated" is described
in the spec.

Hmm what's odd about it? We need to describe the behaviour in all cases.



Well, trying to limit the behavior of a 'legacy' driver is tricky or even
impossible. Xen is an exact example of this.






Personally I think we can remove the above, and then we can
switch to using the DMA API unconditionally in the guest driver. It may
cause a single-digit regression, but we can probably overcome it.

Thanks

This has been discussed ad nauseam. virtio is all about compatibility.
Losing a couple of lines of code isn't worth breaking working setups.
People that want "just use DMA API no tricks" now have the option.
Setting a flag in a feature bit map is literally a single line
of code in the hypervisor. So stop pushing for breaking working
legacy setups and just fix it in the right place.



I may be missing something: which kind of legacy setup is broken? Do you mean
using virtio without IOMMU_PLATFORM on a platform with an IOMMU? We actually
unbreak this setup.


Thanks





By the way could you please respond about virtio-iommu and
why there's no support for ACCESS_PLATFORM on power?

I have Cc'd you on these discussions.


Thanks!



---
   drivers/virtio/virtio_ring.c | 5 -
   1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cd7e755484e3..321a27075380 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -259,8 +259,11 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
 * not work without an even larger kludge.  Instead, enable
 * the DMA API if we're a Xen guest, which at least allows
 * all of the sensible Xen configurations to work correctly.
+*
+* Also, if guest memory is encrypted the host can't access
+* it directly. In this case, we'll need to use the DMA API.
 */
-   if (xen_domain())
+   if (xen_domain() || sev_active())
return true;

return false;

--
Thiago Jung Bauermann
IBM Linux Technology Center

___
Virtualization mailing list

Re: [PATCH net] vhost: fix OOB in get_rx_bufs()

2019-01-29 Thread David Miller
From: David Miller 
Date: Tue, 29 Jan 2019 15:10:26 -0800 (PST)

> Yeah the CVE pushed my hand a little bit, and I knew I was going to
> send Linus a pull request today because David Watson needs some TLS
> changes in net-next.

I also want to make a general comment for the record.

If I let patches slip consistently past 24 hours my backlog is
unmanageable.  Even with aggressively applying things quickly I'm
right now at 70-75.  If I do not do what I am doing, then it's in the
100-150 range.

So I am at the point where I often must move forward with patches that
I think I personally can verify and vet on my own.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] vhost: fix OOB in get_rx_bufs()

2019-01-29 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 03:38:10PM -0800, David Miller wrote:
> From: David Miller 
> Date: Tue, 29 Jan 2019 15:10:26 -0800 (PST)
> 
> > Yeah the CVE pushed my hand a little bit, and I knew I was going to
> > send Linus a pull request today because David Watson needs some TLS
> > changes in net-next.
> 
> I also want to make a general comment for the record.
> 
> If I let patches slip consistently past 24 hours my backlog is
> unmanageable.  Even with aggressively applying things quickly I'm
> right now at 70-75.  If I do not do what I am doing, then it's in the
> 100-150 range.
> 
> So I am at the point where I often must move forward with patches that
> I think I personally can verify and vet on my own.

If it helps I can include most virtio stuff in my pull requests instead.
Or if that can't work since there's too often a dependency on net-next,
maybe Jason wants to create a tree and send pull requests to you.  Let
us know if that will help, and which of the options looks better from
your POV.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-01-29 Thread Jason Wang


On 2019/1/30 3:02 AM, Michael S. Tsirkin wrote:

On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:

Fixing address of powerpc mailing list.

Thiago Jung Bauermann  writes:


Hello,

With Christoph's rework of the DMA API that recently landed, the patch
below is the only change needed in virtio to make it work in a POWER
secure guest under the ultravisor.

The other change we need (making sure the device's dma_map_ops is NULL
so that the dma-direct/swiotlb code is used) can be made in
powerpc-specific code.

Of course, I also have patches (soon to be posted as RFC) which hook up
 to the powerpc secure guest support code.

What do you think?

 From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
From: Thiago Jung Bauermann 
Date: Thu, 24 Jan 2019 22:08:02 -0200
Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

The host can't access the guest memory when it's encrypted, so using
regular memory pages for the ring isn't an option. Go through the DMA API.

Signed-off-by: Thiago Jung Bauermann 

Well I think this will come back to bite us (witness xen which is now
reworking precisely this path - but at least they aren't to blame, xen
came before ACCESS_PLATFORM).

I also still think the right thing would have been to set
ACCESS_PLATFORM for all systems where device can't access all memory.

But I also think I don't have the energy to argue about power secure
guest anymore.  So be it for power secure guest since the involved
engineers disagree with me.  Hey I've been wrong in the past ;).

But the name "sev_active" makes me scared because at least AMD guys who
were doing the sensible thing and setting ACCESS_PLATFORM (unless I'm
wrong? I remember distinctly that's so) will likely be affected too.
We don't want that.

So let's find a way to make sure it's just power secure guest for now
pls.

I also think we should add a dma_api near features under virtio_device
such that these hacks can move off data path.



Anyway the current Xen code is conflict with spec which said:

"If this feature bit is set to 0, then the device has same access to 
memory addresses supplied to it as the driver has. In particular, the 
device will always use physical addresses matching addresses used by the 
driver (typically meaning physical addresses used by the CPU) and not 
translated further, and can access any address supplied to it by the 
driver. When clear, this overrides any platform-specific description of 
whether device access is limited or translated in any way, e.g. whether 
an IOMMU may be present. "


I wonder how much value the above description can give us. It's
kind of odd that the behavior of "when the feature is not negotiated" is
described in the spec. Personally I think we can remove the above, and
then we can switch to using the DMA API unconditionally in the guest
driver. It may cause a single-digit regression, but we can probably
overcome it.


Thanks




By the way could you please respond about virtio-iommu and
why there's no support for ACCESS_PLATFORM on power?

I have Cc'd you on these discussions.


Thanks!



---
  drivers/virtio/virtio_ring.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cd7e755484e3..321a27075380 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -259,8 +259,11 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
 * not work without an even larger kludge.  Instead, enable
 * the DMA API if we're a Xen guest, which at least allows
 * all of the sensible Xen configurations to work correctly.
+*
+* Also, if guest memory is encrypted the host can't access
+* it directly. In this case, we'll need to use the DMA API.
 */
-   if (xen_domain())
+   if (xen_domain() || sev_active())
return true;

return false;


--
Thiago Jung Bauermann
IBM Linux Technology Center

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] vhost: fix OOB in get_rx_bufs()

2019-01-29 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 29 Jan 2019 17:54:44 -0500

> On Mon, Jan 28, 2019 at 10:54:44PM -0800, David Miller wrote:
>> From: Jason Wang 
>> Date: Mon, 28 Jan 2019 15:05:05 +0800
>> 
>> > After batched used ring updating was introduced in commit e2b3b35eb989
>> > ("vhost_net: batch used ring update in rx"), we tend to batch heads in
>> > vq->heads for more than one packet. But the quota passed to
>> > get_rx_bufs() was not correctly limited, which can result in an OOB write
>> > in vq->heads.
>> > 
>> > headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>> > vhost_len, &in, vq_log, &log,
>> > likely(mergeable) ? UIO_MAXIOV : 1);
>> > 
>> > UIO_MAXIOV was still used, which is wrong since we could have batched
>> > used entries in vq->heads; this will cause an OOB write if the next buffer
>> > needs more than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after
>> > we've batched 64 (VHOST_NET_BATCH) heads:
>>  ...
>> > Fix this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
>> > vhost-net. This is done by setting the limit through
>> > vhost_dev_init(), so that set_owner can allocate the number of iovs in a
>> > per-device manner.
>> > 
>> > This fixes CVE-2018-16880.
>> > 
>> > Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
>> > Signed-off-by: Jason Wang 
>> 
>> Applied and queued up for -stable, thanks!
> 
> Wow, it seems we are down to a round-trip time of hours from post to queue.
> It would be hard to keep up that rate generally.
> However, I am guessing this was already in downstreams, and it's a CVE,
> so I guess it's a no-brainer and review wasn't really necessary - was
> that the idea? Just checking.

Yeah the CVE pushed my hand a little bit, and I knew I was going to send Linus
a pull request today because David Watson needs some TLS changes in net-next.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] vhost: fix OOB in get_rx_bufs()

2019-01-29 Thread Michael S. Tsirkin
On Mon, Jan 28, 2019 at 10:54:44PM -0800, David Miller wrote:
> From: Jason Wang 
> Date: Mon, 28 Jan 2019 15:05:05 +0800
> 
> > After batched used ring updating was introduced in commit e2b3b35eb989
> > ("vhost_net: batch used ring update in rx"), we tend to batch heads in
> > vq->heads for more than one packet. But the quota passed to
> > get_rx_bufs() was not correctly limited, which can result in an OOB write
> > in vq->heads.
> > 
> > headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
> > vhost_len, &in, vq_log, &log,
> > likely(mergeable) ? UIO_MAXIOV : 1);
> > 
> > UIO_MAXIOV was still used, which is wrong since we could have batched
> > used entries in vq->heads; this will cause an OOB write if the next buffer
> > needs more than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after
> > we've batched 64 (VHOST_NET_BATCH) heads:
>  ...
> > Fix this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
> > vhost-net. This is done by setting the limit through
> > vhost_dev_init(), so that set_owner can allocate the number of iovs in a
> > per-device manner.
> > 
> > This fixes CVE-2018-16880.
> > 
> > Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
> > Signed-off-by: Jason Wang 
> 
> Applied and queued up for -stable, thanks!

Wow, it seems we are down to a round-trip time of hours from post to queue.
It would be hard to keep up that rate generally.
However, I am guessing this was already in downstreams, and it's a CVE,
so I guess it's a no-brainer and review wasn't really necessary - was
that the idea? Just checking.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 net 5/7] virtio_net: Don't process redirected XDP frames when XDP is disabled

2019-01-29 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 09:45:57AM +0900, Toshiaki Makita wrote:
> Commit 8dcc5b0ab0ec ("virtio_net: fix ndo_xdp_xmit crash towards dev not
> ready for XDP") tried to avoid access to an unexpected sq while XDP is
> disabled, but the fix was not complete.
> 
> There was a small window which causes an out-of-bounds sq access in
> virtnet_xdp_xmit() while disabling XDP.
> 
> An example case of
>  - curr_queue_pairs = 6 (2 for SKB and 4 for XDP)
>  - online_cpu_num = xdp_queue_pairs = 4
> when XDP is enabled:
> 
> CPU 0 CPU 1
> (Disabling XDP)   (Processing redirected XDP frames)
> 
>   virtnet_xdp_xmit()
> virtnet_xdp_set()
>  _virtnet_set_queues()
>   set curr_queue_pairs (2)
>check if rq->xdp_prog is not NULL
>virtnet_xdp_sq(vi)
> qp = curr_queue_pairs -
>  xdp_queue_pairs +
>  smp_processor_id()
>= 2 - 4 + 1 = -1
> sq = &vi->sq[qp] // out of bounds access
>   set xdp_queue_pairs (0)
>   rq->xdp_prog = NULL
> 
> Basically we should not change curr_queue_pairs and xdp_queue_pairs
> while someone can read the values. Thus, when disabling XDP, assign NULL
> to rq->xdp_prog first, and wait for RCU grace period, then change
> xxx_queue_pairs.
> Note that we need to keep the current order when enabling XDP though.
> 
> - v2: Make rcu_assign_pointer/synchronize_net conditional instead of
>   _virtnet_set_queues.
> 
> Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT")
> Signed-off-by: Toshiaki Makita 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/net/virtio_net.c | 33 ++---
>  1 file changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 669b65c..cea52e4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2410,6 +2410,10 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>   return -ENOMEM;
>   }
>  
> + old_prog = rtnl_dereference(vi->rq[0].xdp_prog);
> + if (!prog && !old_prog)
> + return 0;
> +
>   if (prog) {
>   prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
>   if (IS_ERR(prog))
> @@ -2424,21 +2428,30 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>   }
>   }
>  
> + if (!prog) {
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
> + if (i == 0)
> + virtnet_restore_guest_offloads(vi);
> + }
> + synchronize_net();
> + }
> +
>   err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
>   if (err)
>   goto err;
>   netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
>   vi->xdp_queue_pairs = xdp_qp;
>  
> - for (i = 0; i < vi->max_queue_pairs; i++) {
> - old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
> - rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
> - if (i == 0) {
> - if (!old_prog)
> + if (prog) {
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
> + if (i == 0 && !old_prog)
>   virtnet_clear_guest_offloads(vi);
> - if (!prog)
> - virtnet_restore_guest_offloads(vi);
>   }
> + }
> +
> + for (i = 0; i < vi->max_queue_pairs; i++) {
>   if (old_prog)
>   bpf_prog_put(old_prog);
>   if (netif_running(dev)) {
> @@ -2451,6 +2464,12 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>   return 0;
>  
>  err:
> + if (!prog) {
> + virtnet_clear_guest_offloads(vi);
> + for (i = 0; i < vi->max_queue_pairs; i++)
> + rcu_assign_pointer(vi->rq[i].xdp_prog, old_prog);
> + }
> +
>   if (netif_running(dev)) {
>   for (i = 0; i < vi->max_queue_pairs; i++) {
> virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
> -- 
> 1.8.3.1
> 
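
To see why the ordering matters, here is a rough sketch of the reader side
that the grace period protects (simplified from the
virtnet_xdp_xmit()/virtnet_xdp_sq() path; field names follow the driver,
but this is not verbatim driver code):

	/* XDP transmit path, reader side (sketch). The sq index derived
	 * from curr_queue_pairs/xdp_queue_pairs is only meaningful while
	 * the XDP program pointer is non-NULL, so the writer must NULL
	 * rq->xdp_prog and wait out an RCU grace period *before*
	 * shrinking the queue pair counts. */
	rcu_read_lock();
	if (rcu_dereference(vi->rq[0].xdp_prog)) {
		unsigned int qp = vi->curr_queue_pairs - vi->xdp_queue_pairs +
				  smp_processor_id();
		struct send_queue *sq = &vi->sq[qp]; /* guaranteed in bounds */

		/* ... queue xdp_frames on sq ... */
	}
	rcu_read_unlock();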


Re: [PATCH v2 net 7/7] virtio_net: Differentiate sk_buff and xdp_frame on freeing

2019-01-29 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 09:45:59AM +0900, Toshiaki Makita wrote:
> We do not reset or free up unused buffers when enabling/disabling XDP,
> so it can happen that xdp_frames are freed after disabling XDP or
> sk_buffs are freed after enabling XDP on xdp tx queues.
> Thus we need to handle both forms (xdp_frames and sk_buffs) regardless
> of XDP setting.
> One way to trigger this problem is to disable XDP when napi_tx is
> enabled. In that case, virtnet_xdp_set() calls virtnet_napi_enable()
> which kicks NAPI. The NAPI handler will call virtnet_poll_cleantx()
> which invokes free_old_xmit_skbs() for queues which have been used by
> XDP.
> 
> Note that even with this change we need to keep skipping
> free_old_xmit_skbs() from NAPI handlers when XDP is enabled, because XDP
> tx queues do not acquire queue locks.
> 
> - v2: Use napi_consume_skb() instead of dev_consume_skb_any()
> 
> Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
> Signed-off-by: Toshiaki Makita 

Acked-by: Michael S. Tsirkin 

> ---
> NOTE: Dropped Acked-by because of the v2 change.
> 
>  drivers/net/virtio_net.c | 54 +---
>  1 file changed, 42 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 1d454ce..2594481 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -57,6 +57,8 @@
>  #define VIRTIO_XDP_TX    BIT(0)
>  #define VIRTIO_XDP_REDIR BIT(1)
>  
> +#define VIRTIO_XDP_FLAG  BIT(0)
> +
>  /* RX packet size EWMA. The average packet size is used to determine the packet
>   * buffer size when refilling RX rings. As the entire RX ring may be refilled
>   * at once, the weight is chosen so that the EWMA will be insensitive to short-
> @@ -252,6 +254,21 @@ struct padded_vnet_hdr {
>   char padding[4];
>  };
>  
> +static bool is_xdp_frame(void *ptr)
> +{
> + return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> +}
> +
> +static void *xdp_to_ptr(struct xdp_frame *ptr)
> +{
> + return (void *)((unsigned long)ptr | VIRTIO_XDP_FLAG);
> +}
> +
> +static struct xdp_frame *ptr_to_xdp(void *ptr)
> +{
> + return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> +}
> +
>  /* Converting between virtqueue no. and kernel tx/rx queue no.
>   * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
>   */
> @@ -462,7 +479,8 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>  
>   sg_init_one(sq->sg, xdpf->data, xdpf->len);
>  
> - err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdpf, GFP_ATOMIC);
> + err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
> +GFP_ATOMIC);
>   if (unlikely(err))
>   return -ENOSPC; /* Caller handle free/refcnt */
>  
> @@ -482,13 +500,13 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
>   struct receive_queue *rq = vi->rq;
> - struct xdp_frame *xdpf_sent;
>   struct bpf_prog *xdp_prog;
>   struct send_queue *sq;
>   unsigned int len;
>   int drops = 0;
>   int kicks = 0;
>   int ret, err;
> + void *ptr;
>   int i;
>  
>   /* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
> @@ -507,8 +525,12 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>   }
>  
>   /* Free up any pending old buffers before queueing new ones. */
> - while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
> - xdp_return_frame(xdpf_sent);
> + while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> + if (likely(is_xdp_frame(ptr)))
> + xdp_return_frame(ptr_to_xdp(ptr));
> + else
> + napi_consume_skb(ptr, false);
> + }
>  
>   for (i = 0; i < n; i++) {
>   struct xdp_frame *xdpf = frames[i];
> @@ -1329,18 +1351,26 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  
>  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>  {
> - struct sk_buff *skb;
>   unsigned int len;
>   unsigned int packets = 0;
>   unsigned int bytes = 0;
> + void *ptr;
>  
> - while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> - pr_debug("Sent skb %p\n", skb);
> + while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> + if (likely(!is_xdp_frame(ptr))) {
> + struct sk_buff *skb = ptr;
>  
> - bytes += skb->len;
> - packets++;
> + pr_debug("Sent skb %p\n", skb);
>  
> - napi_consume_skb(skb, in_napi);
> + bytes += skb->len;
> + napi_consume_skb(skb, in_napi);
> + } else {
> + struct xdp_frame *frame = ptr_to_xdp(ptr);
> +
> + bytes += frame->len;
> + xdp_return_frame(frame);
> + }
> + 
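
The low-bit tagging above works because both sk_buff and xdp_frame
allocations are more than byte-aligned, so bit 0 of the pointer is free to
carry type information. A standalone illustration of the round-trip (plain
C, not the driver code):

	#include <assert.h>
	#include <stdint.h>
	#include <stdlib.h>

	#define VIRTIO_XDP_FLAG 0x1UL

	/* Tag a pointer by setting its otherwise-unused low bit. */
	static void *xdp_to_ptr(void *frame)
	{
		return (void *)((uintptr_t)frame | VIRTIO_XDP_FLAG);
	}

	static int is_xdp_frame(void *ptr)
	{
		return (uintptr_t)ptr & VIRTIO_XDP_FLAG;
	}

	static void *ptr_to_xdp(void *ptr)
	{
		return (void *)((uintptr_t)ptr & ~VIRTIO_XDP_FLAG);
	}

	int main(void)
	{
		long *obj = malloc(sizeof(*obj)); /* malloc is >= 2-byte aligned */
		void *tagged = xdp_to_ptr(obj);

		assert(is_xdp_frame(tagged));      /* type bit survives           */
		assert(ptr_to_xdp(tagged) == obj); /* round-trips to the original */
		assert(!is_xdp_frame(obj));        /* untagged pointer: an skb    */
		free(obj);
		return 0;
	}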

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-01-29 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
> 
> Fixing the address of the powerpc mailing list.
> 
> Thiago Jung Bauermann  writes:
> 
> > Hello,
> >
> > With Christoph's rework of the DMA API that recently landed, the patch
> > below is the only change needed in virtio to make it work in a POWER
> > secure guest under the ultravisor.
> >
> > The other change we need (making sure the device's dma_map_ops is NULL
> > so that the dma-direct/swiotlb code is used) can be made in
> > powerpc-specific code.
> >
> > Of course, I also have patches (soon to be posted as RFC) which hook up
> >  to the powerpc secure guest support code.
> >
> > What do you think?
> >
> > From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> > From: Thiago Jung Bauermann 
> > Date: Thu, 24 Jan 2019 22:08:02 -0200
> > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
> >
> > The host can't access the guest memory when it's encrypted, so using
> > regular memory pages for the ring isn't an option. Go through the DMA API.
> >
> > Signed-off-by: Thiago Jung Bauermann 

Well I think this will come back to bite us (witness xen which is now
reworking precisely this path - but at least they aren't to blame, xen
came before ACCESS_PLATFORM).

I also still think the right thing would have been to set
ACCESS_PLATFORM for all systems where the device can't access all memory.

But I also think I don't have the energy to argue about power secure
guest anymore.  So be it for power secure guest since the involved
engineers disagree with me.  Hey I've been wrong in the past ;).

But the name "sev_active" makes me scared, because at least the AMD guys,
who were doing the sensible thing and setting ACCESS_PLATFORM (unless I'm
wrong? I remember distinctly that's so), will likely be affected too.
We don't want that.

So let's find a way to make sure it's just power secure guest for now
pls.

I also think we should add a dma_api field near features under virtio_device
such that these hacks can move off the data path.

By the way could you please respond about virtio-iommu and
why there's no support for ACCESS_PLATFORM on power?

I have Cc'd you on these discussions.


Thanks!


> > ---
> >  drivers/virtio/virtio_ring.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index cd7e755484e3..321a27075380 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -259,8 +259,11 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
> >  * not work without an even larger kludge.  Instead, enable
> >  * the DMA API if we're a Xen guest, which at least allows
> >  * all of the sensible Xen configurations to work correctly.
> > +*
> > +* Also, if guest memory is encrypted the host can't access
> > +* it directly. In this case, we'll need to use the DMA API.
> >  */
> > -   if (xen_domain())
> > +   if (xen_domain() || sev_active())
> > return true;
> >
> > return false;
> 
> 
> -- 
> Thiago Jung Bauermann
> IBM Linux Technology Center
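
One possible shape of the narrower gate suggested above, scoped to the
platform rather than keyed on sev_active(). This is purely illustrative:
arch_virtio_wants_dma_ops() is a hypothetical per-arch hook (defaulting to
false) that only the powerpc secure guest code would implement, not an
existing kernel interface:

	static bool vring_use_dma_api(struct virtio_device *vdev)
	{
		if (!virtio_has_iommu_quirk(vdev))
			return true; /* VIRTIO_F_IOMMU_PLATFORM was negotiated */

		if (xen_domain())
			return true;

		/* Hypothetical opt-in, overridden only by platforms (such as
		 * a powerpc secure guest) whose memory the host can't access. */
		if (arch_virtio_wants_dma_ops())
			return true;

		return false;
	}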


Re: [PATCH v7 0/7] Add virtio-iommu driver

2019-01-29 Thread Michael S. Tsirkin
On Mon, Jan 21, 2019 at 11:29:05AM +, Jean-Philippe Brucker wrote:
> Hi,
> 
> On 18/01/2019 15:51, Michael S. Tsirkin wrote:
> > 
> > On Tue, Jan 15, 2019 at 12:19:52PM +, Jean-Philippe Brucker wrote:
> >> Implement the virtio-iommu driver, following specification v0.9 [1].
> >>
> >> This is a simple rebase onto Linux v5.0-rc2. We now use the
> >> dev_iommu_fwspec_get() helper introduced in v5.0 instead of accessing
> >> dev->iommu_fwspec, but there aren't any functional change from v6 [2].
> >>
> >> Our current goal for virtio-iommu is to get a paravirtual IOMMU working
> >> on Arm, and enable device assignment to guest userspace. In this
> >> use-case the mappings are static, and don't require optimal performance,
> >> so this series tries to keep things simple. However there is plenty more
> >> to do for features and optimizations, and having this base in v5.1 would
> >> be good. Given that most of the changes are to drivers/iommu, I believe
> >> the driver and future changes should go via the IOMMU tree.
> >>
> >> You can find Linux driver and kvmtool device on v0.9.2 branches [3],
> >> module and x86 support on virtio-iommu/devel. Also tested with Eric's
> >> QEMU device [4]. Please note that the series depends on Robin's
> >> probe-deferral fix [5], which will hopefully land in v5.0.
> >>
> >> [1] Virtio-iommu specification v0.9, sources and pdf
> >> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
> >> http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
> >>
> >> [2] [PATCH v6 0/7] Add virtio-iommu driver
> >> 
> >> https://lists.linuxfoundation.org/pipermail/iommu/2018-December/032127.html
> >>
> >> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.2
> >> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9.2
> >>
> >> [4] [RFC v9 00/17] VIRTIO-IOMMU device
> >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg575578.html
> >>
> >> [5] [PATCH] iommu/of: Fix probe-deferral
> >> https://www.spinics.net/lists/arm-kernel/msg698371.html
> > 
> > Thanks for the work!
> > So really my only issue with this is that there's no
> > way for the IOMMU to describe the devices that it
> > covers.
> > 
> > As a result that is then done in a platform-specific way.
> > 
> > And this means that, for example, it does not solve the problem that
> > some power people have: their platform simply does not have a way
> > to specify which devices are covered by the IOMMU.
> 
> Isn't power using device tree? I haven't looked much at power because I
> was told a while ago that they already paravirtualize their IOMMU and
> don't need virtio-iommu, except perhaps for some legacy platforms. Or
> something along those lines. But I would certainly be interested in
> enabling the IOMMU for more architectures.

I have CC'd the relevant ppc developers, let's see what they think.


> As for the enumeration problem, I still don't think we can get much
> better than DT and ACPI as solutions (and IMO they are necessary to make
> this device portable). But I believe that getting DT and ACPI support is
> just a one-off inconvenience. That is, once the required bindings are
> accepted, any future extension can then be done at the virtio level with
> feature bits and probe requests, without having to update ACPI or DT.
> 
> Thanks,
> Jean
> 
> > Solving that problem would make me much more excited about
> > this device.
> > 
> > On the other hand I can see that while there have been some
> > developments most of the code has been stable for quite a while now.
> > 
> > So what I am trying to do right about now is make a small module that
> > loads early and pokes at the IOMMU enough to extract, via standard
> > virtio config space, the data about which devices use the IOMMU.
> > IIUC it's claimed to be impossible without messy changes to the
> > boot sequence.
> > 
> > If I succeed at least on some platforms, I'll ask that this design be
> > worked into this device, minimizing the info that goes through DT/ACPI.
> > If I see I can't make it in time for the next merge window, I plan
> > to merge the existing patches using DT (barring surprises).
> > 
> > As I only have a very small amount of time to spend on this attempt, if
> > someone else wants to try doing that in parallel, that would be great!
> > 
> > 
> >> Jean-Philippe Brucker (7):
> >>   dt-bindings: virtio-mmio: Add IOMMU description
> >>   dt-bindings: virtio: Add virtio-pci-iommu node
> >>   of: Allow the iommu-map property to omit untranslated devices
> >>   PCI: OF: Initialize dev->fwnode appropriately
> >>   iommu: Add virtio-iommu driver
> >>   iommu/virtio: Add probe request
> >>   iommu/virtio: Add event queue
> >>
> >>  .../devicetree/bindings/virtio/iommu.txt  |   66 +
> >>  .../devicetree/bindings/virtio/mmio.txt   |   30 +
> >>  MAINTAINERS   |7 +
> >>  drivers/iommu/Kconfig |   11 +
> >>  drivers/iommu/Makefile   

Re: [PATCH 3/5] dma: Introduce dma_max_mapping_size()

2019-01-29 Thread Christoph Hellwig
On Tue, Jan 29, 2019 at 09:43:40AM +0100, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> The function returns the maximum size that can be mapped
> using DMA-API functions. The patch also adds the
> implementation for direct DMA and a new dma_map_ops pointer
> so that other implementations can expose their limit.
> 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Joerg Roedel 

Looks good:

Reviewed-by: Christoph Hellwig 
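
Based on that description, the new entry point plausibly dispatches along
these lines (a sketch reconstructed from the commit text, not copied from
the patch):

	size_t dma_max_mapping_size(struct device *dev)
	{
		const struct dma_map_ops *ops = get_dma_ops(dev);
		size_t size = SIZE_MAX; /* no limit unless someone declares one */

		if (dma_is_direct(ops))
			size = dma_direct_max_mapping_size(dev);
		else if (ops && ops->max_mapping_size)
			size = ops->max_mapping_size(dev);

		return size;
	}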


Re: [PATCH 4/5] virtio: Introduce virtio_max_dma_size()

2019-01-29 Thread Christoph Hellwig
On Tue, Jan 29, 2019 at 09:43:41AM +0100, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> This function returns the maximum segment size for a single
> DMA transaction of a virtio device. The limit comes
> from the SWIOTLB implementation in the Linux kernel, which
> has an upper limit of (currently) 256kB of contiguous
> memory it can map. Other DMA-API implementations might also
> have limits.
> 
> Use the new dma_max_mapping_size() function to determine the
> maximum mapping size when DMA-API is in use for virtio.
> 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Joerg Roedel 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 5/5] virtio-blk: Consider virtio_max_dma_size() for maximum segment size

2019-01-29 Thread Christoph Hellwig
On Tue, Jan 29, 2019 at 09:43:42AM +0100, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> Segments can't be larger than the maximum DMA mapping size
> supported on the platform. Take that into account when
> setting the maximum segment size for a block device.
> 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Joerg Roedel 

Looks good:

Reviewed-by: Christoph Hellwig 
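
The change described amounts to clamping the block layer's segment size
with the new helper; roughly (a sketch of the probe-time logic, not the
exact hunk):

	/* virtio_blk probe (sketch): never advertise segments larger than
	 * the DMA layer can map for this device. */
	u32 v, max_size = virtio_max_dma_size(vdev);

	/* The host may additionally advertise its own per-segment limit. */
	if (!virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
				  struct virtio_blk_config, size_max, &v))
		max_size = min(max_size, v);

	blk_queue_max_segment_size(q, max_size);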


Re: [PATCH 2/5] swiotlb: Add is_swiotlb_active() function

2019-01-29 Thread Christoph Hellwig
On Tue, Jan 29, 2019 at 09:43:39AM +0100, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> This function will be used from dma_direct code to determine
> the maximum segment size of a dma mapping.
> 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Joerg Roedel 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/5] swiotlb: Introduce swiotlb_max_mapping_size()

2019-01-29 Thread Christoph Hellwig
On Tue, Jan 29, 2019 at 09:43:38AM +0100, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> The function returns the maximum size that can be remapped
> by the SWIOTLB implementation. This function will later be
> exposed to users through the DMA-API.
> 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Joerg Roedel 

Looks good:

Reviewed-by: Christoph Hellwig 
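
The 256kB ceiling quoted in patch 4/5 falls directly out of the SWIOTLB
slab constants, so the helper presumably reduces to something like this
(sketch; the IO_TLB_* values match the kernel's at the time):

	#define IO_TLB_SHIFT    11   /* each SWIOTLB slab is 2 KiB       */
	#define IO_TLB_SEGSIZE  128  /* max contiguous slabs per mapping */

	size_t swiotlb_max_mapping_size(struct device *dev)
	{
		/* 2 KiB * 128 = 256 KiB of contiguous bounce buffer */
		return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
	}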


Re: [PATCH] virtio_net: Introduce extended RSC feature

2019-01-29 Thread Michael S. Tsirkin
On Tue, Jan 29, 2019 at 02:52:36PM +0200, Yuri Benditovich wrote:
> The VIRTIO_NET_F_RSC_EXT feature bit indicates that the device
> is able to provide extended RSC information. When the feature
> is active and the 'gso_type' field in a received packet is not GSO_NONE,
> the device reports the number of coalesced packets in the 'csum_start'
> field and the number of duplicated ACKs in the 'csum_offset' field,
> and sets VIRTIO_NET_HDR_F_RSC_INFO in the 'flags' field.
> 
> Signed-off-by: Yuri Benditovich 
> ---
>  include/uapi/linux/virtio_net.h | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index a3715a3224c1..93c71d714475 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -56,7 +56,7 @@
>  #define VIRTIO_NET_F_MQ  22  /* Device supports Receive Flow
>   * Steering */
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
> -
> +#define VIRTIO_NET_F_RSC_EXT   61  /* Provides extended RSC info */
>  #define VIRTIO_NET_F_STANDBY   62  /* Act as standby for another device
>   * with the same MAC.
>   */
> @@ -104,6 +104,7 @@ struct virtio_net_config {
>  struct virtio_net_hdr_v1 {
>  #define VIRTIO_NET_HDR_F_NEEDS_CSUM  1   /* Use csum_start, csum_offset */
>  #define VIRTIO_NET_HDR_F_DATA_VALID  2   /* Csum is valid */
> +#define VIRTIO_NET_HDR_F_RSC_INFO    4   /* rsc_ext data in csum_ fields */
>   __u8 flags;
>  #define VIRTIO_NET_HDR_GSO_NONE  0   /* Not a GSO frame */
>  #define VIRTIO_NET_HDR_GSO_TCPV4 1   /* GSO frame, IPv4 TCP (TSO) */
> @@ -140,6 +141,18 @@ struct virtio_net_hdr_mrg_rxbuf {
>   struct virtio_net_hdr hdr;
>   __virtio16 num_buffers; /* Number of merged rx buffers */
>  };
> +
> +static inline __virtio16 *virtio_net_rsc_ext_num_packets(
> + struct virtio_net_hdr *hdr)
> +{
> + return &hdr->csum_start;
> +}
> +
> +static inline __virtio16 *virtio_net_rsc_ext_num_dupacks(
> + struct virtio_net_hdr *hdr)
> +{
> + return &hdr->csum_offset;
> +}
>  #endif /* ...VIRTIO_NET_NO_LEGACY */

Coding style is off here. But really I don't think these inlines are
needed in the UAPI header. Put them in QEMU or something.

>  /*
> -- 
> 2.17.1
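
For context, a device-side producer would fill the repurposed fields
roughly like this under the proposal (illustrative only; it reuses the
helpers and flag defined in the patch, and cpu_to_le16() stands in for
whatever byte-order conversion the implementation actually needs):

	/* Report RSC results through the otherwise-unused checksum fields
	 * when a received frame was coalesced (sketch). */
	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE && n_coalesced > 1) {
		hdr->flags |= VIRTIO_NET_HDR_F_RSC_INFO;
		*virtio_net_rsc_ext_num_packets(hdr) = cpu_to_le16(n_coalesced);
		*virtio_net_rsc_ext_num_dupacks(hdr) = cpu_to_le16(n_dupacks);
	}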