Re: [PATCH 0/9] More virtio hardening

2021-10-11 Thread Michael S. Tsirkin
On Tue, Oct 12, 2021 at 10:43:57AM +0800, Jason Wang wrote:
> On Mon, Oct 11, 2021 at 8:36 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> > > On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > > > Hi All:
> > > > >
> > > > > This series tries to do more hardening for virtio.
> > > > >
> > > > > patch 1 validates the num_queues for the virtio-blk device.
> > > > > patches 2-4 validate max_nr_ports for the virtio-console device.
> > > > > patches 5-7 harden virtio-pci interrupts to make sure no unexpected
> > > > > interrupt handler is triggered. If this makes sense we can do similar
> > > > > things in other transport drivers.
> > > > > patches 8-9 validate used ring length.
> > > > >
> > > > > Smoke tested on blk/net with packed=on/off and iommu_platform=on/off.
> > > > >
> > > > > Please review.
> > > > >
> > > > > Thanks
> > > >
> > > > So I poked at console at least, and I think I see
> > > > an issue: if interrupt handler queues a work/bh,
> > > > then it can still run while reset is in progress.
> > >
> > > Looks like a bug which is unrelated to the hardening?
> >
> > Won't preventing use after free be relevant?
> 
> Oh right.
> 
> > I frankly don't know what hardening means then.
> > > E.g. the driver
> > > should sync with work/bh before reset.
> >
> > No, there's no way to fix it ATM without extra locks and state, which I
> > think we should strive to avoid, or make generic rather than per-driver.
> > A sync before reset is useless since new interrupts will just arrive and
> > queue more work, and a sync after reset is too late since the driver will
> > try to add buffers.
> 
> Can we do something like
> 
> 1) disable interrupt
> 2) sync bh
> 
> Or I guess this is something like what you meant in the following steps.

So that would mean a new API to disable vq interrupts.
Reset will then re-enable them.
E.g. virtqueue_cancel_cb_before_reset()?

Then drivers can sync, then reset.
This means maintaining more state though, which I don't like.

An alternative is something like this:

typedef void (*virtio_flush_device)(struct virtio_device *dev);

void virtio_reset_device(struct virtio_device *dev, virtio_flush_device flush)
{
	might_sleep();
	if (flush) {
		dev->config->disable_interrupts(dev);
		flush(dev);
		dev->config->reset(dev);
		dev->config->enable_interrupts(dev);
	} else {
		dev->config->reset(dev);
	}
}

I have patches wrapping all reset calls in virtio_reset_device
(without the flush parameter but that's easy to tweak).
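
To illustrate, a driver's remove path might then look something like the
following. This is only a sketch against the API above; the virtcons_*
names and the exact work items are made up for illustration:

static void virtcons_flush(struct virtio_device *vdev)
{
	struct ports_device *portdev = vdev->priv;

	/* interrupts are already disabled here, so nothing can re-queue these */
	cancel_work_sync(&portdev->control_work);
	cancel_work_sync(&portdev->config_work);
}

static void virtcons_remove(struct virtio_device *vdev)
{
	/* flush runs with interrupts off, then the device is reset */
	virtio_reset_device(vdev, virtcons_flush);

	/* now it is safe to tear down the vqs and free buffers */
}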


> >
> > Maybe we can break the device. Two issues with that:
> > - drivers might not be ready to handle add_buf failures
> > - restore needs to unbreak it then and we don't have a way to do that yet
> >
> > So .. careful reading of all device drivers and hoping we don't mess
> > things up even more ... here we come.
> 
> Yes.

The biggest issue with this trick is drivers not handling add_buf
errors; adding a failure path here risks creating memory leaks.
OTOH, with e.g. bounce buffers, maybe it's possible for add_buf to
fail anyway?
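
For reference, the pattern each driver would then need is roughly the
following. This is only a sketch: the buffer type and its fields are
placeholders, not any driver's actual structures:

struct my_buf {			/* placeholder buffer type */
	void *data;
	size_t len;
};

static int add_inbuf(struct virtqueue *vq, struct my_buf *buf)
{
	struct scatterlist sg[1];
	int ret;

	sg_init_one(sg, buf->data, buf->len);

	ret = virtqueue_add_inbuf(vq, sg, 1, buf, GFP_ATOMIC);
	if (ret < 0) {
		/* broken device: free the buffer instead of leaking it */
		kfree(buf->data);
		kfree(buf);
		return ret;
	}

	virtqueue_kick(vq);
	return 0;
}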

> >
> > > >
> > > > I sent a patch to fix it for console removal specifically,
> > > > but I suspect it's not enough e.g. freeze is still broken.
> > > > And note this has been reported without any TDX things -
> > > > it's not a malicious device issue, can be triggered just
> > > > by module unload.
> > > >
> > > > I am vaguely thinking about new APIs to disable/enable callbacks.
> > > > An alternative:
> > > >
> > > > 1. adding new remove_nocb/freeze_nocb calls
> > > > 2. disabling/enabling interrupts automatically around these
> > > > 3. gradually moving devices to using these
> > > > 4. once/if all device move, removing the old callbacks
> > > >
> > > > the advantage here is that we'll be sure calls are always
> > > > paired correctly.
> > >
> > > I'm not sure I get the idea, but my feeling is that it doesn't
> > > conflict with the interrupt hardening here (or at least the same
> > > method is required, e.g. NO_AUTO_EN).
> > >
> > > Thanks
> >
> > Right.  It's not that it conflicts, it's that I was hoping that
> > since you are working on hardening you can take up fixing that.
> > Let me know whether you have the time. Thanks!
> 
> I can do that.
> 
> Thanks
> 
> >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > Jason Wang (9):
> > > > >   virtio-blk: validate num_queues during probe
> > > > >   virtio: add doc for validate() method
> > > > >   virtio-console: switch to use .validate()
> > > > >   virtio_console: validate max_nr_ports before trying to use it
> > > > >   virtio_config: introduce a new ready method
> > > > >   virtio_pci: harden MSI-X interrupts
> > > > >   virtio-pci: harden INTX interrupts
> > > > >   virtio_ring: fix typos in vring_desc_extra
> > > > >   virtio_ring: validate used buffer length
> > > > >
> > > > >  drivers/block/virtio_blk.c |  3 +-
> > > > >  drivers/char/virtio_console.c  | 51 +-
> > > > >  drivers/virtio/virtio_pci_common.c | 43 +
> > > > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > > > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > > > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > > > >  drivers/virtio/virtio_ring.c   | 27 ++--
> > > > >  include/linux/virtio.h |  1 +
> > > > >  include/linux/virtio_config.h  |  6 
> > > > >  9 files changed, 118 insertions(+), 31 deletions(-)

Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Christoph Hellwig
On Mon, Oct 11, 2021 at 03:09:09PM -0400, Michael S. Tsirkin wrote:
> The reason we have trouble is that it's not clear what the API means
> outside the realm of TDX.
> If we really, truly want an API that says "ioremap and it's a hardened
> driver" then I guess ioremap_hardened_driver is what you want.

Yes.  And why would we ioremap the BIOS anyway?  It is not I/O memory
in any of the senses we generally use ioremap for.


Re: [RFC PATCH v4 00/20] vDPA shadow virtqueue

2021-10-11 Thread Jason Wang
On Tue, Oct 12, 2021 at 11:59 AM Jason Wang  wrote:
>
>
> On 2021/10/1 at 3:05 PM, Eugenio Pérez wrote:
> > This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > is intended as a new method of tracking the memory the devices touch
> > during a migration process: instead of relying on the vhost device's dirty
> > logging capability, SVQ intercepts the VQ dataplane, forwarding the
> > descriptors between VM and device. This way qemu is the effective
> > writer of the guest's memory, like in qemu's virtio device operation.
> >
> > When SVQ is enabled qemu offers a new vring to the device to read
> > and write into, and also intercepts kicks and calls between the device
> > and the guest. Relaying used buffers would cause the dirty memory to be
> > tracked, but at this RFC stage SVQ is not enabled on migration automatically.
> >
> > It is based on the ideas of DPDK SW assisted LM, in the series of
> > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, this series does
> > not map the shadow vq in the guest's VA, but in qemu's.
> >
> > For qemu to use shadow virtqueues the guest virtio driver must not use
> > features like event_idx or indirect descriptors. These limitations will
> > be addressed in later series, but they are left out for simplicity at
> > the moment.
> >
> > SVQ needs to be enabled with the QMP command:
> >
> > { "execute": "x-vhost-enable-shadow-vq",
> >    "arguments": { "name": "dev0", "enable": true } }
> >
> > This series includes some patches, to be deleted in the final version,
> > that help with its testing. The first two of the series freely implement
> > the feature to stop the device and be able to retrieve its status. It's
> > intended to be used with the vp_vdpa driver in a nested environment. This
> > driver also needs modifications to forward the new status bit.
> >
> > Patches 2-8 prepare the SVQ and QMP command to support guest to host
> > notification forwarding. If the SVQ is enabled with these ones
> > applied and the device supports it, that part can be tested in
> > isolation (for example, with networking), hopping through SVQ.
> >
> > The same is true of patches 9-13, but with device to guest
> > notifications.
> >
> > The rest of the patches implement the actual buffer forwarding.
> >
> > Comments are welcome.
>
>
> Hi Eugenio:
>
>
> It would be helpful to have a public git repo, to ease the review for us.
>
> Thanks
>

Btw, we also need to measure the performance impact of the shadow virtqueue.

Thanks


Re: [RFC PATCH v4 00/20] vDPA shadow virtqueue

2021-10-11 Thread Jason Wang


On 2021/10/1 at 3:05 PM, Eugenio Pérez wrote:

This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. This
is intended as a new method of tracking the memory the devices touch
during a migration process: instead of relying on the vhost device's dirty
logging capability, SVQ intercepts the VQ dataplane, forwarding the
descriptors between VM and device. This way qemu is the effective
writer of the guest's memory, like in qemu's virtio device operation.

When SVQ is enabled qemu offers a new vring to the device to read
and write into, and also intercepts kicks and calls between the device
and the guest. Relaying used buffers would cause the dirty memory to be
tracked, but at this RFC stage SVQ is not enabled on migration automatically.

It is based on the ideas of DPDK SW assisted LM, in the series of
DPDK's https://patchwork.dpdk.org/cover/48370/ . However, this series does
not map the shadow vq in the guest's VA, but in qemu's.

For qemu to use shadow virtqueues the guest virtio driver must not use
features like event_idx or indirect descriptors. These limitations will
be addressed in later series, but they are left out for simplicity at
the moment.

SVQ needs to be enabled with the QMP command:

{ "execute": "x-vhost-enable-shadow-vq",
   "arguments": { "name": "dev0", "enable": true } }

This series includes some patches, to be deleted in the final version,
that help with its testing. The first two of the series freely implement
the feature to stop the device and be able to retrieve its status. It's
intended to be used with the vp_vdpa driver in a nested environment. This
driver also needs modifications to forward the new status bit.

Patches 2-8 prepare the SVQ and QMP command to support guest to host
notification forwarding. If the SVQ is enabled with these ones
applied and the device supports it, that part can be tested in
isolation (for example, with networking), hopping through SVQ.

The same is true of patches 9-13, but with device to guest
notifications.

The rest of the patches implement the actual buffer forwarding.

Comments are welcome.



Hi Eugenio:


It would be helpful to have a public git repo, to ease the review for us.

Thanks




TODO:
* Event, indirect, packed, and other features of virtio - Waiting for
   confirmation of the big picture.
* Use the already available iova tree to track mappings.
* To separate buffer forwarding into its own AIO context, so we can
   throw more threads at that task and we don't need to stop the main
   event loop.
* unmap iommu memory. Now the tree can only grow from SVQ enable, but
   it should be fine as long as not a lot of memory is added to the
   guest.
* Rebase on top of latest qemu (and, hopefully, on top of multiqueue
   vdpa).
* Some assertions need to become appropriate error handling paths.
* Proper documentation.

Changes from v3 RFC:
   * Move everything to the vhost-vdpa backend. A big change; this allowed
 some cleanup, but more code has been added in other places.
   * More use of glib utilities, especially to manage memory.
v3 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html

Changes from v2 RFC:
   * Adding vhost-vdpa devices support
   * Fixed some memory leaks pointed out by different comments
v2 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html

Changes from v1 RFC:
   * Use QMP instead of migration to start SVQ mode.
   * Only accepting IOMMU devices, closer behavior with target devices
 (vDPA)
   * Fix invalid masking/unmasking of vhost call fd.
   * Use of proper methods for synchronization.
   * No need to modify VirtIO device code, all of the changes are
 contained in vhost code.
   * Delete superfluous code.
   * An intermediate RFC was sent with only the notifications forwarding
 changes. It can be seen in
 https://patchew.org/QEMU/20210129205415.876290-1-epere...@redhat.com/
v1 link:
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html

Eugenio Pérez (20):
   virtio: Add VIRTIO_F_QUEUE_STATE
   virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
   virtio: Add virtio_queue_is_host_notifier_enabled
   vhost: Make vhost_virtqueue_{start,stop} public
   vhost: Add x-vhost-enable-shadow-vq qmp
   vhost: Add VhostShadowVirtqueue
   vdpa: Register vdpa devices in a list
   vhost: Route guest->host notification through shadow virtqueue
   Add vhost_svq_get_svq_call_notifier
   Add vhost_svq_set_guest_call_notifier
   vdpa: Save call_fd in vhost-vdpa
   vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
   vhost: Route host->guest notification through shadow virtqueue
   virtio: Add vhost_shadow_vq_get_vring_addr
   vdpa: Save host and guest features
   vhost: Add vhost_svq_valid_device_features to shadow vq
   vhost: Shadow virtqueue buffers forwarding
   vhost: Add VhostIOVATree
   vhost: Use a tree to store memory mappings
   vdpa: Add custom IOTLB translations to SVQ


Re: [PATCH v2 3/3] vdpa: Check for iova range at mappings changes

2021-10-11 Thread Jason Wang
On Tue, Oct 5, 2021 at 9:49 PM Eugenio Pérez  wrote:
>
> Check vdpa device range before updating memory regions so we don't add
> any outside of it, and report the invalid change if any.
>
> Signed-off-by: Eugenio Pérez 
> ---
>  include/hw/virtio/vhost-vdpa.h |  2 +
>  hw/virtio/vhost-vdpa.c | 68 ++
>  hw/virtio/trace-events |  1 +
>  3 files changed, 55 insertions(+), 16 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index a8963da2d9..c288cf7ecb 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -13,6 +13,7 @@
>  #define HW_VIRTIO_VHOST_VDPA_H
>
>  #include "hw/virtio/virtio.h"
> +#include "standard-headers/linux/vhost_types.h"
>
>  typedef struct VhostVDPAHostNotifier {
>  MemoryRegion mr;
> @@ -24,6 +25,7 @@ typedef struct vhost_vdpa {
>  uint32_t msg_type;
>  bool iotlb_batch_begin_sent;
>  MemoryListener listener;
> +struct vhost_vdpa_iova_range iova_range;
>  struct vhost_dev *dev;
>  VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>  } VhostVDPA;
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index be7c63b4ba..6654287050 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -37,20 +37,34 @@ static Int128 vhost_vdpa_section_end(const MemoryRegionSection *section)
>  return llend;
>  }
>
> -static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
> -{
> -return (!memory_region_is_ram(section->mr) &&
> -!memory_region_is_iommu(section->mr)) ||
> -memory_region_is_protected(section->mr) ||
> -   /* vhost-vDPA doesn't allow MMIO to be mapped  */
> -memory_region_is_ram_device(section->mr) ||
> -   /*
> -* Sizing an enabled 64-bit BAR can cause spurious mappings to
> -* addresses in the upper part of the 64-bit address space.  These
> -* are never accessed by the CPU and beyond the address width of
> -* some IOMMU hardware.  TODO: VDPA should tell us the IOMMU width.
> -*/
> -   section->offset_within_address_space & (1ULL << 63);
> +static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
> +uint64_t iova_min,
> +uint64_t iova_max)
> +{
> +Int128 llend;
> +
> +if ((!memory_region_is_ram(section->mr) &&
> + !memory_region_is_iommu(section->mr)) ||
> +memory_region_is_protected(section->mr) ||
> +/* vhost-vDPA doesn't allow MMIO to be mapped  */
> +memory_region_is_ram_device(section->mr)) {
> +return true;
> +}
> +
> +if (section->offset_within_address_space < iova_min) {
> +error_report("RAM section out of device range (min=%lu, addr=%lu)",
> + iova_min, section->offset_within_address_space);
> +return true;
> +}
> +
> +llend = vhost_vdpa_section_end(section);
> +if (int128_gt(llend, int128_make64(iova_max))) {
> +error_report("RAM section out of device range (max=%lu, end 
> addr=%lu)",
> + iova_max, int128_get64(llend));
> +return true;
> +}
> +
> +return false;
>  }
>
>  static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
> @@ -162,7 +176,8 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>  void *vaddr;
>  int ret;
>
> -if (vhost_vdpa_listener_skipped_section(section)) {
> +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first,
> +v->iova_range.last)) {
>  return;
>  }
>
> @@ -220,7 +235,8 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>  Int128 llend, llsize;
>  int ret;
>
> -if (vhost_vdpa_listener_skipped_section(section)) {
> +if (vhost_vdpa_listener_skipped_section(section, v->iova_range.first,
> +v->iova_range.last)) {
>  return;
>  }
>
> @@ -288,9 +304,24 @@ static void vhost_vdpa_add_status(struct vhost_dev *dev, uint8_t status)
>  vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &s);
>  }
>
> +static int vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
> +{
> +int ret;
> +
> +ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE, &v->iova_range);
> +if (ret != 0) {
> +return ret;
> +}

I think we need a fallback for kernels that do not support
VHOST_VDPA_GET_IOVA_RANGE?
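
Something along these lines, perhaps (only a sketch, assuming a failing
ioctl is how an older kernel shows up here):

static int vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
{
    int ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE,
                              &v->iova_range);

    if (ret != 0) {
        /*
         * Older kernels have no VHOST_VDPA_GET_IOVA_RANGE: assume the
         * whole 64-bit IOVA space is usable instead of failing init.
         */
        v->iova_range.first = 0;
        v->iova_range.last = UINT64_MAX;
        ret = 0;
    }

    trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first,
                                    v->iova_range.last);
    return ret;
}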

Thanks

> +
> +trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first,
> +v->iova_range.last);
> +return ret;
> +}
> +
>  static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>  {
>  struct vhost_vdpa *v;
> +int r;
>  assert(dev->vhost_ops->backend_type == 

Re: [PATCH v2 2/3] vdpa: Add vhost_vdpa_section_end

2021-10-11 Thread Jason Wang
On Tue, Oct 5, 2021 at 9:49 PM Eugenio Pérez  wrote:
>
> Abstract this operation, which will be reused when validating the region
> against the iova range that the device supports.
>
> Signed-off-by: Eugenio Pérez 

Acked-by: Jason Wang 

> ---
>  hw/virtio/vhost-vdpa.c | 22 +++---
>  1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index ea1aa71ad8..be7c63b4ba 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -24,6 +24,19 @@
>  #include "trace.h"
>  #include "qemu-common.h"
>
> +/*
> + * Return one past the end of the section. Be careful with uint64_t
> + * conversions!
> + */
> +static Int128 vhost_vdpa_section_end(const MemoryRegionSection *section)
> +{
> +Int128 llend = int128_make64(section->offset_within_address_space);
> +llend = int128_add(llend, section->size);
> +llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> +
> +return llend;
> +}
> +
>  static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
>  {
>  return (!memory_region_is_ram(section->mr) &&
> @@ -160,10 +173,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>  }
>
>  iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> -llend = int128_make64(section->offset_within_address_space);
> -llend = int128_add(llend, section->size);
> -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> -
> +llend = vhost_vdpa_section_end(section);
>  if (int128_ge(int128_make64(iova), llend)) {
>  return;
>  }
> @@ -221,9 +231,7 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>  }
>
>  iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> -llend = int128_make64(section->offset_within_address_space);
> -llend = int128_add(llend, section->size);
> -llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
> +llend = vhost_vdpa_section_end(section);
>
>  trace_vhost_vdpa_listener_region_del(v, iova, int128_get64(llend));
>
> --
> 2.27.0
>


Re: [PATCH v2 1/3] vdpa: Skip protected ram IOMMU mappings

2021-10-11 Thread Jason Wang
On Tue, Oct 5, 2021 at 9:49 PM Eugenio Pérez  wrote:
>
> Following the logic of commit 56918a126ae ("memory: Add RAM_PROTECTED
> flag to skip IOMMU mappings") with VFIO, skip memory sections
> inaccessible via normal mechanisms, including DMA.
>
> Signed-off-by: Eugenio Pérez 

Acked-by: Jason Wang 

> ---
>  hw/virtio/vhost-vdpa.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 47d7a5a23d..ea1aa71ad8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -28,6 +28,7 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section)
>  {
>  return (!memory_region_is_ram(section->mr) &&
>  !memory_region_is_iommu(section->mr)) ||
> +memory_region_is_protected(section->mr) ||
> /* vhost-vDPA doesn't allow MMIO to be mapped  */
>  memory_region_is_ram_device(section->mr) ||
> /*
> --
> 2.27.0
>


Re: [PATCH 0/9] More virtio hardening

2021-10-11 Thread Jason Wang
On Mon, Oct 11, 2021 at 8:36 PM Michael S. Tsirkin  wrote:
>
> On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> > On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > > Hi All:
> > > >
> > > > This series tries to do more hardening for virtio.
> > > >
> > > > patch 1 validates the num_queues for the virtio-blk device.
> > > > patches 2-4 validate max_nr_ports for the virtio-console device.
> > > > patches 5-7 harden virtio-pci interrupts to make sure no unexpected
> > > > interrupt handler is triggered. If this makes sense we can do similar
> > > > things in other transport drivers.
> > > > patches 8-9 validate used ring length.
> > > >
> > > > Smoke tested on blk/net with packed=on/off and iommu_platform=on/off.
> > > >
> > > > Please review.
> > > >
> > > > Thanks
> > >
> > > So I poked at console at least, and I think I see
> > > an issue: if interrupt handler queues a work/bh,
> > > then it can still run while reset is in progress.
> >
> > Looks like a bug which is unrelated to the hardening?
>
> Won't preventing use after free be relevant?

Oh right.

> I frankly don't know what hardening means then.
> > E.g. the driver
> > should sync with work/bh before reset.
>
> No, there's no way to fix it ATM without extra locks and state, which I
> think we should strive to avoid, or make generic rather than per-driver.
> A sync before reset is useless since new interrupts will just arrive and
> queue more work, and a sync after reset is too late since the driver will
> try to add buffers.

Can we do something like

1) disable interrupt
2) sync bh

Or I guess this is something like what you meant in the following steps.

>
> Maybe we can break the device. Two issues with that:
> - drivers might not be ready to handle add_buf failures
> - restore needs to unbreak it then and we don't have a way to do that yet
>
> So .. careful reading of all device drivers and hoping we don't mess
> things up even more ... here we come.

Yes.

>
> > >
> > > I sent a patch to fix it for console removal specifically,
> > > but I suspect it's not enough e.g. freeze is still broken.
> > > And note this has been reported without any TDX things -
> > > it's not a malicious device issue, can be triggered just
> > > by module unload.
> > >
> > > I am vaguely thinking about new APIs to disable/enable callbacks.
> > > An alternative:
> > >
> > > 1. adding new remove_nocb/freeze_nocb calls
> > > 2. disabling/enabling interrupts automatically around these
> > > 3. gradually moving devices to using these
> > > 4. once/if all device move, removing the old callbacks
> > >
> > > the advantage here is that we'll be sure calls are always
> > > paired correctly.
> >
> > I'm not sure I get the idea, but my feeling is that it doesn't
> > conflict with the interrupt hardening here (or at least the same
> > method is required, e.g. NO_AUTO_EN).
> >
> > Thanks
>
> Right.  It's not that it conflicts, it's that I was hoping that
> since you are working on hardening you can take up fixing that.
> Let me know whether you have the time. Thanks!

I can do that.

Thanks

>
> > >
> > >
> > >
> > >
> > >
> > > > Jason Wang (9):
> > > >   virtio-blk: validate num_queues during probe
> > > >   virtio: add doc for validate() method
> > > >   virtio-console: switch to use .validate()
> > > >   virtio_console: validate max_nr_ports before trying to use it
> > > >   virtio_config: introduce a new ready method
> > > >   virtio_pci: harden MSI-X interrupts
> > > >   virtio-pci: harden INTX interrupts
> > > >   virtio_ring: fix typos in vring_desc_extra
> > > >   virtio_ring: validate used buffer length
> > > >
> > > >  drivers/block/virtio_blk.c |  3 +-
> > > >  drivers/char/virtio_console.c  | 51 +-
> > > >  drivers/virtio/virtio_pci_common.c | 43 +
> > > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > > >  drivers/virtio/virtio_ring.c   | 27 ++--
> > > >  include/linux/virtio.h |  1 +
> > > >  include/linux/virtio_config.h  |  6 
> > > >  9 files changed, 118 insertions(+), 31 deletions(-)
> > > >
> > > > --
> > > > 2.25.1
> > >
>



Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Michael S. Tsirkin
On Mon, Oct 11, 2021 at 10:23:00AM -0700, Andi Kleen wrote:
> 
> On 10/11/2021 12:58 AM, Christoph Hellwig wrote:
> > Just as last time:  This does not make any sense.  ioremap is shared
> > by definition.
> 
> It's not necessarily shared with the host for confidential computing: for
> example BIOS mappings definitely should not be shared, but they're using
> ioremap today.

That just needs to be fixed.

> But if you have a better term please propose something. I tried to clarify
> it with "shared_host", but I don't know a better term.
> 
> 
> -Andi
> 


The reason we have trouble is that it's not clear what the API means
outside the realm of TDX.
If we really, truly want an API that says "ioremap and it's a hardened
driver" then I guess ioremap_hardened_driver is what you want.

-- 
MST



Re: [PATCH v5 16/16] x86/tdx: Add cmdline option to force use of ioremap_host_shared

2021-10-11 Thread Michael S. Tsirkin
On Mon, Oct 11, 2021 at 10:35:18AM -0700, Andi Kleen wrote:
> 
> > Presumably bios code is in arch/x86 and drivers/acpi, right?
> > Up to 200 calls the majority of which is likely private ...
> 
> Yes.
> 
> > I don't have better ideas but the current setup will just
> > result in people making their guests vulnerable whenever they
> > want to allow device pass-through.
> 
> 
> Yes that's true. For current TDX our target is virtual devices only. But if
> pass-through usage becomes really widespread we may need to revisit.
> 
> 
> -Andi

I mean ... it's already widespread. If we support it with TDX
it will be used with TDX. If we don't then I guess it won't;
exposing this kind of limitation in a userspace-visible way isn't great
though. I guess it boils down to the fact that
ioremap_host_shared is just not a great interface: users simply
have no idea whether a given driver uses ioremap.

-- 
MST



Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Michael S. Tsirkin
On Mon, Oct 11, 2021 at 10:32:23AM -0700, Andi Kleen wrote:
> 
> > Because it does not end with I/O operations; that's a trivial example.
> > Module unloading is famous for being racy: I just re-read that part of
> > the virtio drivers and sure enough we have bugs there; this is after
> > they have presumably been audited. So a TDX guest is better off
> > just disabling hot-unplug completely, and hotplug isn't far behind.
> 
> These all shouldn't matter for a confidential guest. The only way it can be
> attacked is through IO; everything else is protected by hardware.
> 
> 
> Also it would all require doing something at the guest level, which we
> assume is not malicious.
> 
> 
> > Malicious filesystems can exploit many linux systems unless
> > you take pains to limit what is mounted and how.
> 
> That's expected to be handled by authenticated dmcrypt and similar.
> Hardening at this level has been done for many years.

It's possible to do it like this, sure. But that's not the only
configuration; userspace needs to be smart about setting things up.
Which is my point really.

> 
> > Networking devices tend to get into the default namespaces and can
> > do more or less whatever CAP_NET_ADMIN can.
> > Etc.
> 
> 
> Networking should be already hardened; otherwise you would have much worse
> problems today.

Same thing. NFS is pretty common; you are saying don't do it then. Fair
enough, but again, arbitrary configs just aren't going to be secure.

> 
> 
> > You're essentially objecting to a single line change in your subsystem here.
> > Well I commented on the API patch, not the virtio patch.
> > If it's a way for a driver to say "I am hardened
> > and audited" then I guess it should at least say so.
> 
> 
> This is handled by the central allow list. We intentionally didn't want each
> driver to declare itself, but have a central list where changes will get
> more scrutiny than random driver code.

Makes sense. Additionally, distros can tweak that to their heart's
content, selecting the functionality/security balance that makes sense
for them.

> But then there are the additional opt-ins for the low level firewall. These
> are in the API. I don't see how it could be done at the driver level, unless
> you want to pass in a struct device everywhere?

I am just saying don't do it then. Don't build drivers that the distro does
not want to support into the kernel. And don't load them when they are
modules.

> > > > How about creating a defconfig that makes sense for TDX then?
> > > TDX can be used in many different ways, I don't think a defconfig is
> > > practical.
> > > 
> > > In theory you could do some Kconfig dependency (at the pain point of
> > > having separate kernel binaries), but why not just do it at run time
> > > then if you maintain the list anyways. That's much easier and saner for
> > > everyone. In the past we usually always ended up with runtime
> > > mechanisms for similar things anyways.
> > > 
> > > Also it turns out that the filter mechanisms are needed for some arch
> > > drivers which are not even configurable, so alone it's probably not enough,
> > 
> > I guess they aren't really needed though right, or you won't try to
> > filter them?
> 
> We're addressing most of them with the device filter for platform drivers.
> But since we cannot stop them doing ioremap IO in their init code they also
> need the low level firewall.
> 
> Some others that cannot be addressed have explicit disables.
> 
> 
> > So make them configurable?
> 
> Why not just fix the runtime? It's much saner for everyone. Proposing to do
> things at build time sounds like we're in Linux 0.99 days.
> 
> -Andi

Um. Tweaking driver code is not just build time, it's development time.
At least with kconfig you don't need to patch your kernel.

-- 
MST



Re: [PATCH v5 16/16] x86/tdx: Add cmdline option to force use of ioremap_host_shared

2021-10-11 Thread Andi Kleen




Presumably bios code is in arch/x86 and drivers/acpi, right?
Up to 200 calls the majority of which is likely private ...


Yes.


I don't have better ideas but the current setup will just
result in people making their guests vulnerable whenever they
want to allow device pass-through.



Yes that's true. For current TDX our target is virtual devices only. But 
if pass-through usage becomes really widespread we may need to revisit.



-Andi


Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Andi Kleen




Because it does not end with I/O operations; that's a trivial example.
Module unloading is famous for being racy: I just re-read that part of
the virtio drivers and sure enough we have bugs there; this is after
they have presumably been audited. So a TDX guest is better off
just disabling hot-unplug completely, and hotplug isn't far behind.


These all shouldn't matter for a confidential guest. The only way it can 
be attacked is through IO; everything else is protected by hardware.



Also it would all require doing something at the guest level, which we 
assume is not malicious.




Malicious filesystems can exploit many linux systems unless
you take pains to limit what is mounted and how.


That's expected to be handled by authenticated dmcrypt and similar. 
Hardening at this level has been done for many years.




Networking devices tend to get into the default namespaces and can
do more or less whatever CAP_NET_ADMIN can.
Etc.



Networking should be already hardened; otherwise you would have much
worse problems today.





You're essentially objecting to a single line change in your subsystem here.
Well I commented on the API patch, not the virtio patch.
If it's a way for a driver to say "I am hardened
and audited" then I guess it should at least say so.



This is handled by the central allow list. We intentionally didn't want 
each driver to declare itself, but have a central list where changes 
will get more scrutiny than random driver code.


But then there are the additional opt-ins for the low level firewall. 
These are in the API. I don't see how it could be done at the driver 
level, unless you want to pass in a struct device everywhere?



How about creating a defconfig that makes sense for TDX then?

TDX can be used in many different ways, I don't think a defconfig is
practical.

In theory you could do some Kconfig dependency (at the pain point of having
separate kernel binaries), but why not just do it at run time then if you
maintain the list anyways. That's much easier and saner for everyone. In the
past we usually always ended up with runtime mechanisms for similar things
anyways.

Also it turns out that the filter mechanisms are needed for some arch
drivers which are not even configurable, so alone it's probably not enough,


I guess they aren't really needed though right, or you won't try to
filter them?


We're addressing most of them with the device filter for platform 
drivers. But since we cannot stop them doing ioremap IO in their init 
code they also need the low level firewall.


Some others that cannot be addressed have explicit disables.



So make them configurable?


Why not just fix the runtime? It's much saner for everyone. Proposing to 
do things at build time sounds like we're in Linux 0.99 days.


-Andi



Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Andi Kleen



On 10/11/2021 12:58 AM, Christoph Hellwig wrote:

Just as last time:  This does not make any sense.  ioremap is shared
by definition.


It's not necessarily shared with the host for confidential computing: 
for example BIOS mappings definitely should not be shared, but they're 
using ioremap today.


But if you have a better term please propose something. I tried to 
clarify it with "shared_host", but I don't know a better term.



-Andi




Re: [PATCH 0/9] More virtio hardening

2021-10-11 Thread Michael S. Tsirkin
On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason Wang wrote:
> On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > > Hi All:
> > >
> > > This series tries to do more hardening for virtio.
> > >
> > > patch 1 validates the num_queues for the virtio-blk device.
> > > patches 2-4 validate max_nr_ports for the virtio-console device.
> > > patches 5-7 harden virtio-pci interrupts to make sure no unexpected
> > > interrupt handler is triggered. If this makes sense we can do similar
> > > things in other transport drivers.
> > > patches 8-9 validate used ring length.
> > >
> > > Smoke tested on blk/net with packed=on/off and iommu_platform=on/off.
> > >
> > > Please review.
> > >
> > > Thanks
> >
> > So I poked at console at least, and I think I see
> > an issue: if interrupt handler queues a work/bh,
> > then it can still run while reset is in progress.
> 
> Looks like a bug which is unrelated to the hardening?

Won't preventing use after free be relevant?
I frankly don't know what hardening means then.
> E.g. the driver
> should sync with work/bh before reset.

No, there's no way to fix it ATM without extra locks and state, which I
think we should strive to avoid, or make generic rather than per-driver.
A sync before reset is useless since new interrupts will just arrive and
queue more work, and a sync after reset is too late since the driver will
try to add buffers.

Maybe we can break the device. Two issues with that:
- drivers might not be ready to handle add_buf failures
- restore needs to unbreak it then and we don't have a way to do that yet

So .. careful reading of all device drivers and hoping we don't mess
things up even more ... here we come.

> >
> > I sent a patch to fix it for console removal specifically,
> > but I suspect it's not enough e.g. freeze is still broken.
> > And note this has been reported without any TDX things -
> > it's not a malicious device issue, can be triggered just
> > by module unload.
> >
> > I am vaguely thinking about new APIs to disable/enable callbacks.
> > An alternative:
> >
> > 1. adding new remove_nocb/freeze_nocb calls
> > 2. disabling/enabling interrupts automatically around these
> > 3. gradually moving devices to using these
> > 4. once/if all device move, removing the old callbacks
> >
> > the advantage here is that we'll be sure calls are always
> > paired correctly.
> 
> I'm not sure I get the idea, but my feeling is that it doesn't
> conflict with the interrupt hardening here (or at least the same
> method is required, e.g. NO_AUTO_EN).
> 
> Thanks

Right.  It's not that it conflicts, it's that I was hoping that
since you are working on hardening you can take up fixing that.
Let me know whether you have the time. Thanks!

> >
> >
> >
> >
> >
> > > Jason Wang (9):
> > >   virtio-blk: validate num_queues during probe
> > >   virtio: add doc for validate() method
> > >   virtio-console: switch to use .validate()
> > >   virtio_console: validate max_nr_ports before trying to use it
> > >   virtio_config: introduce a new ready method
> > >   virtio_pci: harden MSI-X interrupts
> > >   virtio-pci: harden INTX interrupts
> > >   virtio_ring: fix typos in vring_desc_extra
> > >   virtio_ring: validate used buffer length
> > >
> > >  drivers/block/virtio_blk.c |  3 +-
> > >  drivers/char/virtio_console.c  | 51 +-
> > >  drivers/virtio/virtio_pci_common.c | 43 +
> > >  drivers/virtio/virtio_pci_common.h |  7 ++--
> > >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> > >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> > >  drivers/virtio/virtio_ring.c   | 27 ++--
> > >  include/linux/virtio.h |  1 +
> > >  include/linux/virtio_config.h  |  6 
> > >  9 files changed, 118 insertions(+), 31 deletions(-)
> > >
> > > --
> > > 2.25.1
> >



Re: [PATCH v5 16/16] x86/tdx: Add cmdline option to force use of ioremap_host_shared

2021-10-11 Thread Michael S. Tsirkin
On Sun, Oct 10, 2021 at 07:39:55PM -0700, Andi Kleen wrote:
> 
> > The connection is quite unfortunate IMHO.
> > Can't there be an option
> > that unbreaks drivers *without* opening up security holes by
> > making BIOS shared?
> 
> That would require new low level APIs that distinguish both cases, and a
> tree sweep.
> 
> 
> -Andi

Presumably bios code is in arch/x86 and drivers/acpi, right?
Up to 200 calls the majority of which is likely private ...

I don't have better ideas but the current setup will just
result in people making their guests vulnerable whenever they
want to allow device pass-through.

-- 
MST



Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Michael S. Tsirkin
On Sun, Oct 10, 2021 at 03:22:39PM -0700, Andi Kleen wrote:
> 
> > To which Andi replied
> > One problem with removing the ioremap opt-in is that
> > it's still possible for drivers to get at devices without going through 
> > probe.
> > 
> > To which Greg replied:
> > https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/
> > If there are in-kernel PCI drivers that do not do this, they need to be
> > fixed today.
> > 
> > Can you guys resolve the differences here?
> 
> 
> I addressed this in my other mail, but we may need more discussion.

Hopefully Greg will reply to that one.

> 
> > 
> > And once they are resolved, mention this in the commit log so
> > I don't get to re-read the series just to find out nothing
> > changed in this respect?
> > 
> > I frankly do not believe we are anywhere near being able to harden
> > an arbitrary kernel config against attack.
> 
> Why not? Device filter and the opt-ins together are a fairly strong
> mechanism.

Because it does not end with I/O operations; that's a trivial example.
Module unloading is famous for being racy: I just re-read that part of
the virtio drivers and sure enough we have bugs there; this is after
they have presumably been audited. So a TDX guest is better off
just disabling hot-unplug completely, and hotplug isn't far behind.
Malicious filesystems can exploit many linux systems unless
you take pains to limit what is mounted and how.
Networking devices tend to get into the default namespaces and can
do more or less whatever CAP_NET_ADMIN can.
Etc.
I am not saying this makes the effort worthless, I am saying userspace
better know very well what it's doing, and kernel better be
configured in a very specific way.

> And it's not that they're a lot of code or super complicated either.
> 
> You're essentially objecting to a single line change in your subsystem here.

Well I commented on the API patch, not the virtio patch.
If it's a way for a driver to say "I am hardened
and audited" then I guess it should at least say so. It has nothing
to do with host or sharing; that's an implementation detail,
and it obscures the actual limitations of the approach,
in that eventually in an ideal world all drivers would be secure
and use this API.

Yes, if that's the API that PCI gains then virtio will use it.


> > How about creating a defconfig that makes sense for TDX then?
> 
> TDX can be used in many different ways, I don't think a defconfig is
> practical.
> 
> In theory you could do some Kconfig dependency (at the pain point of having
> separate kernel binaries), but why not just do it at run time then if you
> maintain the list anyways. That's much easier and saner for everyone. In the
> past we usually always ended up with runtime mechanisms for similar things
> anyways.
> 
> Also it turns out that the filter mechanisms are needed for some arch
> drivers which are not even configurable, so alone it's probably not enough,


I guess they aren't really needed though right, or you won't try to
filter them? So make them configurable?

> 
> > Anyone deviating from that better know what they are doing,
> > this API tweaking is just putting policy into the kernel  ...
> 
> Hardening drivers is kernel policy. It cannot be done anywhere else.
> 
> 
> -Andi

To clarify, the policy is which drivers to load into the kernel.

-- 
MST



Re: [PATCH v5] virtio-blk: Add validation for block size in config space

2021-10-11 Thread Christoph Hellwig
On Tue, Oct 05, 2021 at 06:42:43AM -0400, Michael S. Tsirkin wrote:
> Stefan also pointed out this duplicates the logic from 
> 
> if (blksize < 512 || blksize > PAGE_SIZE || !is_power_of_2(blksize))
> return -EINVAL;
> 
> 
> and a bunch of other places.
> 
> 
> Would it be acceptable for the blk layer to validate the input
> instead of having each driver do its own thing?
> Maybe inside blk_queue_logical_block_size?

I'm pretty sure we went down that path before.  Let's just add a helper
for that check for now as part of this series.  Actually validating
in blk_queue_logical_block_size seems like a good idea, but returning
errors from that has a long tail.
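
Something like this, perhaps (the name and placement are only a
suggestion, mirroring the virtio-blk check quoted above):

/* e.g. in include/linux/blkdev.h */
static inline int blk_validate_block_size(unsigned int bsize)
{
	if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
		return -EINVAL;

	return 0;
}

Then virtio-blk and the other drivers duplicating this check would just
call it and propagate the error.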


Re: [PATCH] drm/virtio: fix the missed drm_gem_object_put() in virtio_gpu_user_framebuffer_create()

2021-10-11 Thread Gerd Hoffmann
On Sat, Oct 09, 2021 at 05:09:20PM +0800, Jing Xiangfeng wrote:
> virtio_gpu_user_framebuffer_create() fails to call drm_gem_object_put()
> in an error path. Add the missed function call to fix it.

Pushed to drm-misc-next.

thanks,
  Gerd



Re: [PATCH RFC v1 03/11] iommu/virtio: Handle incoming page faults

2021-10-11 Thread Jean-Philippe Brucker
Hi Vivek,

On Mon, Oct 11, 2021 at 01:41:15PM +0530, Vivek Gautam wrote:
> > > + list_for_each_entry(ep, &viommu->endpoints, list) {
> > > + if (ep->eid == endpoint) {
> > > + vdev = ep->vdev;
> 
> I have a question here though -
> Is endpoint-ID unique across all the endpoints available per 'viommu_dev' or
> per 'viommu_domain'?
> If it is per 'viommu_domain' then the above list is also incorrect.
> As you pointed out in the patch [1] -
> [PATCH RFC v1 02/11] iommu/virtio: Maintain a list of endpoints served
> by viommu_dev
> I am planning to add endpoint ID into a static global xarray in
> viommu_probe_device() as below:
> 
> vdev_for_each_id(i, eid, vdev) {
> ret = xa_insert(&viommu_ep_ids, eid, vdev, GFP_KERNEL);
> if (ret)
> goto err_free_dev;
> }
> 
> and replace the above list traversal as below:
> 
> xa_lock_irqsave(&viommu_ep_ids, flags);
> xa_for_each(&viommu_ep_ids, eid, vdev) {
> if (eid == endpoint) {
> ret = iommu_report_device_fault(vdev->dev, &fault_evt);
> if (ret)
> dev_err(vdev->dev, "Couldn't handle page request\n");
> }
> }
> xa_unlock_irqrestore(&viommu_ep_ids, flags);
> 
> But using a global xarray would also be incorrect if the endpoint IDs are
> only unique per 'viommu_domain' rather than global.
> 
> I need to find the correct 'viommu_endpoint' to call 
> iommu_report_device_fault()
> with the correct device.

The endpoint IDs are only unique within a viommu_dev, so a global xarray
wouldn't work, but one in viommu_dev would. In vdomain it doesn't work
either, because we can't get to the domain from the fault handler without
first finding the endpoint.
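
Roughly the following, with the xarray hanging off viommu_dev. A sketch
only; the ep_ids field name is made up:

/* in viommu_probe_device(), for each endpoint ID of the device: */
	ret = xa_insert(&viommu->ep_ids, eid, vdev, GFP_KERNEL);

/* in the fault handler, replacing the list walk: */
	struct viommu_endpoint *vdev = xa_load(&viommu->ep_ids, endpoint);

	if (vdev)
		ret = iommu_report_device_fault(vdev->dev, &fault_evt);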

Thanks,
Jean



Re: [PATCH v3 1/1] virtio: write back F_VERSION_1 before validate

2021-10-11 Thread Cornelia Huck
On Mon, Oct 11 2021, Halil Pasic  wrote:

> The virtio specification virtio-v1.1-cs01 states: "Transitional devices
> MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not
> been acknowledged by the driver."  This is exactly what QEMU as of 6.1
> has done relying solely on VIRTIO_F_VERSION_1 for detecting that.
>
> However, the specification also says: "... the driver MAY read (but MUST
> NOT write) the device-specific configuration fields to check that it can
> support the device ..." before setting FEATURES_OK.
>
> In that case, any transitional device relying solely on
> VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in
> legacy format.  In particular, this implies that it is in big endian
> format for big endian guests. This naturally confuses the driver which
> expects little endian in the modern mode.
>
> It is probably a good idea to amend the spec to clarify that
> VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
> is complete. Before validate callback existed, config space was only
> read after FEATURES_OK. However, we already have two regressions, so
> let's address this here as well.
>
> The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and
> the VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when
> virtio 1.0 is used on both sides. The latter renders virtio-blk unusable
> with DASD backing, because things simply don't work with the default.
> See Fixes tags for relevant commits.
>
> For QEMU, we can work around the issue by writing out the feature bits
> with VIRTIO_F_VERSION_1 bit set.  We (ab)use the finalize_features
> config op for this. This isn't enough to address all vhost devices since
> these do not get the features until FEATURES_OK, however it looks like
> the affected devices actually never handled the endianness for legacy
> mode correctly, so at least that's not a regression.
>
> No devices except virtio net and virtio blk seem to be affected.
>
> Long term the right thing to do is to fix the hypervisors.
>
> Cc:  #v4.11
> Signed-off-by: Halil Pasic 
> Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config 
> space")
> Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range")
> Reported-by: mark...@us.ibm.com
> Reviewed-by: Cornelia Huck 
> ---
>
> @Connie: I made some more commit message changes to accommodate Michael's
> requests. I just assumed these will work for you as well and kept your
> r-b. Please shout at me if it needs to be dropped :)

No need to shout, still looks good to me :)

> ---
>  drivers/virtio/virtio.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..236081afe9a2 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -239,6 +239,17 @@ static int virtio_dev_probe(struct device *_d)
>   driver_features_legacy = driver_features;
>   }
>  
> + /*
> +  * Some devices detect legacy solely via F_VERSION_1. Write
> +  * F_VERSION_1 to force LE config space accesses before FEATURES_OK for
> +  * these when needed.
> +  */
> + if (drv->validate && !virtio_legacy_is_little_endian()
> +   && device_features & BIT_ULL(VIRTIO_F_VERSION_1)) {
> + dev->features = BIT_ULL(VIRTIO_F_VERSION_1);
> + dev->config->finalize_features(dev);
> + }
> +
>   if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>   dev->features = driver_features & device_features;
>   else
>
> base-commit: 60a9483534ed0d99090a2ee1d4bb0b8179195f51
> -- 
> 2.25.1



Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Christoph Hellwig
Just as last time:  This does not make any sense.  ioremap is shared
by definition.



Re: [PATCH 0/9] More virtio hardening

2021-10-11 Thread Jason Wang
On Tue, Oct 5, 2021 at 3:42 PM Michael S. Tsirkin  wrote:
>
> On Mon, Sep 13, 2021 at 01:53:44PM +0800, Jason Wang wrote:
> > Hi All:
> >
> > This series tries to do more hardening for virtio.
> >
> > patch 1 validates the num_queues for the virtio-blk device.
> > patches 2-4 validate max_nr_ports for the virtio-console device.
> > patches 5-7 harden virtio-pci interrupts to make sure no unexpected
> > interrupt handler is triggered. If this makes sense we can do similar
> > things in other transport drivers.
> > patches 8-9 validate used ring length.
> >
> > Smoke tested on blk/net with packed=on/off and iommu_platform=on/off.
> >
> > Please review.
> >
> > Thanks
>
> So I poked at console at least, and I think I see
> an issue: if interrupt handler queues a work/bh,
> then it can still run while reset is in progress.

Looks like a bug which is unrelated to the hardening? E.g. the driver
should sync with work/bh before reset.

>
> I sent a patch to fix it for console removal specifically,
> but I suspect it's not enough e.g. freeze is still broken.
> And note this has been reported without any TDX things -
> it's not a malicious device issue, can be triggered just
> by module unload.
>
> I am vaguely thinking about new APIs to disable/enable callbacks.
> An alternative:
>
> 1. adding new remove_nocb/freeze_nocb calls
> 2. disabling/enabling interrupts automatically around these
> 3. gradually moving devices to using these
> 4. once/if all device move, removing the old callbacks
>
> the advantage here is that we'll be sure calls are always
> paired correctly.

I'm not sure I get the idea, but my feeling is that it doesn't
conflict with the interrupt hardening here (or at least the same
method is required, e.g. NO_AUTO_EN).
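
For reference, the NO_AUTO_EN idea is roughly the pattern below; a sketch
of the concept only, not the actual patches:

	/* request the per-vq interrupt with auto-enable suppressed, so the
	 * handler cannot fire before the driver is ready */
	err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
			  vring_interrupt, IRQF_NO_AUTOEN,
			  vp_dev->msix_names[msix_vec], vq);
	...
	/* and only once the device is ready (DRIVER_OK is set): */
	enable_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec));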

Thanks

>
>
>
>
>
> > Jason Wang (9):
> >   virtio-blk: validate num_queues during probe
> >   virtio: add doc for validate() method
> >   virtio-console: switch to use .validate()
> >   virtio_console: validate max_nr_ports before trying to use it
> >   virtio_config: introduce a new ready method
> >   virtio_pci: harden MSI-X interrupts
> >   virtio-pci: harden INTX interrupts
> >   virtio_ring: fix typos in vring_desc_extra
> >   virtio_ring: validate used buffer length
> >
> >  drivers/block/virtio_blk.c |  3 +-
> >  drivers/char/virtio_console.c  | 51 +-
> >  drivers/virtio/virtio_pci_common.c | 43 +
> >  drivers/virtio/virtio_pci_common.h |  7 ++--
> >  drivers/virtio/virtio_pci_legacy.c |  5 +--
> >  drivers/virtio/virtio_pci_modern.c |  6 ++--
> >  drivers/virtio/virtio_ring.c   | 27 ++--
> >  include/linux/virtio.h |  1 +
> >  include/linux/virtio_config.h  |  6 
> >  9 files changed, 118 insertions(+), 31 deletions(-)
> >
> > --
> > 2.25.1
>
