AMD IOMMU + SME + amdgpu regression

2020-06-11 Thread Alex Xu (Hello71) via Virtualization
Hi,

amdgpu + IOMMU + SME is now working for me on 5.7, yay! But, it is 
broken on torvalds master, boo. On boot, depending on which exact commit 
I test, it either hangs immediately (with built-in driver, before 
starting initramfs), displays some errors then hangs, or spams the 
screen with many amdgpu errors.

I bisected the black screen hang to:

commit dce8d6964ebdb83bacf5e7ab8c27df151218
Author: Joerg Roedel 
Date:   Wed Apr 29 15:36:53 2020 +0200

iommu/amd: Convert to probe/release_device() call-backs

Convert the AMD IOMMU Driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
Link: https://lore.kernel.org/r/20200429133712.31431-16-j...@8bytes.org
Signed-off-by: Joerg Roedel 

Testing torvalds master (623f6dc593) with the containing merge 
(98bdc74b36) plus the DMA mapping merge (4e94d08734) reverted allows 
amdgpu + IOMMU + SME to once again work.

I think that nobody is really working on amdgpu + SME, but it would be a 
shame if it was supported and then incidentally broken by a small 
change.

I am using an ASRock B450 Pro4 with Ryzen 1600 and ASUS RX 480. I don't 
understand this code at all, but let me know what I can do to 
troubleshoot.

Thanks,
Alex.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 59/75] x86/sev-es: Handle MONITOR/MONITORX Events

2020-06-11 Thread Tom Lendacky




On 6/11/20 12:13 PM, Sean Christopherson wrote:

On Thu, Jun 11, 2020 at 03:10:45PM +0200, Joerg Roedel wrote:

On Tue, May 19, 2020 at 11:38:45PM -0700, Sean Christopherson wrote:

On Tue, Apr 28, 2020 at 05:17:09PM +0200, Joerg Roedel wrote:

+static enum es_result vc_handle_monitor(struct ghcb *ghcb,
+   struct es_em_ctxt *ctxt)
+{
+   phys_addr_t monitor_pa;
+   pgd_t *pgd;
+
+   pgd = __va(read_cr3_pa());
+   monitor_pa = vc_slow_virt_to_phys(ghcb, ctxt->regs->ax);
+
+   ghcb_set_rax(ghcb, monitor_pa);
+   ghcb_set_rcx(ghcb, ctxt->regs->cx);
+   ghcb_set_rdx(ghcb, ctxt->regs->dx);
+
+   return sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MONITOR, 0, 0);


Why?  If SVM has the same behavior as VMX, the MONITOR will be disarmed on
VM-Enter, i.e. the VMM can't do anything useful for MONITOR/MWAIT.  I
assume that's the case given that KVM emulates MONITOR/MWAIT as NOPs on
SVM.


Not sure if it is disarmed on VMRUN, but the MONITOR/MWAIT instructions
are part of the GHCB spec, so they are implemented here.


Even if MONITOR/MWAIT somehow works across VMRUN I'm not sure it's something
the guest should enable by default as it leaks GPAs to the untrusted host,
with no benefit to the guest except in specific configurations.  Yeah, the
VMM can muck with page tables to trace guest to the some extent, but the
guest shouldn't be unnecessarily volunteering information to the host.

If MONITOR/MWAIT is effectively a NOP then removing this code is a no
brainer.

Can someone from AMD chime in?


I don't think there is any guarantee that MONITOR/MWAIT would work within 
a guest (I'd have to dig some more on that to get a definitive answer, but 
probably not necessary to do). As you say, if KVM emulates it as a NOP, 
there's no sense in exposing the GPA - make it a NOP in the handler. I 
just need to poke some other hypervisor vendors and hear what they have to 
say.


Thanks,
Tom





___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 47/75] x86/sev-es: Add Runtime #VC Exception Handler

2020-06-11 Thread Joerg Roedel
On Thu, Jun 11, 2020 at 10:38:48AM -0700, Sean Christopherson wrote:
> Isn't it possible for the #VC handler to hit a #PF, e.g. on copy_from_user()
> in insn_fetch_from_user()?  If that happens, what prevents the #PF handler
> from hitting a #VC?  AIUI, do_vmm_communication() panics if the backup GHCB
> is already in use, e.g. #VC->#PF->#VC->NMI->#VC would be fatal.

This would be possible, if the #PF handler is able to trigger a #VC. But
I am not sure how it could, except when it has to call printk.

If #PF can trigger a #VC, then a fix is to further limit the time the
GHCB is active, and make sure there are not faults of any kind when it
actually is active. Then the only vector to care about is NMI. I will
look into that.

Regards,

Joerg
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 47/75] x86/sev-es: Add Runtime #VC Exception Handler

2020-06-11 Thread Sean Christopherson
On Thu, Jun 11, 2020 at 01:48:31PM +0200, Joerg Roedel wrote:
> On Sat, May 23, 2020 at 09:59:24AM +0200, Borislav Petkov wrote:
> > On Tue, Apr 28, 2020 at 05:16:57PM +0200, Joerg Roedel wrote:
> > > + /*
> > > +  * Mark the per-cpu GHCBs as in-use to detect nested #VC exceptions.
> > > +  * There is no need for it to be atomic, because nothing is written to
> > > +  * the GHCB between the read and the write of ghcb_active. So it is safe
> > > +  * to use it when a nested #VC exception happens before the write.
> > > +  */
> > 
> > Looks liks that is that text... support for nested #VC exceptions.
> > I'm sure this has come up already but why do we even want to support
> > nested #VCs? IOW, can we do without them first or are they absolutely
> > necessary?
> > 
> > I'm guessing VC exceptions inside the VC handler but what are the
> > sensible use cases?
> 
> The most important use-case is #VC->NMI->#VC. When an NMI hits while the
> #VC handler uses the GHCB and the NMI handler causes another #VC, then
> the contents of the GHCB needs to be backed up, so that it doesn't
> destroy the GHCB contents of the first #VC handling path.

Isn't it possible for the #VC handler to hit a #PF, e.g. on copy_from_user()
in insn_fetch_from_user()?  If that happens, what prevents the #PF handler
from hitting a #VC?  AIUI, do_vmm_communication() panics if the backup GHCB
is already in use, e.g. #VC->#PF->#VC->NMI->#VC would be fatal.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 59/75] x86/sev-es: Handle MONITOR/MONITORX Events

2020-06-11 Thread Sean Christopherson
On Thu, Jun 11, 2020 at 03:10:45PM +0200, Joerg Roedel wrote:
> On Tue, May 19, 2020 at 11:38:45PM -0700, Sean Christopherson wrote:
> > On Tue, Apr 28, 2020 at 05:17:09PM +0200, Joerg Roedel wrote:
> > > +static enum es_result vc_handle_monitor(struct ghcb *ghcb,
> > > + struct es_em_ctxt *ctxt)
> > > +{
> > > + phys_addr_t monitor_pa;
> > > + pgd_t *pgd;
> > > +
> > > + pgd = __va(read_cr3_pa());
> > > + monitor_pa = vc_slow_virt_to_phys(ghcb, ctxt->regs->ax);
> > > +
> > > + ghcb_set_rax(ghcb, monitor_pa);
> > > + ghcb_set_rcx(ghcb, ctxt->regs->cx);
> > > + ghcb_set_rdx(ghcb, ctxt->regs->dx);
> > > +
> > > + return sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MONITOR, 0, 0);
> > 
> > Why?  If SVM has the same behavior as VMX, the MONITOR will be disarmed on
> > VM-Enter, i.e. the VMM can't do anything useful for MONITOR/MWAIT.  I
> > assume that's the case given that KVM emulates MONITOR/MWAIT as NOPs on
> > SVM.
> 
> Not sure if it is disarmed on VMRUN, but the MONITOR/MWAIT instructions
> are part of the GHCB spec, so they are implemented here.

Even if MONITOR/MWAIT somehow works across VMRUN I'm not sure it's something
the guest should enable by default as it leaks GPAs to the untrusted host,
with no benefit to the guest except in specific configurations.  Yeah, the
VMM can muck with page tables to trace guest to the some extent, but the
guest shouldn't be unnecessarily volunteering information to the host.

If MONITOR/MWAIT is effectively a NOP then removing this code is a no
brainer.

Can someone from AMD chime in?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version

2020-06-11 Thread Konrad Rzeszutek Wilk
On Thu, Jun 11, 2020 at 07:34:19AM -0400, Michael S. Tsirkin wrote:
> As testing shows no performance change, switch to that now.

What kind of testing? 100GiB? Low latency?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 59/75] x86/sev-es: Handle MONITOR/MONITORX Events

2020-06-11 Thread Joerg Roedel
On Tue, May 19, 2020 at 11:38:45PM -0700, Sean Christopherson wrote:
> On Tue, Apr 28, 2020 at 05:17:09PM +0200, Joerg Roedel wrote:
> > +static enum es_result vc_handle_monitor(struct ghcb *ghcb,
> > +   struct es_em_ctxt *ctxt)
> > +{
> > +   phys_addr_t monitor_pa;
> > +   pgd_t *pgd;
> > +
> > +   pgd = __va(read_cr3_pa());
> > +   monitor_pa = vc_slow_virt_to_phys(ghcb, ctxt->regs->ax);
> > +
> > +   ghcb_set_rax(ghcb, monitor_pa);
> > +   ghcb_set_rcx(ghcb, ctxt->regs->cx);
> > +   ghcb_set_rdx(ghcb, ctxt->regs->dx);
> > +
> > +   return sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MONITOR, 0, 0);
> 
> Why?  If SVM has the same behavior as VMX, the MONITOR will be disarmed on
> VM-Enter, i.e. the VMM can't do anything useful for MONITOR/MWAIT.  I
> assume that's the case given that KVM emulates MONITOR/MWAIT as NOPs on
> SVM.

Not sure if it is disarmed on VMRUN, but the MONITOR/MWAIT instructions
are part of the GHCB spec, so they are implemented here.


Joerg
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 54/75] x86/sev-es: Handle DR7 read/write events

2020-06-11 Thread Joerg Roedel
On Mon, May 25, 2020 at 12:59:35PM +0200, Borislav Petkov wrote:
> On Tue, Apr 28, 2020 at 05:17:04PM +0200, Joerg Roedel wrote:
> > +   if (data)
> > +   data->dr7 = val;
> 
> Are we still returning ES_OK if !data?

Yes, it just means we ignore DR7 writes when they happen early before
runtime_data is allocated. Since the DR7 value never makes it to the
hardware register anyway, it doesn't matter.


Joerg

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 51/75] x86/sev-es: Handle MMIO events

2020-06-11 Thread Joerg Roedel
On Tue, May 19, 2020 at 11:32:02PM -0700, Sean Christopherson wrote:
> '0' is a valid physical address.  It happens to be reserved in the kernel
> thanks to L1TF, but using '0' as an error code is ugly.  Not to mention
> none of the callers actually check the result.

Right, I changed the function to better handle error cases and added
checks to the call-sites. It looks like below now:

static bool vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
 unsigned long vaddr, phys_addr_t *paddr)
{
unsigned long va = (unsigned long)vaddr;
unsigned int level;
phys_addr_t pa;
pgd_t *pgd;
pte_t *pte;

pgd = pgd_offset(current->active_mm, va);
pte = lookup_address_in_pgd(pgd, va, );
if (!pte) {
ctxt->fi.vector = X86_TRAP_PF;
ctxt->fi.cr2= vaddr;
ctxt->fi.error_code = 0;

if (user_mode(ctxt->regs))
ctxt->fi.error_code |= X86_PF_USER;

return false;
}

pa = (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT;
pa |= va & ~page_level_mask(level);

*paddr = pa;

return true;
}

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 47/75] x86/sev-es: Add Runtime #VC Exception Handler

2020-06-11 Thread Joerg Roedel
On Sat, May 23, 2020 at 09:59:24AM +0200, Borislav Petkov wrote:
> On Tue, Apr 28, 2020 at 05:16:57PM +0200, Joerg Roedel wrote:
> > +   struct ghcb backup_ghcb;
> 
> I could use some text explaining what those backups are for?

I added another comment above that line to better explain why it is
needed.


Joerg
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 47/75] x86/sev-es: Add Runtime #VC Exception Handler

2020-06-11 Thread Joerg Roedel
On Sat, May 23, 2020 at 09:59:24AM +0200, Borislav Petkov wrote:
> On Tue, Apr 28, 2020 at 05:16:57PM +0200, Joerg Roedel wrote:
> > +   /*
> > +* Mark the per-cpu GHCBs as in-use to detect nested #VC exceptions.
> > +* There is no need for it to be atomic, because nothing is written to
> > +* the GHCB between the read and the write of ghcb_active. So it is safe
> > +* to use it when a nested #VC exception happens before the write.
> > +*/
> 
> Looks liks that is that text... support for nested #VC exceptions.
> I'm sure this has come up already but why do we even want to support
> nested #VCs? IOW, can we do without them first or are they absolutely
> necessary?
> 
> I'm guessing VC exceptions inside the VC handler but what are the
> sensible use cases?

The most important use-case is #VC->NMI->#VC. When an NMI hits while the
#VC handler uses the GHCB and the NMI handler causes another #VC, then
the contents of the GHCB needs to be backed up, so that it doesn't
destroy the GHCB contents of the first #VC handling path.


Regards,

Joerg
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1] virtio-mem: add memory via add_memory_driver_managed()

2020-06-11 Thread Michael S. Tsirkin
On Thu, Jun 11, 2020 at 01:33:04PM +0200, David Hildenbrand wrote:
> 
> 
> Am 11.06.2020 um 13:18 schrieb Michael S. Tsirkin :
> 
> 
> On Thu, Jun 11, 2020 at 01:00:24PM +0200, David Hildenbrand wrote:
> 
> I'd like to have this patch in 5.8, with the initial merge of
> virtio-mem
> 
> if possible (so the user space representation of virtio-mem
> added memory
> 
> resources won't change anymore).
> 
>
> 
> So my plan is to rebase on top of -rc1 and merge this for rc2 
> then.
> 
> I don't like rebase on top of tip as the results are sometimes 
> kind
> of
> 
> random.
> 
>
> 
> Right, I just wanted to get this out early so we can discuss how to
> proceed.
> 
>
> 
> And let's add a Fixes: tag as well, this way people will remember
> to
> 
> pick this.
> 
> Makes sense?
> 
>
> 
> Yes, it's somehow a fix (for kexec). So
> 
>
> 
> Fixes: 5f1f79bbc9e26 ("virtio-mem: Paravirtualized memory hotplug")
> 
>
> 
> I can respin after -rc1 with the commit id fixed as noted by Pankaj.
> 
> Just let me know what you prefer.
> 
>
> 
> Thanks!
> 
>
> Some once this commit is in Linus' tree, please ping me.
> 
> 
> It already is as mentioned, only the id was wrong.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=
> 7b7b27214bba1966772f9213cd2d8e5d67f8487f

OK I pushed this into next based on tip. Let's see what happens.


> 
>
> 
> --
> 
> Thanks,
> 
>
> 
> David / dhildenb
> 
>
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v8 07/11] vhost/net: avoid iov length math

2020-06-11 Thread Michael S. Tsirkin
Now that API exposes buffer length, we no longer need to
scan IOVs to figure it out.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 830fe84912a5..0b509be8d7b1 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -607,11 +607,9 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
 }
 
 static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter,
-   size_t hdr_size, int out)
+   size_t len, size_t hdr_size, int out)
 {
/* Skip header. TODO: support TSO. */
-   size_t len = iov_length(vq->iov, out);
-
iov_iter_init(iter, WRITE, vq->iov, out, len);
iov_iter_advance(iter, hdr_size);
 
@@ -640,7 +638,7 @@ static int get_tx_bufs(struct vhost_net *net,
}
 
/* Sanity check */
-   *len = init_iov_iter(vq, >msg_iter, nvq->vhost_hlen, *out);
+   *len = init_iov_iter(vq, >msg_iter, buf->out_len, nvq->vhost_hlen, 
*out);
if (*len == 0) {
vq_err(vq, "Unexpected header len for TX: %zd expected %zd\n",
*len, nvq->vhost_hlen);
@@ -1080,7 +1078,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
nlogs += *log_num;
log += *log_num;
}
-   len = iov_length(vq->iov + seg, in);
+   len = bufs[bufcount].in_len;
datalen -= len;
++bufcount;
seg += in;
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v8 02/11] vhost: use batched get_vq_desc version

2020-06-11 Thread Michael S. Tsirkin
As testing shows no performance change, switch to that now.

Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Eugenio Pérez 
Link: https://lore.kernel.org/r/20200401183118.8334-3-epere...@redhat.com
Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/test.c  |   2 +-
 drivers/vhost/vhost.c | 314 --
 drivers/vhost/vhost.h |   7 +-
 3 files changed, 61 insertions(+), 262 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 0466921f4772..7d69778aaa26 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -119,7 +119,7 @@ static int vhost_test_open(struct inode *inode, struct file 
*f)
dev = >dev;
vqs[VHOST_TEST_VQ] = >vqs[VHOST_TEST_VQ];
n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
+   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV + 64,
   VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, true, NULL);
 
f->private_data = n;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 11433d709651..dfcdb36d4227 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -304,6 +304,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 {
vq->num = 1;
vq->ndescs = 0;
+   vq->first_desc = 0;
vq->desc = NULL;
vq->avail = NULL;
vq->used = NULL;
@@ -372,6 +373,11 @@ static int vhost_worker(void *data)
return 0;
 }
 
+static int vhost_vq_num_batch_descs(struct vhost_virtqueue *vq)
+{
+   return vq->max_descs - UIO_MAXIOV;
+}
+
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
 {
kfree(vq->descs);
@@ -394,6 +400,9 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
for (i = 0; i < dev->nvqs; ++i) {
vq = dev->vqs[i];
vq->max_descs = dev->iov_limit;
+   if (vhost_vq_num_batch_descs(vq) < 0) {
+   return -EINVAL;
+   }
vq->descs = kmalloc_array(vq->max_descs,
  sizeof(*vq->descs),
  GFP_KERNEL);
@@ -1610,6 +1619,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *arg
vq->last_avail_idx = s.num;
/* Forget the cached index value. */
vq->avail_idx = vq->last_avail_idx;
+   vq->ndescs = vq->first_desc = 0;
break;
case VHOST_GET_VRING_BASE:
s.index = idx;
@@ -2078,253 +2088,6 @@ static unsigned next_desc(struct vhost_virtqueue *vq, 
struct vring_desc *desc)
return next;
 }
 
-static int get_indirect(struct vhost_virtqueue *vq,
-   struct iovec iov[], unsigned int iov_size,
-   unsigned int *out_num, unsigned int *in_num,
-   struct vhost_log *log, unsigned int *log_num,
-   struct vring_desc *indirect)
-{
-   struct vring_desc desc;
-   unsigned int i = 0, count, found = 0;
-   u32 len = vhost32_to_cpu(vq, indirect->len);
-   struct iov_iter from;
-   int ret, access;
-
-   /* Sanity check */
-   if (unlikely(len % sizeof desc)) {
-   vq_err(vq, "Invalid length in indirect descriptor: "
-  "len 0x%llx not multiple of 0x%zx\n",
-  (unsigned long long)len,
-  sizeof desc);
-   return -EINVAL;
-   }
-
-   ret = translate_desc(vq, vhost64_to_cpu(vq, indirect->addr), len, 
vq->indirect,
-UIO_MAXIOV, VHOST_ACCESS_RO);
-   if (unlikely(ret < 0)) {
-   if (ret != -EAGAIN)
-   vq_err(vq, "Translation failure %d in indirect.\n", 
ret);
-   return ret;
-   }
-   iov_iter_init(, READ, vq->indirect, ret, len);
-
-   /* We will use the result as an address to read from, so most
-* architectures only need a compiler barrier here. */
-   read_barrier_depends();
-
-   count = len / sizeof desc;
-   /* Buffers are chained via a 16 bit next field, so
-* we can have at most 2^16 of these. */
-   if (unlikely(count > USHRT_MAX + 1)) {
-   vq_err(vq, "Indirect buffer length too big: %d\n",
-  indirect->len);
-   return -E2BIG;
-   }
-
-   do {
-   unsigned iov_count = *in_num + *out_num;
-   if (unlikely(++found > count)) {
-   vq_err(vq, "Loop detected: last one at %u "
-  "indirect size %u\n",
-  i, count);
-   return -EINVAL;
-   }
-   if (unlikely(!copy_from_iter_full(, sizeof(desc), ))) 
{
-   vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n",
-  i, (size_t)vhost64_to_cpu(vq, 

[PATCH RFC v8 10/11] vhost/vsock: switch to the buf API

2020-06-11 Thread Michael S. Tsirkin
A straight-forward conversion.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vsock.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index a483cec31d5c..61c6d3dd2ae3 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -103,7 +103,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
unsigned out, in;
size_t nbytes;
size_t iov_len, payload_len;
-   int head;
+   struct vhost_buf buf;
+   int ret;
 
spin_lock_bh(>send_pkt_list_lock);
if (list_empty(>send_pkt_list)) {
@@ -117,16 +118,17 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
list_del_init(>list);
spin_unlock_bh(>send_pkt_list_lock);
 
-   head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
-, , NULL, NULL);
-   if (head < 0) {
+   ret = vhost_get_avail_buf(vq, ,
+ vq->iov, ARRAY_SIZE(vq->iov),
+ , , NULL, NULL);
+   if (ret < 0) {
spin_lock_bh(>send_pkt_list_lock);
list_add(>list, >send_pkt_list);
spin_unlock_bh(>send_pkt_list_lock);
break;
}
 
-   if (head == vq->num) {
+   if (!ret) {
spin_lock_bh(>send_pkt_list_lock);
list_add(>list, >send_pkt_list);
spin_unlock_bh(>send_pkt_list_lock);
@@ -186,7 +188,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 */
virtio_transport_deliver_tap_pkt(pkt);
 
-   vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
+   buf.in_len = sizeof(pkt->hdr) + payload_len;
+   vhost_put_used_buf(vq, );
added = true;
 
pkt->off += payload_len;
@@ -440,7 +443,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work 
*work)
struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
 dev);
struct virtio_vsock_pkt *pkt;
-   int head, pkts = 0, total_len = 0;
+   int ret, pkts = 0, total_len = 0;
+   struct vhost_buf buf;
unsigned int out, in;
bool added = false;
 
@@ -461,12 +465,13 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work 
*work)
goto no_more_replies;
}
 
-   head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
-, , NULL, NULL);
-   if (head < 0)
+   ret = vhost_get_avail_buf(vq, ,
+ vq->iov, ARRAY_SIZE(vq->iov),
+ , , NULL, NULL);
+   if (ret < 0)
break;
 
-   if (head == vq->num) {
+   if (!ret) {
if (unlikely(vhost_enable_notify(>dev, vq))) {
vhost_disable_notify(>dev, vq);
continue;
@@ -494,7 +499,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work 
*work)
virtio_transport_free_pkt(pkt);
 
len += sizeof(pkt->hdr);
-   vhost_add_used(vq, head, len);
+   buf.in_len = len;
+   vhost_put_used_buf(vq, );
total_len += len;
added = true;
} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v8 08/11] vhost/test: convert to the buf API

2020-06-11 Thread Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/test.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 7d69778aaa26..12304eb8da15 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -44,9 +44,10 @@ static void handle_vq(struct vhost_test *n)
 {
struct vhost_virtqueue *vq = >vqs[VHOST_TEST_VQ];
unsigned out, in;
-   int head;
+   int ret;
size_t len, total_len = 0;
void *private;
+   struct vhost_buf buf;
 
mutex_lock(>mutex);
private = vhost_vq_get_backend(vq);
@@ -58,15 +59,15 @@ static void handle_vq(struct vhost_test *n)
vhost_disable_notify(>dev, vq);
 
for (;;) {
-   head = vhost_get_vq_desc(vq, vq->iov,
-ARRAY_SIZE(vq->iov),
-, ,
-NULL, NULL);
+   ret = vhost_get_avail_buf(vq, , vq->iov,
+ ARRAY_SIZE(vq->iov),
+ , ,
+ NULL, NULL);
/* On error, stop handling until the next kick. */
-   if (unlikely(head < 0))
+   if (unlikely(ret < 0))
break;
/* Nothing new?  Wait for eventfd to tell us they refilled. */
-   if (head == vq->num) {
+   if (!ret) {
if (unlikely(vhost_enable_notify(>dev, vq))) {
vhost_disable_notify(>dev, vq);
continue;
@@ -78,13 +79,14 @@ static void handle_vq(struct vhost_test *n)
   "out %d, int %d\n", out, in);
break;
}
-   len = iov_length(vq->iov, out);
+   len = buf.out_len;
/* Sanity check */
if (!len) {
vq_err(vq, "Unexpected 0 len for TX\n");
break;
}
-   vhost_add_used_and_signal(>dev, vq, head, 0);
+   vhost_put_used_buf(vq, );
+   vhost_signal(>dev, vq);
total_len += len;
if (unlikely(vhost_exceeds_weight(vq, 0, total_len)))
break;
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v8 04/11] vhost: reorder functions

2020-06-11 Thread Michael S. Tsirkin
Reorder functions in the file to not rely on forward
declarations, in preparation to making them static
down the road.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index dfcdb36d4227..c38605b01080 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2425,19 +2425,6 @@ void vhost_discard_vq_desc(struct vhost_virtqueue *vq, 
int n)
 }
 EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
 
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
-{
-   struct vring_used_elem heads = {
-   cpu_to_vhost32(vq, head),
-   cpu_to_vhost32(vq, len)
-   };
-
-   return vhost_add_used_n(vq, , 1);
-}
-EXPORT_SYMBOL_GPL(vhost_add_used);
-
 static int __vhost_add_used_n(struct vhost_virtqueue *vq,
struct vring_used_elem *heads,
unsigned count)
@@ -2507,6 +2494,19 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct 
vring_used_elem *heads,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using eventfd. */
+int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+{
+   struct vring_used_elem heads = {
+   cpu_to_vhost32(vq, head),
+   cpu_to_vhost32(vq, len)
+   };
+
+   return vhost_add_used_n(vq, , 1);
+}
+EXPORT_SYMBOL_GPL(vhost_add_used);
+
 static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
__u16 old, new;
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v8 09/11] vhost/scsi: switch to buf APIs

2020-06-11 Thread Michael S. Tsirkin
Switch to buf APIs. Doing this exposes a spec violation in vhost scsi:
all used bufs are marked with length 0.
Fix that is left for another day.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/scsi.c | 73 ++--
 1 file changed, 44 insertions(+), 29 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 0cbaa0b3893d..a5cdd4c01a3a 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -71,8 +71,8 @@ struct vhost_scsi_inflight {
 };
 
 struct vhost_scsi_cmd {
-   /* Descriptor from vhost_get_vq_desc() for virt_queue segment */
-   int tvc_vq_desc;
+   /* Descriptor from vhost_get_avail_buf() for virt_queue segment */
+   struct vhost_buf tvc_vq_desc;
/* virtio-scsi initiator task attribute */
int tvc_task_attr;
/* virtio-scsi response incoming iovecs */
@@ -213,7 +213,7 @@ struct vhost_scsi {
  * Context for processing request and control queue operations.
  */
 struct vhost_scsi_ctx {
-   int head;
+   struct vhost_buf buf;
unsigned int out, in;
size_t req_size, rsp_size;
size_t out_size, in_size;
@@ -443,6 +443,20 @@ static int vhost_scsi_check_stop_free(struct se_cmd 
*se_cmd)
return target_put_sess_cmd(se_cmd);
 }
 
+/* Signal to guest that request finished with no input buffer. */
+/* TODO calling this when writing into buffer and most likely a bug */
+static void vhost_scsi_signal_noinput(struct vhost_dev *vdev,
+ struct vhost_virtqueue *vq,
+ struct vhost_buf *bufp)
+{
+   struct vhost_buf buf = *bufp;
+
+   buf.in_len = 0;
+   vhost_put_used_buf(vq, );
+   vhost_signal(vdev, vq);
+}
+
+
 static void
 vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 {
@@ -450,7 +464,8 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct 
vhost_scsi_evt *evt)
struct virtio_scsi_event *event = >event;
struct virtio_scsi_event __user *eventp;
unsigned out, in;
-   int head, ret;
+   struct vhost_buf buf;
+   int ret;
 
if (!vhost_vq_get_backend(vq)) {
vs->vs_events_missed = true;
@@ -459,14 +474,14 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct 
vhost_scsi_evt *evt)
 
 again:
vhost_disable_notify(>dev, vq);
-   head = vhost_get_vq_desc(vq, vq->iov,
-   ARRAY_SIZE(vq->iov), , ,
-   NULL, NULL);
-   if (head < 0) {
+   ret = vhost_get_avail_buf(vq, ,
+ vq->iov, ARRAY_SIZE(vq->iov), , ,
+ NULL, NULL);
+   if (ret < 0) {
vs->vs_events_missed = true;
return;
}
-   if (head == vq->num) {
+   if (!ret) {
if (vhost_enable_notify(>dev, vq))
goto again;
vs->vs_events_missed = true;
@@ -488,7 +503,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct 
vhost_scsi_evt *evt)
eventp = vq->iov[out].iov_base;
ret = __copy_to_user(eventp, event, sizeof(*event));
if (!ret)
-   vhost_add_used_and_signal(>dev, vq, head, 0);
+   vhost_scsi_signal_noinput(>dev, vq, );
else
vq_err(vq, "Faulted on vhost_scsi_send_event\n");
 }
@@ -549,7 +564,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work 
*work)
ret = copy_to_iter(_rsp, sizeof(v_rsp), _iter);
if (likely(ret == sizeof(v_rsp))) {
struct vhost_scsi_virtqueue *q;
-   vhost_add_used(cmd->tvc_vq, cmd->tvc_vq_desc, 0);
+   vhost_put_used_buf(cmd->tvc_vq, >tvc_vq_desc);
q = container_of(cmd->tvc_vq, struct 
vhost_scsi_virtqueue, vq);
vq = q - vs->vqs;
__set_bit(vq, signal);
@@ -793,7 +808,7 @@ static void vhost_scsi_submission_work(struct work_struct 
*work)
 static void
 vhost_scsi_send_bad_target(struct vhost_scsi *vs,
   struct vhost_virtqueue *vq,
-  int head, unsigned out)
+  struct vhost_buf *buf, unsigned out)
 {
struct virtio_scsi_cmd_resp __user *resp;
struct virtio_scsi_cmd_resp rsp;
@@ -804,7 +819,7 @@ vhost_scsi_send_bad_target(struct vhost_scsi *vs,
resp = vq->iov[out].iov_base;
ret = __copy_to_user(resp, , sizeof(rsp));
if (!ret)
-   vhost_add_used_and_signal(>dev, vq, head, 0);
+   vhost_scsi_signal_noinput(>dev, vq, buf);
else
pr_err("Faulted on virtio_scsi_cmd_resp\n");
 }
@@ -813,21 +828,21 @@ static int
 vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
struct vhost_scsi_ctx *vc)
 {
-   int ret = -ENXIO;
+   int r, ret = -ENXIO;
 
-   vc->head = 

[PATCH RFC v8 11/11] vhost: drop head based APIs

2020-06-11 Thread Michael S. Tsirkin
Everyone's using buf APIs, no need for head based ones anymore.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 64 ++-
 drivers/vhost/vhost.h | 12 
 2 files changed, 8 insertions(+), 68 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 03e6bca02288..9096bd291c91 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2425,39 +2425,11 @@ EXPORT_SYMBOL_GPL(vhost_get_avail_buf);
 void vhost_discard_avail_bufs(struct vhost_virtqueue *vq,
  struct vhost_buf *buf, unsigned count)
 {
-   vhost_discard_vq_desc(vq, count);
+   unfetch_descs(vq);
+   vq->last_avail_idx -= count;
 }
 EXPORT_SYMBOL_GPL(vhost_discard_avail_bufs);
 
-/* This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
- struct iovec iov[], unsigned int iov_size,
- unsigned int *out_num, unsigned int *in_num,
- struct vhost_log *log, unsigned int *log_num)
-{
-   struct vhost_buf buf;
-   int ret = vhost_get_avail_buf(vq, ,
- iov, iov_size, out_num, in_num,
- log, log_num);
-
-   if (likely(ret > 0))
-   return buf->id;
-   if (likely(!ret))
-   return vq->num;
-   return ret;
-}
-EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
-
-/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
-void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
-{
-   unfetch_descs(vq);
-   vq->last_avail_idx -= n;
-}
-EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
-
 static int __vhost_add_used_n(struct vhost_virtqueue *vq,
struct vring_used_elem *heads,
unsigned count)
@@ -2490,8 +2462,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
return 0;
 }
 
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to notify the guest, using eventfd. */
+static
 int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 unsigned count)
 {
@@ -2525,10 +2496,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct 
vring_used_elem *heads,
}
return r;
 }
-EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to notify the guest, using eventfd. */
+static
 int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
 {
struct vring_used_elem heads = {
@@ -2538,14 +2507,17 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned 
int head, int len)
 
return vhost_add_used_n(vq, , 1);
 }
-EXPORT_SYMBOL_GPL(vhost_add_used);
 
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using vhost_signal. */
 int vhost_put_used_buf(struct vhost_virtqueue *vq, struct vhost_buf *buf)
 {
return vhost_add_used(vq, buf->id, buf->in_len);
 }
 EXPORT_SYMBOL_GPL(vhost_put_used_buf);
 
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using vhost_signal. */
 int vhost_put_used_n_bufs(struct vhost_virtqueue *vq,
  struct vhost_buf *bufs, unsigned count)
 {
@@ -2606,26 +2578,6 @@ void vhost_signal(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(vhost_signal);
 
-/* And here's the combo meal deal.  Supersize me! */
-void vhost_add_used_and_signal(struct vhost_dev *dev,
-  struct vhost_virtqueue *vq,
-  unsigned int head, int len)
-{
-   vhost_add_used(vq, head, len);
-   vhost_signal(dev, vq);
-}
-EXPORT_SYMBOL_GPL(vhost_add_used_and_signal);
-
-/* multi-buffer version of vhost_add_used_and_signal */
-void vhost_add_used_and_signal_n(struct vhost_dev *dev,
-struct vhost_virtqueue *vq,
-struct vring_used_elem *heads, unsigned count)
-{
-   vhost_add_used_n(vq, heads, count);
-   vhost_signal(dev, vq);
-}
-EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
-
 /* return true if we're sure that avaiable ring is empty */
 bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 28eea0155efb..264a2a2fae97 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -197,11 +197,6 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *arg
 bool vhost_vq_access_ok(struct vhost_virtqueue *vq);
 bool vhost_log_access_ok(struct vhost_dev *);
 
-int vhost_get_vq_desc(struct vhost_virtqueue *,
- struct iovec 

[PATCH RFC v8 06/11] vhost/net: convert to new API: heads->bufs

2020-06-11 Thread Michael S. Tsirkin
Convert vhost net to use the new format-agnostic API.
In particular, don't poke at vq internals such as the
heads array.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c | 154 +++-
 1 file changed, 82 insertions(+), 72 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ff594eec8ae3..830fe84912a5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -59,13 +59,13 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy 
TX;"
  * status internally; used for zerocopy tx only.
  */
 /* Lower device DMA failed */
-#define VHOST_DMA_FAILED_LEN   ((__force __virtio32)3)
+#define VHOST_DMA_FAILED_LEN   (3)
 /* Lower device DMA done */
-#define VHOST_DMA_DONE_LEN ((__force __virtio32)2)
+#define VHOST_DMA_DONE_LEN (2)
 /* Lower device DMA in progress */
-#define VHOST_DMA_IN_PROGRESS  ((__force __virtio32)1)
+#define VHOST_DMA_IN_PROGRESS  (1)
 /* Buffer unused */
-#define VHOST_DMA_CLEAR_LEN((__force __virtio32)0)
+#define VHOST_DMA_CLEAR_LEN(0)
 
 #define VHOST_DMA_IS_DONE(len) ((__force u32)(len) >= (__force 
u32)VHOST_DMA_DONE_LEN)
 
@@ -112,9 +112,12 @@ struct vhost_net_virtqueue {
/* last used idx for outstanding DMA zerocopy buffers */
int upend_idx;
/* For TX, first used idx for DMA done zerocopy buffers
-* For RX, number of batched heads
+* For RX, number of batched bufs
 */
int done_idx;
+   /* Outstanding user bufs. UIO_MAXIOV in length. */
+   /* TODO: we can make this smaller for sure. */
+   struct vhost_buf *bufs;
/* Number of XDP frames batched */
int batched_xdp;
/* an array of userspace buffers info */
@@ -271,6 +274,8 @@ static void vhost_net_clear_ubuf_info(struct vhost_net *n)
int i;
 
for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
+   kfree(n->vqs[i].bufs);
+   n->vqs[i].bufs = NULL;
kfree(n->vqs[i].ubuf_info);
n->vqs[i].ubuf_info = NULL;
}
@@ -282,6 +287,12 @@ static int vhost_net_set_ubuf_info(struct vhost_net *n)
int i;
 
for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
+   n->vqs[i].bufs = kmalloc_array(UIO_MAXIOV,
+  sizeof(*n->vqs[i].bufs),
+  GFP_KERNEL);
+   if (!n->vqs[i].bufs)
+   goto err;
+
zcopy = vhost_net_zcopy_mask & (0x1 << i);
if (!zcopy)
continue;
@@ -364,18 +375,18 @@ static void vhost_zerocopy_signal_used(struct vhost_net 
*net,
int j = 0;
 
for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-   if (vq->heads[i].len == VHOST_DMA_FAILED_LEN)
+   if (nvq->bufs[i].in_len == VHOST_DMA_FAILED_LEN)
vhost_net_tx_err(net);
-   if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
-   vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
+   if (VHOST_DMA_IS_DONE(nvq->bufs[i].in_len)) {
+   nvq->bufs[i].in_len = VHOST_DMA_CLEAR_LEN;
++j;
} else
break;
}
while (j) {
add = min(UIO_MAXIOV - nvq->done_idx, j);
-   vhost_add_used_and_signal_n(vq->dev, vq,
-   >heads[nvq->done_idx], add);
+   vhost_put_used_n_bufs(vq, >bufs[nvq->done_idx], add);
+   vhost_signal(vq->dev, vq);
nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
j -= add;
}
@@ -390,7 +401,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, 
bool success)
rcu_read_lock_bh();
 
/* set len to mark this desc buffers done DMA */
-   nvq->vq.heads[ubuf->desc].in_len = success ?
+   nvq->bufs[ubuf->desc].in_len = success ?
VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
cnt = vhost_net_ubuf_put(ubufs);
 
@@ -452,7 +463,8 @@ static void vhost_net_signal_used(struct 
vhost_net_virtqueue *nvq)
if (!nvq->done_idx)
return;
 
-   vhost_add_used_and_signal_n(dev, vq, vq->heads, nvq->done_idx);
+   vhost_put_used_n_bufs(vq, nvq->bufs, nvq->done_idx);
+   vhost_signal(dev, vq);
nvq->done_idx = 0;
 }
 
@@ -558,6 +570,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
struct vhost_net_virtqueue *tnvq,
+   struct vhost_buf *buf,
unsigned int *out_num, unsigned int *in_num,
struct msghdr *msghdr, bool *busyloop_intr)
 {
@@ -565,10 +578,10 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
struct vhost_virtqueue *rvq = >vq;

[PATCH RFC v8 03/11] vhost/net: pass net specific struct pointer

2020-06-11 Thread Michael S. Tsirkin
In preparation for further cleanup, pass net specific pointer
to ubuf callbacks so we can move net specific fields
out to net structures.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index bf5e1d81ae25..ff594eec8ae3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
 */
atomic_t refcount;
wait_queue_head_t wait;
-   struct vhost_virtqueue *vq;
+   struct vhost_net_virtqueue *nvq;
 };
 
 #define VHOST_NET_BATCH 64
@@ -231,7 +231,7 @@ static void vhost_net_enable_zcopy(int vq)
 }
 
 static struct vhost_net_ubuf_ref *
-vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
+vhost_net_ubuf_alloc(struct vhost_net_virtqueue *nvq, bool zcopy)
 {
struct vhost_net_ubuf_ref *ubufs;
/* No zero copy backend? Nothing to count. */
@@ -242,7 +242,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
return ERR_PTR(-ENOMEM);
atomic_set(>refcount, 1);
init_waitqueue_head(>wait);
-   ubufs->vq = vq;
+   ubufs->nvq = nvq;
return ubufs;
 }
 
@@ -384,13 +384,13 @@ static void vhost_zerocopy_signal_used(struct vhost_net 
*net,
 static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 {
struct vhost_net_ubuf_ref *ubufs = ubuf->ctx;
-   struct vhost_virtqueue *vq = ubufs->vq;
+   struct vhost_net_virtqueue *nvq = ubufs->nvq;
int cnt;
 
rcu_read_lock_bh();
 
/* set len to mark this desc buffers done DMA */
-   vq->heads[ubuf->desc].len = success ?
+   nvq->vq.heads[ubuf->desc].in_len = success ?
VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
cnt = vhost_net_ubuf_put(ubufs);
 
@@ -402,7 +402,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, 
bool success)
 * less than 10% of times).
 */
if (cnt <= 1 || !(cnt % 16))
-   vhost_poll_queue(>poll);
+   vhost_poll_queue(>vq.poll);
 
rcu_read_unlock_bh();
 }
@@ -1525,7 +1525,7 @@ static long vhost_net_set_backend(struct vhost_net *n, 
unsigned index, int fd)
/* start polling new socket */
oldsock = vhost_vq_get_backend(vq);
if (sock != oldsock) {
-   ubufs = vhost_net_ubuf_alloc(vq,
+   ubufs = vhost_net_ubuf_alloc(nvq,
 sock && vhost_sock_zcopy(sock));
if (IS_ERR(ubufs)) {
r = PTR_ERR(ubufs);
-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v8 00/11] vhost: ring format independence

2020-06-11 Thread Michael S. Tsirkin


This still causes corruption issues for people so don't try
to use in production please. Posting to expedite debugging.

This adds infrastructure required for supporting
multiple ring formats.

The idea is as follows: we convert descriptors to an
independent format first, and process that converting to
iov later.

Used ring is similar: we fetch into an independent struct first,
convert that to IOV later.

The point is that we have a tight loop that fetches
descriptors, which is good for cache utilization.
This will also allow all kind of batching tricks -
e.g. it seems possible to keep SMAP disabled while
we are fetching multiple descriptors.

For used descriptors, this allows keeping track of the buffer length
without need to rescan IOV.

This seems to perform exactly the same as the original
code based on a microbenchmark.
Lightly tested.
More testing would be very much appreciated.

changes from v8:
- squashed in fixes. no longer hangs but still known
  to cause data corruption for some people. under debug.

changes from v6:
- fixes some bugs introduced in v6 and v5

changes from v5:
- addressed comments by Jason: squashed API changes, fixed up discard

changes from v4:
- added used descriptor format independence
- addressed comments by jason
- fixed a crash detected by the lkp robot.

changes from v3:
- fixed error handling in case of indirect descriptors
- add BUG_ON to detect buffer overflow in case of bugs
in response to comment by Jason Wang
- minor code tweaks

Changes from v2:
- fixed indirect descriptor batching
reported by Jason Wang

Changes from v1:
- typo fixes


Michael S. Tsirkin (14):
  vhost: option to fetch descriptors through an independent struct
  fixup! vhost: option to fetch descriptors through an independent
struct


Michael S. Tsirkin (11):
  vhost: option to fetch descriptors through an independent struct
  vhost: use batched get_vq_desc version
  vhost/net: pass net specific struct pointer
  vhost: reorder functions
  vhost: format-independent API for used buffers
  vhost/net: convert to new API: heads->bufs
  vhost/net: avoid iov length math
  vhost/test: convert to the buf API
  vhost/scsi: switch to buf APIs
  vhost/vsock: switch to the buf API
  vhost: drop head based APIs

 drivers/vhost/net.c   | 174 +--
 drivers/vhost/scsi.c  |  73 
 drivers/vhost/test.c  |  22 +--
 drivers/vhost/vhost.c | 378 +++---
 drivers/vhost/vhost.h |  44 +++--
 drivers/vhost/vsock.c |  30 ++--
 6 files changed, 439 insertions(+), 282 deletions(-)

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC v8 01/11] vhost: option to fetch descriptors through an independent struct

2020-06-11 Thread Michael S. Tsirkin
The idea is to support multiple ring formats by converting
to a format-independent array of descriptors.

This costs extra cycles, but we gain in ability
to fetch a batch of descriptors in one go, which
is good for code cache locality.

When used, this causes a minor performance degradation,
it's been kept as simple as possible for ease of review.
A follow-up patch gets us back the performance by adding batching.

To simplify benchmarking, I kept the old code around so one can switch
back and forth between old and new code. This will go away in the final
submission.

Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Eugenio Pérez 
Link: https://lore.kernel.org/r/20200401183118.8334-2-epere...@redhat.com
Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 305 +-
 drivers/vhost/vhost.h |  16 +++
 2 files changed, 320 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 172da092107e..11433d709651 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -303,6 +303,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
   struct vhost_virtqueue *vq)
 {
vq->num = 1;
+   vq->ndescs = 0;
vq->desc = NULL;
vq->avail = NULL;
vq->used = NULL;
@@ -373,6 +374,9 @@ static int vhost_worker(void *data)
 
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
 {
+   kfree(vq->descs);
+   vq->descs = NULL;
+   vq->max_descs = 0;
kfree(vq->indirect);
vq->indirect = NULL;
kfree(vq->log);
@@ -389,6 +393,10 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
 
for (i = 0; i < dev->nvqs; ++i) {
vq = dev->vqs[i];
+   vq->max_descs = dev->iov_limit;
+   vq->descs = kmalloc_array(vq->max_descs,
+ sizeof(*vq->descs),
+ GFP_KERNEL);
vq->indirect = kmalloc_array(UIO_MAXIOV,
 sizeof(*vq->indirect),
 GFP_KERNEL);
@@ -396,7 +404,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
GFP_KERNEL);
vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
  GFP_KERNEL);
-   if (!vq->indirect || !vq->log || !vq->heads)
+   if (!vq->indirect || !vq->log || !vq->heads || !vq->descs)
goto err_nomem;
}
return 0;
@@ -488,6 +496,8 @@ void vhost_dev_init(struct vhost_dev *dev,
 
for (i = 0; i < dev->nvqs; ++i) {
vq = dev->vqs[i];
+   vq->descs = NULL;
+   vq->max_descs = 0;
vq->log = NULL;
vq->indirect = NULL;
vq->heads = NULL;
@@ -2315,6 +2325,299 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
+static struct vhost_desc *peek_split_desc(struct vhost_virtqueue *vq)
+{
+   BUG_ON(!vq->ndescs);
+   return >descs[vq->ndescs - 1];
+}
+
+static void pop_split_desc(struct vhost_virtqueue *vq)
+{
+   BUG_ON(!vq->ndescs);
+   --vq->ndescs;
+}
+
+#define VHOST_DESC_FLAGS (VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE | \
+ VRING_DESC_F_NEXT)
+static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc 
*desc, u16 id)
+{
+   struct vhost_desc *h;
+
+   if (unlikely(vq->ndescs >= vq->max_descs))
+   return -EINVAL;
+   h = >descs[vq->ndescs++];
+   h->addr = vhost64_to_cpu(vq, desc->addr);
+   h->len = vhost32_to_cpu(vq, desc->len);
+   h->flags = vhost16_to_cpu(vq, desc->flags) & VHOST_DESC_FLAGS;
+   h->id = id;
+
+   return 0;
+}
+
+static int fetch_indirect_descs(struct vhost_virtqueue *vq,
+   struct vhost_desc *indirect,
+   u16 head)
+{
+   struct vring_desc desc;
+   unsigned int i = 0, count, found = 0;
+   u32 len = indirect->len;
+   struct iov_iter from;
+   int ret;
+
+   /* Sanity check */
+   if (unlikely(len % sizeof desc)) {
+   vq_err(vq, "Invalid length in indirect descriptor: "
+  "len 0x%llx not multiple of 0x%zx\n",
+  (unsigned long long)len,
+  sizeof desc);
+   return -EINVAL;
+   }
+
+   ret = translate_desc(vq, indirect->addr, len, vq->indirect,
+UIO_MAXIOV, VHOST_ACCESS_RO);
+   if (unlikely(ret < 0)) {
+   if (ret != -EAGAIN)
+   vq_err(vq, "Translation failure %d in indirect.\n", 
ret);
+   return ret;
+   }
+   iov_iter_init(, READ, vq->indirect, ret, len);
+
+   /* We will use the result as an address to read from, so most
+   

[PATCH RFC v8 05/11] vhost: format-independent API for used buffers

2020-06-11 Thread Michael S. Tsirkin
Add a new API that doesn't assume used ring, heads, etc.
For now, we keep the old APIs around to make it easier
to convert drivers.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 73 +--
 drivers/vhost/vhost.h | 17 +-
 2 files changed, 79 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c38605b01080..03e6bca02288 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2335,13 +2335,12 @@ static void unfetch_descs(struct vhost_virtqueue *vq)
  * number of output then some number of input descriptors, it's actually two
  * iovecs, but we pack them into one and note how many of each there were.
  *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
- struct iovec iov[], unsigned int iov_size,
- unsigned int *out_num, unsigned int *in_num,
- struct vhost_log *log, unsigned int *log_num)
+ * This function returns a value > 0 if a descriptor was found, or 0 if none 
were found.
+ * A negative code is returned on error. */
+int vhost_get_avail_buf(struct vhost_virtqueue *vq, struct vhost_buf *buf,
+   struct iovec iov[], unsigned int iov_size,
+   unsigned int *out_num, unsigned int *in_num,
+   struct vhost_log *log, unsigned int *log_num)
 {
int ret = fetch_descs(vq);
int i;
@@ -2354,6 +2353,8 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
*out_num = *in_num = 0;
if (unlikely(log))
*log_num = 0;
+   buf->in_len = buf->out_len = 0;
+   buf->descs = 0;
 
for (i = vq->first_desc; i < vq->ndescs; ++i) {
unsigned iov_count = *in_num + *out_num;
@@ -2383,6 +2384,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
/* If this is an input descriptor,
 * increment that count. */
*in_num += ret;
+   buf->in_len += desc->len;
if (unlikely(log && ret)) {
log[*log_num].addr = desc->addr;
log[*log_num].len = desc->len;
@@ -2398,9 +2400,11 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
goto err;
}
*out_num += ret;
+   buf->out_len += desc->len;
}
 
-   ret = desc->id;
+   buf->id = desc->id;
+   ++buf->descs;
 
if (!(desc->flags & VRING_DESC_F_NEXT))
break;
@@ -2408,12 +2412,41 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
vq->first_desc = i + 1;
 
-   return ret;
+   return 1;
 
 err:
unfetch_descs(vq);
 
-   return ret ? ret : vq->num;
+   return ret;
+}
+EXPORT_SYMBOL_GPL(vhost_get_avail_buf);
+
+/* Reverse the effect of vhost_get_avail_buf. Useful for error handling. */
+void vhost_discard_avail_bufs(struct vhost_virtqueue *vq,
+ struct vhost_buf *buf, unsigned count)
+{
+   vhost_discard_vq_desc(vq, count);
+}
+EXPORT_SYMBOL_GPL(vhost_discard_avail_bufs);
+
+/* This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found.  A negative code is
+ * returned on error. */
+int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+ struct iovec iov[], unsigned int iov_size,
+ unsigned int *out_num, unsigned int *in_num,
+ struct vhost_log *log, unsigned int *log_num)
+{
+   struct vhost_buf buf;
+   int ret = vhost_get_avail_buf(vq, ,
+ iov, iov_size, out_num, in_num,
+ log, log_num);
+
+   if (likely(ret > 0))
+   return buf->id;
+   if (likely(!ret))
+   return vq->num;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
@@ -2507,6 +2540,26 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned 
int head, int len)
 }
 EXPORT_SYMBOL_GPL(vhost_add_used);
 
+int vhost_put_used_buf(struct vhost_virtqueue *vq, struct vhost_buf *buf)
+{
+   return vhost_add_used(vq, buf->id, buf->in_len);
+}
+EXPORT_SYMBOL_GPL(vhost_put_used_buf);
+
+int vhost_put_used_n_bufs(struct vhost_virtqueue *vq,
+ struct vhost_buf *bufs, unsigned count)
+{
+   unsigned i;
+
+   for (i = 0; i < count; ++i) {
+   vq->heads[i].id = cpu_to_vhost32(vq, bufs[i].id);
+   vq->heads[i].len = cpu_to_vhost32(vq, bufs[i].in_len);
+   }
+
+   return vhost_add_used_n(vq, vq->heads, count);
+}

Re: [PATCH v1] virtio-mem: add memory via add_memory_driver_managed()

2020-06-11 Thread David Hildenbrand


> Am 11.06.2020 um 13:18 schrieb Michael S. Tsirkin :
> 
> On Thu, Jun 11, 2020 at 01:00:24PM +0200, David Hildenbrand wrote:
 I'd like to have this patch in 5.8, with the initial merge of virtio-mem
 if possible (so the user space representation of virtio-mem added memory
 resources won't change anymore).
>>> 
>>> So my plan is to rebase on top of -rc1 and merge this for rc2 then.
>>> I don't like rebase on top of tip as the results are sometimes kind of
>>> random.
>> 
>> Right, I just wanted to get this out early so we can discuss how to proceed.
>> 
>>> And let's add a Fixes: tag as well, this way people will remember to
>>> pick this.
>>> Makes sense?
>> 
>> Yes, it's somehow a fix (for kexec). So
>> 
>> Fixes: 5f1f79bbc9e26 ("virtio-mem: Paravirtualized memory hotplug")
>> 
>> I can respin after -rc1 with the commit id fixed as noted by Pankaj.
>> Just let me know what you prefer.
>> 
>> Thanks!
> 
> Some once this commit is in Linus' tree, please ping me.

It already is as mentioned, only the id was wrong.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=7b7b27214bba1966772f9213cd2d8e5d67f8487f

> 
>> -- 
>> Thanks,
>> 
>> David / dhildenb
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC v7 03/14] vhost: use batched get_vq_desc version

2020-06-11 Thread Michael S. Tsirkin
On Wed, Jun 10, 2020 at 06:18:32PM +0200, Eugenio Perez Martin wrote:
> On Wed, Jun 10, 2020 at 5:13 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jun 10, 2020 at 02:37:50PM +0200, Eugenio Perez Martin wrote:
> > > > +/* This function returns a value > 0 if a descriptor was found, or 0 
> > > > if none were found.
> > > > + * A negative code is returned on error. */
> > > > +static int fetch_descs(struct vhost_virtqueue *vq)
> > > > +{
> > > > +   int ret;
> > > > +
> > > > +   if (unlikely(vq->first_desc >= vq->ndescs)) {
> > > > +   vq->first_desc = 0;
> > > > +   vq->ndescs = 0;
> > > > +   }
> > > > +
> > > > +   if (vq->ndescs)
> > > > +   return 1;
> > > > +
> > > > +   for (ret = 1;
> > > > +ret > 0 && vq->ndescs <= vhost_vq_num_batch_descs(vq);
> > > > +ret = fetch_buf(vq))
> > > > +   ;
> > >
> > > (Expanding comment in V6):
> > >
> > > We get an infinite loop this way:
> > > * vq->ndescs == 0, so we call fetch_buf() here
> > > * fetch_buf gets less than vhost_vq_num_batch_descs(vq); descriptors. ret 
> > > = 1
> > > * This loop calls again fetch_buf, but vq->ndescs > 0 (and avail_vq ==
> > > last_avail_vq), so it just return 1
> >
> > That's what
> >  [PATCH RFC v7 08/14] fixup! vhost: use batched get_vq_desc version
> > is supposed to fix.
> >
> 
> Sorry, I forgot to include that fixup.
> 
> With it I don't see CPU stalls, but with that version latency has
> increased a lot and I see packet lost:
> + ping -c 5 10.200.0.1
> PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data.
> >From 10.200.0.2 icmp_seq=1 Destination Host Unreachable
> >From 10.200.0.2 icmp_seq=2 Destination Host Unreachable
> >From 10.200.0.2 icmp_seq=3 Destination Host Unreachable
> 64 bytes from 10.200.0.1: icmp_seq=5 ttl=64 time=6848 ms
> 
> --- 10.200.0.1 ping statistics ---
> 5 packets transmitted, 1 received, +3 errors, 80% packet loss, time 76ms
> rtt min/avg/max/mdev = 6848.316/6848.316/6848.316/0.000 ms, pipe 4
> --
> 
> I cannot even use netperf.

OK so that's the bug to try to find and fix I think.


> If I modify with my proposed version:
> + ping -c 5 10.200.0.1
> PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data.
> 64 bytes from 10.200.0.1: icmp_seq=1 ttl=64 time=7.07 ms
> 64 bytes from 10.200.0.1: icmp_seq=2 ttl=64 time=0.358 ms
> 64 bytes from 10.200.0.1: icmp_seq=3 ttl=64 time=5.35 ms
> 64 bytes from 10.200.0.1: icmp_seq=4 ttl=64 time=2.27 ms
> 64 bytes from 10.200.0.1: icmp_seq=5 ttl=64 time=0.426 ms


Not sure which version this is.

> [root@localhost ~]# netperf -H 10.200.0.1 -p 12865 -l 10 -t TCP_STREAM
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 10.200.0.1 () port 0 AF_INET
> Recv   SendSend
> Socket Socket  Message  Elapsed
> Size   SizeSize Time Throughput
> bytes  bytes   bytessecs.10^6bits/sec
> 
> 131072  16384  1638410.014742.36
> [root@localhost ~]# netperf -H 10.200.0.1 -p 12865 -l 10 -t UDP_STREAM
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 10.200.0.1 () port 0 AF_INET
> Socket  Message  Elapsed  Messages
> SizeSize Time Okay Errors   Throughput
> bytes   bytessecs#  #   10^6bits/sec
> 
> 212992   65507   10.009214  0 482.83
> 212992   10.009214482.83
> 
> I will compare with the non-batch version for reference, but the
> difference between the two is noticeable. Maybe it's worth finding a
> good value for the if() inside fetch_buf?
> 
> Thanks!
> 

I don't think it's performance, I think it's a bug somewhere,
e.g. maybe we corrupt a packet, or stall the queue, or
something like this.

Let's do this, I will squash the fixups and post v8 so you can bisect
and then debug cleanly.

> > --
> > MST
> >

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1] virtio-mem: add memory via add_memory_driver_managed()

2020-06-11 Thread Michael S. Tsirkin
On Thu, Jun 11, 2020 at 01:00:24PM +0200, David Hildenbrand wrote:
> >> I'd like to have this patch in 5.8, with the initial merge of virtio-mem
> >> if possible (so the user space representation of virtio-mem added memory
> >> resources won't change anymore).
> > 
> > So my plan is to rebase on top of -rc1 and merge this for rc2 then.
> > I don't like rebase on top of tip as the results are sometimes kind of
> > random.
> 
> Right, I just wanted to get this out early so we can discuss how to proceed.
> 
> > And let's add a Fixes: tag as well, this way people will remember to
> > pick this.
> > Makes sense?
> 
> Yes, it's somehow a fix (for kexec). So
> 
> Fixes: 5f1f79bbc9e26 ("virtio-mem: Paravirtualized memory hotplug")
> 
> I can respin after -rc1 with the commit id fixed as noted by Pankaj.
> Just let me know what you prefer.
> 
> Thanks!

Some once this commit is in Linus' tree, please ping me.

> -- 
> Thanks,
> 
> David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1] virtio-mem: add memory via add_memory_driver_managed()

2020-06-11 Thread David Hildenbrand
>> I'd like to have this patch in 5.8, with the initial merge of virtio-mem
>> if possible (so the user space representation of virtio-mem added memory
>> resources won't change anymore).
> 
> So my plan is to rebase on top of -rc1 and merge this for rc2 then.
> I don't like rebase on top of tip as the results are sometimes kind of
> random.

Right, I just wanted to get this out early so we can discuss how to proceed.

> And let's add a Fixes: tag as well, this way people will remember to
> pick this.
> Makes sense?

Yes, it's somehow a fix (for kexec). So

Fixes: 5f1f79bbc9e26 ("virtio-mem: Paravirtualized memory hotplug")

I can respin after -rc1 with the commit id fixed as noted by Pankaj.
Just let me know what you prefer.

Thanks!

-- 
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1] virtio-mem: add memory via add_memory_driver_managed()

2020-06-11 Thread David Hildenbrand
On 11.06.20 12:32, Pankaj Gupta wrote:
>> Virtio-mem managed memory is always detected and added by the virtio-mem
>> driver, never using something like the firmware-provided memory map.
>> This is the case after an ordinary system reboot, and has to be guaranteed
>> after kexec. Especially, virtio-mem added memory resources can contain
>> inaccessible parts ("unblocked memory blocks"), blindly forwarding them
>> to a kexec kernel is dangerous, as unplugged memory will get accessed
>> (esp. written).
>>
>> Let's use the new way of adding special driver-managed memory introduced
>> in commit 75ac4c58bc0d ("mm/memory_hotplug: introduce
>> add_memory_driver_managed()").
> 
> Is this commit id correct?

Good point, it's the one from next-20200605.

7b7b27214bba

Is the correct one.

[...]

> 
> Looks good to me.
> Reviewed-by: Pankaj Gupta 
> 

Thanks!

-- 
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 42/75] x86/sev-es: Setup GHCB based boot #VC handler

2020-06-11 Thread Joerg Roedel
On Thu, Jun 04, 2020 at 05:30:27PM +0200, Borislav Petkov wrote:
> Hmmkay, I see vc_no_ghcb doing
> 
> calldo_vc_no_ghcb
> 
> and that's setup in early_idt_setup().
> 
> vc_boot_ghcb(), OTOH, is called by do_early_exception() only so that one
> could be called handle_vc_boot_ghcb(), no? I.e., I don't see it being an
> IDT entry point.

Right, I renamed it to handle_vc_boot_ghcb().


Joerg

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1] virtio-mem: add memory via add_memory_driver_managed()

2020-06-11 Thread Michael S. Tsirkin
virtio-mem: add memory via add_memory_driver_managed()


On Thu, Jun 11, 2020 at 11:35:18AM +0200, David Hildenbrand wrote:
> Virtio-mem managed memory is always detected and added by the virtio-mem
> driver, never using something like the firmware-provided memory map.
> This is the case after an ordinary system reboot, and has to be guaranteed
> after kexec. Especially, virtio-mem added memory resources can contain
> inaccessible parts ("unblocked memory blocks"), blindly forwarding them
> to a kexec kernel is dangerous, as unplugged memory will get accessed
> (esp. written).
> 
> Let's use the new way of adding special driver-managed memory introduced
> in commit 75ac4c58bc0d ("mm/memory_hotplug: introduce
> add_memory_driver_managed()").
> 
> This will result in no entries in /sys/firmware/memmap ("raw firmware-
> provided memory map"), the memory resource will be flagged
> IORESOURCE_MEM_DRIVER_MANAGED (esp., kexec_file_load() will not place
> kexec images on this memory), and it is exposed as "System RAM
> (virtio_mem)" in /proc/iomem, so esp. kexec-tools can properly handle it.
> 
> Example /proc/iomem before this change:
>   [...]
>   14000-333ff : virtio0
> 14000-147ff : System RAM
>   33400-533ff : virtio1
> 33800-33fff : System RAM
> 34000-347ff : System RAM
> 34800-34fff : System RAM
>   [...]
> 
> Example /proc/iomem after this change:
>   [...]
>   14000-333ff : virtio0
> 14000-147ff : System RAM (virtio_mem)
>   33400-533ff : virtio1
> 33800-33fff : System RAM (virtio_mem)
> 34000-347ff : System RAM (virtio_mem)
> 34800-34fff : System RAM (virtio_mem)
>   [...]
> 
> Cc: "Michael S. Tsirkin" 
> Cc: Pankaj Gupta 
> Cc: teawater 
> Signed-off-by: David Hildenbrand 
> ---
> 
> Based on latest Linus' tree (and not a tag) because
> - virtio-mem has just been merged via the vhost tree
> - add_memory_driver_managed() has been merged a week ago via the -mm tree
> 
> I'd like to have this patch in 5.8, with the initial merge of virtio-mem
> if possible (so the user space representation of virtio-mem added memory
> resources won't change anymore).

So my plan is to rebase on top of -rc1 and merge this for rc2 then.
I don't like rebase on top of tip as the results are sometimes kind of
random.
And let's add a Fixes: tag as well, this way people will remember to
pick this.
Makes sense?


> ---
>  drivers/virtio/virtio_mem.c | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 50c689f250450..d2eab3558a9e1 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -101,6 +101,11 @@ struct virtio_mem {
>  
>   /* The parent resource for all memory added via this device. */
>   struct resource *parent_resource;
> + /*
> +  * Copy of "System RAM (virtio_mem)" to be used for
> +  * add_memory_driver_managed().
> +  */
> + const char *resource_name;
>  
>   /* Summary of all memory block states. */
>   unsigned long nb_mb_state[VIRTIO_MEM_MB_STATE_COUNT];
> @@ -414,8 +419,20 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, 
> unsigned long mb_id)
>   if (nid == NUMA_NO_NODE)
>   nid = memory_add_physaddr_to_nid(addr);
>  
> + /*
> +  * When force-unloading the driver and we still have memory added to
> +  * Linux, the resource name has to stay.
> +  */
> + if (!vm->resource_name) {
> + vm->resource_name = kstrdup_const("System RAM (virtio_mem)",
> +   GFP_KERNEL);
> + if (!vm->resource_name)
> + return -ENOMEM;
> + }
> +
>   dev_dbg(>vdev->dev, "adding memory block: %lu\n", mb_id);
> - return add_memory(nid, addr, memory_block_size_bytes());
> + return add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
> +  vm->resource_name);
>  }
>  
>  /*
> @@ -1890,10 +1907,12 @@ static void virtio_mem_remove(struct virtio_device 
> *vdev)
>   vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
>   vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
>   vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL] ||
> - vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE])
> + vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE]) {
>   dev_warn(>dev, "device still has system memory added\n");
> - else
> + } else {
>   virtio_mem_delete_resource(vm);
> + kfree_const(vm->resource_name);
> + }
>  
>   /* remove all tracking data - no locking needed */
>   vfree(vm->mb_state);
> -- 
> 2.26.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

Re: [PATCH v3 40/75] x86/sev-es: Compile early handler code into kernel image

2020-06-11 Thread Joerg Roedel
On Thu, Jun 04, 2020 at 05:19:53PM +0200, Borislav Petkov wrote:
> On Thu, Jun 04, 2020 at 01:54:13PM +0200, Joerg Roedel wrote:
> > It is not only the trace-point, this would also eliminate exception
> > handling in case the MSR access triggers a #GP. The "Unhandled MSR
> > read/write" messages would turn into a "General Protection Fault"
> > message.
> 
> But the early ones can trigger a #GP too. And there we can't handle
> those #GPs.
> 
> Why would the late ones need exception handling all of a sudden? And
> for the GHCB MSR, of all MSRs which the SEV-ES guest has used so far to
> bootstrap?!

For example when there is a bug in the code which triggers an SEV-ES-only
code-path at runtime on bare-metal or in a non-SEV-ES VM. When the MSR
is accessed accidentially in that code-path the exception handling will
be helpful.


Joerg
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 25/75] x86/sev-es: Add support for handling IOIO exceptions

2020-06-11 Thread Joerg Roedel
On Thu, Jun 04, 2020 at 07:59:33AM -0700, Sean Christopherson wrote:
> But SEV already broke it, no?

True, but with SEV-ES is can be fixed again, and all SEV hardware also
supports SEV-ES.

Regards,

Joerg
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v1] virtio-mem: add memory via add_memory_driver_managed()

2020-06-11 Thread David Hildenbrand
Virtio-mem managed memory is always detected and added by the virtio-mem
driver, never using something like the firmware-provided memory map.
This is the case after an ordinary system reboot, and has to be guaranteed
after kexec. Especially, virtio-mem added memory resources can contain
inaccessible parts ("unblocked memory blocks"), blindly forwarding them
to a kexec kernel is dangerous, as unplugged memory will get accessed
(esp. written).

Let's use the new way of adding special driver-managed memory introduced
in commit 75ac4c58bc0d ("mm/memory_hotplug: introduce
add_memory_driver_managed()").

This will result in no entries in /sys/firmware/memmap ("raw firmware-
provided memory map"), the memory resource will be flagged
IORESOURCE_MEM_DRIVER_MANAGED (esp., kexec_file_load() will not place
kexec images on this memory), and it is exposed as "System RAM
(virtio_mem)" in /proc/iomem, so esp. kexec-tools can properly handle it.

Example /proc/iomem before this change:
  [...]
  14000-333ff : virtio0
14000-147ff : System RAM
  33400-533ff : virtio1
33800-33fff : System RAM
34000-347ff : System RAM
34800-34fff : System RAM
  [...]

Example /proc/iomem after this change:
  [...]
  14000-333ff : virtio0
14000-147ff : System RAM (virtio_mem)
  33400-533ff : virtio1
33800-33fff : System RAM (virtio_mem)
34000-347ff : System RAM (virtio_mem)
34800-34fff : System RAM (virtio_mem)
  [...]

Cc: "Michael S. Tsirkin" 
Cc: Pankaj Gupta 
Cc: teawater 
Signed-off-by: David Hildenbrand 
---

Based on latest Linus' tree (and not a tag) because
- virtio-mem has just been merged via the vhost tree
- add_memory_driver_managed() has been merged a week ago via the -mm tree

I'd like to have this patch in 5.8, with the initial merge of virtio-mem
if possible (so the user space representation of virtio-mem added memory
resources won't change anymore).

---
 drivers/virtio/virtio_mem.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 50c689f250450..d2eab3558a9e1 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -101,6 +101,11 @@ struct virtio_mem {
 
/* The parent resource for all memory added via this device. */
struct resource *parent_resource;
+   /*
+* Copy of "System RAM (virtio_mem)" to be used for
+* add_memory_driver_managed().
+*/
+   const char *resource_name;
 
/* Summary of all memory block states. */
unsigned long nb_mb_state[VIRTIO_MEM_MB_STATE_COUNT];
@@ -414,8 +419,20 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, 
unsigned long mb_id)
if (nid == NUMA_NO_NODE)
nid = memory_add_physaddr_to_nid(addr);
 
+   /*
+* When force-unloading the driver and we still have memory added to
+* Linux, the resource name has to stay.
+*/
+   if (!vm->resource_name) {
+   vm->resource_name = kstrdup_const("System RAM (virtio_mem)",
+ GFP_KERNEL);
+   if (!vm->resource_name)
+   return -ENOMEM;
+   }
+
dev_dbg(>vdev->dev, "adding memory block: %lu\n", mb_id);
-   return add_memory(nid, addr, memory_block_size_bytes());
+   return add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
+vm->resource_name);
 }
 
 /*
@@ -1890,10 +1907,12 @@ static void virtio_mem_remove(struct virtio_device 
*vdev)
vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL] ||
-   vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE])
+   vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE]) {
dev_warn(>dev, "device still has system memory added\n");
-   else
+   } else {
virtio_mem_delete_resource(vm);
+   kfree_const(vm->resource_name);
+   }
 
/* remove all tracking data - no locking needed */
vfree(vm->mb_state);
-- 
2.26.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v6 02/11] vhost: use batched get_vq_desc version

2020-06-11 Thread Michael S. Tsirkin
On Thu, Jun 11, 2020 at 11:02:57AM +0800, Jason Wang wrote:
> 
> On 2020/6/10 下午7:05, Michael S. Tsirkin wrote:
> > > > +EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
> > > >/* Reverse the effect of vhost_get_vq_desc. Useful for error 
> > > > handling. */
> > > >void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
> > > >{
> > > > +   unfetch_descs(vq);
> > > > vq->last_avail_idx -= n;
> > > So unfetch_descs() has decreased last_avail_idx.
> > > Can we fix this by letting unfetch_descs() return the number and then we 
> > > can
> > > do:
> > > 
> > > int d = unfetch_descs(vq);
> > > vq->last_avail_idx -= (n > d) ? n - d: 0;
> > > 
> > > Thanks
> > That's intentional I think - we need both.
> 
> 
> Yes, but:
> 
> 
> > 
> > Unfetch_descs drops the descriptors in the cache that were
> > *not returned to caller*  through get_vq_desc.
> > 
> > vhost_discard_vq_desc drops the ones that were returned through get_vq_desc.
> > 
> > Did I miss anything?
> 
> We could count some descriptors twice, consider the case e.g we only cache
> on descriptor:
> 
> fetch_descs()
>     fetch_buf()
>         last_avail_idx++;
> 
> Then we want do discard it:
> vhost_discard_avail_buf(1)
>     unfetch_descs()
>         last_avail_idx--;
>     last_avail_idx -= 1;
> 
> Thanks


I don't think that happens. vhost_discard_avail_buf(1) is only called
after get vhost_get_avail_buf. vhost_get_avail_buf increments
first_desc.  unfetch_descs only counts from first_desc to ndescs.

If I'm wrong, could you show values of first_desc and ndescs in this
scenario?

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization