Re: Re: [PATCH v7 04/12] virtio-blk: Add validation for block size in config space

2021-05-19 Thread Michael S. Tsirkin
On Thu, May 20, 2021 at 01:25:16PM +0800, Yongji Xie wrote:
> On Wed, May 19, 2021 at 10:42 PM Dan Carpenter wrote:
> >
> > On Wed, May 19, 2021 at 09:39:20PM +0800, Yongji Xie wrote:
> > > On Mon, May 17, 2021 at 5:56 PM Xie Yongji wrote:
> > > >
> > > > This ensures that we will not use an invalid block size
> > > > in config space (which might come from an untrusted device).
> >
> > I looked at whether I should add this as an untrusted function so that
> > Smatch could find these sorts of bugs, but this is reading data from the
> > host, so there has to be some level of trust...
> >
> 
> It would be great if Smatch could detect this case. The data might be
> trusted in traditional VM cases, but with VDUSE enabled it can come
> from a userspace daemon.
> 
> > I should add some more untrusted data kvm functions to Smatch.  Right
> > now I only have kvm_register_read() and I've added kvm_read_guest_virt()
> > just now.
> >
> > > >
> > > > Signed-off-by: Xie Yongji 
> > > > ---
> > > >  drivers/block/virtio_blk.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > index ebb4d3fe803f..c848aa36d49b 100644
> > > > --- a/drivers/block/virtio_blk.c
> > > > +++ b/drivers/block/virtio_blk.c
> > > > @@ -826,7 +826,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > > > err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
> > > >struct virtio_blk_config, blk_size,
> > > >    &blk_size);
> > > > -   if (!err)
> > > > +   if (!err && blk_size > 0 && blk_size <= max_size)
> > >
> > > The check here is incorrect. I will use PAGE_SIZE as the maximum
> > > boundary in the new version.
> >
> > What does this bug look like to the user?
> 
> The kernel will panic if the block size is larger than PAGE_SIZE.

Kernel panic at this point is par for the course IMHO.
Let's focus on eliminating data corruption for starters.

> > A minimum block size of 1 seems pretty crazy. Surely the minimum should
> > be higher?
> >
> 
> Yes, 512 is better here.
> 
> Thanks,
> Yongji
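
For illustration, a minimal sketch of the stricter check the thread
converges on: a 512-byte floor and a PAGE_SIZE ceiling. The power-of-two
test, the warning, and q (the device's request queue) are assumptions of
this sketch, not part of the posted v7 patch:

	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
				   struct virtio_blk_config, blk_size,
				   &blk_size);
	/* Sketch: bound a value an untrusted device (e.g. a VDUSE
	 * userspace daemon) fully controls. */
	if (!err && blk_size >= 512 && blk_size <= PAGE_SIZE &&
	    is_power_of_2(blk_size))
		blk_queue_logical_block_size(q, blk_size);
	else if (!err)
		dev_warn(&vdev->dev, "ignoring bogus blk_size %u\n", blk_size);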



Re: [PATCH v2 5/8] x86/sev-es: Leave NMI-mode before sending signals

2021-05-19 Thread Peter Zijlstra
On Wed, May 19, 2021 at 09:13:08PM +0200, Joerg Roedel wrote:
> Hi Peter,
> 
> thanks for your review.
> 
> On Wed, May 19, 2021 at 07:54:50PM +0200, Peter Zijlstra wrote:
> > On Wed, May 19, 2021 at 03:52:48PM +0200, Joerg Roedel wrote:
> > > --- a/arch/x86/kernel/sev.c
> > > +++ b/arch/x86/kernel/sev.c
> > > @@ -1343,9 +1343,10 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
> > >   return;
> > >   }
> > >  
> > > + instrumentation_begin();
> > > +
> > >   irq_state = irqentry_nmi_enter(regs);
> > >   lockdep_assert_irqs_disabled();
> > > - instrumentation_begin();
> > >  
> > >   /*
> > >* This is invoked through an interrupt gate, so IRQs are disabled. The
> > 
> > That's just plain wrong. No instrumentation is allowed before you enter
> > the exception context.
> 
> Okay.
> 
> > > + irqentry_nmi_exit(regs, irq_state);
> > > +
> > 
> > And this is wrong too; because at this point the handler doesn't run in
> > _any_ context anymore, certainly not one you can call regular C code
> > from.
> 
> The #VC handler is at this point not running on the IST stack anymore,
> but on the stack it came from or on the task stack. So my belief was
> that at this point it inherits the context it came from (just like the
> page-fault handler). But I also don't fully understand the context
> tracking, so is my assumption wrong?

Being on the right stack is only part of the issue; you also need to
make sure your runtime environment is set up.

Regular kernel C expects a whole lot of things to be present; esp. so
with all the debug options on. The irqentry_*_enter() family of
functions very carefully sets up this environment and the
irqentry_*_exit() undoes it again. Before and after you really cannot
run normal code.

Just an example, RCU might not be watching, it might think the CPU is in
userspace and advance the GP while you're relying on it not doing so.

Similarly lockdep is in some undefined state and any lock used can
trigger random 'funny' things.

Just because this is 'C' doesn't immediately mean you can go call any
random function. Up until recently most of this was in ASM; there's a
reason for the noinstr annotations.
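
To make the ordering concrete, a minimal sketch of the nesting Peter
describes (not the actual #VC handler):

	static noinstr void example_nmi_like_handler(struct pt_regs *regs)
	{
		irqentry_state_t irq_state;

		/* Establish the exception context first: RCU, lockdep etc. */
		irq_state = irqentry_nmi_enter(regs);

		/* Only inside that context is instrumentable C code safe. */
		instrumentation_begin();
		/* ... regular handler work goes here ... */
		instrumentation_end();

		/* Undo in reverse order; no normal code after this point. */
		irqentry_nmi_exit(regs, irq_state);
	}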


Re: [PATCH v2 5/8] x86/sev-es: Leave NMI-mode before sending signals

2021-05-19 Thread Joerg Roedel
Hi Peter,

thanks for your review.

On Wed, May 19, 2021 at 07:54:50PM +0200, Peter Zijlstra wrote:
> On Wed, May 19, 2021 at 03:52:48PM +0200, Joerg Roedel wrote:
> > --- a/arch/x86/kernel/sev.c
> > +++ b/arch/x86/kernel/sev.c
> > @@ -1343,9 +1343,10 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
> > return;
> > }
> >  
> > +   instrumentation_begin();
> > +
> > irq_state = irqentry_nmi_enter(regs);
> > lockdep_assert_irqs_disabled();
> > -   instrumentation_begin();
> >  
> > /*
> >  * This is invoked through an interrupt gate, so IRQs are disabled. The
> 
> That's just plain wrong. No instrumentation is allowed before you enter
> the exception context.

Okay.

> > +   irqentry_nmi_exit(regs, irq_state);
> > +
> 
> And this is wrong too; because at this point the handler doesn't run in
> _any_ context anymore, certainly not one you can call regular C code
> from.

The #VC handler is at this point not running on the IST stack anymore,
but on the stack it came from or on the task stack. So my belief was
that at this point it inherits the context it came from (just like the
page-fault handler). But I also don't fully understand the context
tracking, so is my assumption wrong?

Regards,

Joerg



Re: [PATCH v2 5/8] x86/sev-es: Leave NMI-mode before sending signals

2021-05-19 Thread Peter Zijlstra
On Wed, May 19, 2021 at 03:52:48PM +0200, Joerg Roedel wrote:
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -1343,9 +1343,10 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
>   return;
>   }
>  
> + instrumentation_begin();
> +
>   irq_state = irqentry_nmi_enter(regs);
>   lockdep_assert_irqs_disabled();
> - instrumentation_begin();
>  
>   /*
>* This is invoked through an interrupt gate, so IRQs are disabled. The

That's just plain wrong. No instrumentation is allowed before you enter
the exception context.

> @@ -1395,13 +1396,19 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
>   BUG();
>   }
>  
> -out:
> - instrumentation_end();
>   irqentry_nmi_exit(regs, irq_state);
> + instrumentation_end();

And this can't be right either, same issue, no instrumentation is
allowed after you leave the exception context.

>  
>   return;
>  
>  fail:
> + /*
> +  * Leave NMI mode - the GHCB is not busy anymore and depending on where
> +  * the #VC came from this code is about to either kill the task (when in
> +  * task context) or kill the machine.
> +  */
> + irqentry_nmi_exit(regs, irq_state);
> +

And this is wrong too; because at this point the handler doesn't run in
_any_ context anymore, certainly not one you can call regular C code
from.

>   if (user_mode(regs)) {
>   /*
>* Do not kill the machine if user-space triggered the
> @@ -1423,7 +1430,9 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
>   panic("Returned from Terminate-Request to Hypervisor\n");
>   }
>  
> - goto out;
> + instrumentation_end();
> +
> + return;
>  }


You either get to do what MCE does, or what MCE does. That is, either
use task_work or MCE_USER and have the _user() handler use
irqentry_enter_from_user_mode().

The above is an absolute no-go.
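
For reference, a rough sketch of the task_work option, modeled on what
the MCE code does; the callback_head field here is hypothetical, not an
existing struct member:

	/* Record the failure while still in NMI-like context and defer
	 * the signal until the task returns to user space. */
	static void vc_kill_me(struct callback_head *cb)
	{
		force_sig(SIGBUS);	/* now in task context: legal */
	}

	/* In the #VC error path, user_mode(regs) case: */
	init_task_work(&current->vc_kill_work, vc_kill_me);	/* hypothetical field */
	task_work_add(current, &current->vc_kill_work, TWA_RESUME);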


Re: [PATCH v7 04/12] virtio-blk: Add validation for block size in config space

2021-05-19 Thread Dan Carpenter
On Wed, May 19, 2021 at 09:39:20PM +0800, Yongji Xie wrote:
> On Mon, May 17, 2021 at 5:56 PM Xie Yongji  wrote:
> >
> > This ensures that we will not use an invalid block size
> > in config space (which might come from an untrusted device).

I looked at whether I should add this as an untrusted function so that
Smatch could find these sorts of bugs, but this is reading data from the
host, so there has to be some level of trust...

I should add some more untrusted data kvm functions to Smatch.  Right
now I only have kvm_register_read() and I've added kvm_read_guest_virt()
just now.

> >
> > Signed-off-by: Xie Yongji 
> > ---
> >  drivers/block/virtio_blk.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index ebb4d3fe803f..c848aa36d49b 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -826,7 +826,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
> >struct virtio_blk_config, blk_size,
> >    &blk_size);
> > -   if (!err)
> > +   if (!err && blk_size > 0 && blk_size <= max_size)
> 
> The check here is incorrect. I will use PAGE_SIZE as the maximum
> boundary in the new version.

What does this bug look like to the user?  A minimum block size of 1
seems pretty crazy.  Surely the minimum should be higher?

regards,
dan carpenter



Re: [Bloat] virtio_net: BQL?

2021-05-19 Thread Eric Dumazet



On 5/18/21 1:00 AM, Stephen Hemminger wrote:

> 
> The Azure network driver (netvsc) also does not have BQL. Several years ago
> I tried adding it but it benchmarked worse and there is the added complexity
> of handling the accelerated networking VF path.
> 


Note that NICs with many TX queues make BQL almost useless, only adding
extra overhead.

We should probably make BQL something that can be manually turned on/off.
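
For background, BQL hooks a driver's TX path at two points; a generic
sketch (fragments, not a complete driver):

	/* xmit path, after the skb is queued to the hardware ring: */
	netdev_tx_sent_queue(txq, skb->len);

	/* completion path, after reclaiming finished descriptors: */
	netdev_tx_completed_queue(txq, packets, bytes);

	/* on a ring reset, so the accounting does not leak: */
	netdev_tx_reset_queue(txq);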


[PATCH -next] vp_vdpa: fix error return code in vp_vdpa_probe()

2021-05-19 Thread Wei Yongjun
Fix to return negative error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 11d8ffed00b2 ("vp_vdpa: switch to use vp_modern_map_vq_notify()")
Reported-by: Hulk Robot 
Signed-off-by: Wei Yongjun 
---
 drivers/vdpa/virtio_pci/vp_vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c
index c76ebb531212..e5d92db728d3 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -442,6 +442,7 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id)
vp_modern_map_vq_notify(mdev, i,
				&vp_vdpa->vring[i].notify_pa);
if (!vp_vdpa->vring[i].notify) {
+   ret = -ENOMEM;
dev_warn(&pdev->dev, "Fail to map vq notify %d\n", i);
goto err;
}



[PATCH v2 7/8] x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]()

2021-05-19 Thread Joerg Roedel
From: Joerg Roedel 

The error reporting from the insn_fetch_from_user*() functions is not
very verbose. Extend it to include information on whether the linear
RIP could not be calculated or whether the memory access faulted.

This will be used in the SEV-ES code to propagate the correct
exception depending on what went wrong during instruction fetch.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/insn-eval.h |  6 +++--
 arch/x86/kernel/sev.c            |  8 +++
 arch/x86/kernel/umip.c   | 10 -
 arch/x86/lib/insn-eval.c | 38 +++-
 4 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 91d7182ad2d6..8362f1ce4b00 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -22,9 +22,11 @@ int insn_get_modrm_reg_off(struct insn *insn, struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
 int insn_get_code_seg_params(struct pt_regs *regs);
 int insn_fetch_from_user(struct pt_regs *regs,
-unsigned char buf[MAX_INSN_SIZE]);
+unsigned char buf[MAX_INSN_SIZE],
+int *copied);
 int insn_fetch_from_user_inatomic(struct pt_regs *regs,
- unsigned char buf[MAX_INSN_SIZE]);
+ unsigned char buf[MAX_INSN_SIZE],
+ int *copied);
 bool insn_decode_from_regs(struct insn *insn, struct pt_regs *regs,
   unsigned char buf[MAX_INSN_SIZE], int buf_size);
 
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 9a64030e74c0..1edb6cd5e308 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -258,17 +258,17 @@ static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
 static enum es_result __vc_decode_user_insn(struct es_em_ctxt *ctxt)
 {
char buffer[MAX_INSN_SIZE];
-   int res;
+   int insn_bytes = 0, res;
 
-   res = insn_fetch_from_user_inatomic(ctxt->regs, buffer);
-   if (!res) {
+   res = insn_fetch_from_user_inatomic(ctxt->regs, buffer, &insn_bytes);
+   if (res) {
ctxt->fi.vector = X86_TRAP_PF;
ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
ctxt->fi.cr2        = ctxt->regs->ip;
return ES_EXCEPTION;
}
 
-   if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, res))
+   if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, insn_bytes))
return ES_DECODE_FAILED;
 
if (ctxt->insn.immediate.got)
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 8daa70b0d2da..b005ef42682e 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -346,13 +346,13 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (!regs)
return false;
 
-   nr_copied = insn_fetch_from_user(regs, buf);
-
/*
-* The insn_fetch_from_user above could have failed if user code
-* is protected by a memory protection key. Give up on emulation
-* in such a case.  Should we issue a page fault?
+* Give up on emulation if fetching the instruction failed. Should we
+* issue a page fault or a #GP?
 */
-   if (!nr_copied)
+   if (insn_fetch_from_user(regs, buf, NULL))
        return false;
 
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4eecb9c7c6a0..d8a057ba0895 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1442,27 +1442,36 @@ static int insn_get_effective_ip(struct pt_regs *regs, unsigned long *ip)
  * insn_fetch_from_user() - Copy instruction bytes from user-space memory
  * @regs:  Structure with register values as seen when entering kernel mode
  * @buf:   Array to store the fetched instruction
+ * @copied:	Pointer to an int where the number of copied instruction bytes
+ *		is stored. Can be NULL.
  *
  * Gets the linear address of the instruction and copies the instruction bytes
  * to the buf.
  *
  * Returns:
  *
- * Number of instruction bytes copied.
+ * -EINVAL if the linear address of the instruction could not be calculated
+ * -EFAULT if nothing was copied
+ *   0 on success
  *
- * 0 if nothing was copied.
  */
-int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
+int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE],
+			 int *copied)
 {
unsigned long ip;
int not_copied;
+   int bytes;
 
	if (insn_get_effective_ip(regs, &ip))
-   return 0;
+   return -EINVAL;
 
not_copied = copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE);
 
-   return MAX_INSN_SIZE - not_copied;
+   bytes = MAX_INSN_SIZE - not_copied;
+   if (copied)
+

[PATCH v2 5/8] x86/sev-es: Leave NMI-mode before sending signals

2021-05-19 Thread Joerg Roedel
From: Joerg Roedel 

The error path in the runtime #VC handler sends a signal to kill the
current task if the exception was raised from user-space. Some parts of
the #VC handler run in NMI mode, because it is critical that it is not
interrupted (except from an NMI) while the GHCB is in use.

But sending signals in NMI-mode is actually broken and triggers lockdep
warnings. On the other hand, when the signal is sent there is no reason
for the handler to still be in NMI-mode, as the GHCB is not used
anymore.

Leave NMI-mode before entering the error path to get rid of the lockdep
warnings.

Fixes: 62441a1fb532 ("x86/sev-es: Correctly track IRQ states in runtime #VC handler")
Signed-off-by: Joerg Roedel 
---
 arch/x86/kernel/sev.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 4fd997bbf059..9a64030e74c0 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1343,9 +1343,10 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
return;
}
 
+   instrumentation_begin();
+
irq_state = irqentry_nmi_enter(regs);
lockdep_assert_irqs_disabled();
-   instrumentation_begin();
 
/*
 * This is invoked through an interrupt gate, so IRQs are disabled. The
@@ -1395,13 +1396,19 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
BUG();
}
 
-out:
-   instrumentation_end();
irqentry_nmi_exit(regs, irq_state);
+   instrumentation_end();
 
return;
 
 fail:
+   /*
+* Leave NMI mode - the GHCB is not busy anymore and depending on where
+* the #VC came from this code is about to either kill the task (when in
+* task context) or kill the machine.
+*/
+   irqentry_nmi_exit(regs, irq_state);
+
if (user_mode(regs)) {
/*
 * Do not kill the machine if user-space triggered the
@@ -1423,7 +1430,9 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
panic("Returned from Terminate-Request to Hypervisor\n");
}
 
-   goto out;
+   instrumentation_end();
+
+   return;
 }
 
 /* This handler runs on the #VC fall-back stack. It can cause further #VC exceptions */
-- 
2.31.1



[PATCH v2 1/8] x86/sev-es: Don't return NULL from sev_es_get_ghcb()

2021-05-19 Thread Joerg Roedel
From: Joerg Roedel 

sev_es_get_ghcb() is called from several places, but only one of them
checks the return value. The reaction to a NULL return is always the
same: calling panic() and killing the machine.

Instead of adding checks to all call-places, move the panic() into the
function itself so that it will no longer return NULL.

Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Cc: sta...@vger.kernel.org # v5.10+
Signed-off-by: Joerg Roedel 
---
 arch/x86/kernel/sev.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 4fa111becc93..82bced88153b 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -203,8 +203,18 @@ static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state)
if (unlikely(data->ghcb_active)) {
/* GHCB is already in use - save its contents */
 
-   if (unlikely(data->backup_ghcb_active))
-   return NULL;
+   if (unlikely(data->backup_ghcb_active)) {
+   /*
+* Backup-GHCB is also already in use. There is no way
+* to continue here so just kill the machine. To make
+* panic() work, mark GHCBs inactive so that messages
+* can be printed out.
+*/
+   data->ghcb_active        = false;
+   data->backup_ghcb_active = false;
+
+   panic("Unable to handle #VC exception! GHCB and Backup 
GHCB are already in use");
+   }
 
/* Mark backup_ghcb active before writing to it */
data->backup_ghcb_active = true;
@@ -1289,7 +1299,6 @@ static __always_inline bool on_vc_fallback_stack(struct pt_regs *regs)
  */
 DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 {
-   struct sev_es_runtime_data *data = this_cpu_read(runtime_data);
irqentry_state_t irq_state;
struct ghcb_state state;
struct es_em_ctxt ctxt;
@@ -1315,16 +1324,6 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 */
 
	ghcb = sev_es_get_ghcb(&state);
-   if (!ghcb) {
-   /*
-* Mark GHCBs inactive so that panic() is able to print the
-* message.
-*/
-   data->ghcb_active        = false;
-   data->backup_ghcb_active = false;
-
-   panic("Unable to handle #VC exception! GHCB and Backup GHCB are 
already in use");
-   }
 
vc_ghcb_invalidate(ghcb);
	result = vc_init_em_ctxt(&ctxt, regs, error_code);
-- 
2.31.1



[PATCH v2 2/8] x86/sev-es: Forward page-faults which happen during emulation

2021-05-19 Thread Joerg Roedel
From: Joerg Roedel 

When emulating guest instructions for MMIO or IOIO accesses the #VC
handler might get a page-fault and will not be able to complete. Forward
the page-fault in this case to the correct handler instead of killing
the machine.

Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Cc: sta...@vger.kernel.org # v5.10+
Signed-off-by: Joerg Roedel 
---
 arch/x86/kernel/sev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 82bced88153b..1f428f401bed 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1270,6 +1270,10 @@ static __always_inline void vc_forward_exception(struct es_em_ctxt *ctxt)
case X86_TRAP_UD:
exc_invalid_op(ctxt->regs);
break;
+   case X86_TRAP_PF:
+   write_cr2(ctxt->fi.cr2);
+   exc_page_fault(ctxt->regs, error_code);
+   break;
case X86_TRAP_AC:
exc_alignment_check(ctxt->regs, error_code);
break;
-- 
2.31.1
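
One note on ordering in that hunk: exc_page_fault() reads the faulting
address from CR2, so the handler stores the emulated fault address there
via write_cr2() before forwarding, and the regular page-fault path then
sees an ordinary #PF.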



[PATCH v2 6/8] x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip()

2021-05-19 Thread Joerg Roedel
From: Joerg Roedel 

In theory 0 is a valid value for the instruction pointer, so don't use
it as the error return value from insn_get_effective_ip().

Signed-off-by: Joerg Roedel 
---
 arch/x86/lib/insn-eval.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a67afd74232c..4eecb9c7c6a0 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1417,7 +1417,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
}
 }
 
-static unsigned long insn_get_effective_ip(struct pt_regs *regs)
+static int insn_get_effective_ip(struct pt_regs *regs, unsigned long *ip)
 {
unsigned long seg_base = 0;
 
@@ -1430,10 +1430,12 @@ static unsigned long insn_get_effective_ip(struct pt_regs *regs)
if (!user_64bit_mode(regs)) {
seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
if (seg_base == -1L)
-   return 0;
+   return -EINVAL;
}
 
-   return seg_base + regs->ip;
+   *ip = seg_base + regs->ip;
+
+   return 0;
 }
 
 /**
@@ -1455,8 +1457,7 @@ int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
unsigned long ip;
int not_copied;
 
-   ip = insn_get_effective_ip(regs);
-   if (!ip)
+   if (insn_get_effective_ip(regs, &ip))
return 0;
 
not_copied = copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE);
@@ -1484,8 +1485,7 @@ int insn_fetch_from_user_inatomic(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
unsigned long ip;
int not_copied;
 
-   ip = insn_get_effective_ip(regs);
-   if (!ip)
+   if (insn_get_effective_ip(regs, &ip))
return 0;
 
	not_copied = __copy_from_user_inatomic(buf, (void __user *)ip, MAX_INSN_SIZE);
-- 
2.31.1



[PATCH v2 8/8] x86/sev-es: Propagate #GP if getting linear instruction address failed

2021-05-19 Thread Joerg Roedel
From: Joerg Roedel 

When an instruction is fetched from user-space, segmentation needs to be
taken into account. This means that getting the linear address of an
instruction can fail. Hardware would raise a #GP exception in that case,
but the #VC exception handler would emulate it as a page-fault.

The insn_fetch_from_user*() functions now provide the relevant
information in case of a failure. Use that and propagate a #GP when the
linear address of an instruction to fetch could not be calculated.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kernel/sev.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1edb6cd5e308..4736290361e4 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -261,11 +261,16 @@ static enum es_result __vc_decode_user_insn(struct es_em_ctxt *ctxt)
int insn_bytes = 0, res;
 
	res = insn_fetch_from_user_inatomic(ctxt->regs, buffer, &insn_bytes);
-   if (res) {
+   if (res == -EFAULT) {
ctxt->fi.vector = X86_TRAP_PF;
ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
ctxt->fi.cr2        = ctxt->regs->ip;
return ES_EXCEPTION;
+   } else if (res == -EINVAL) {
+   ctxt->fi.vector = X86_TRAP_GP;
+   ctxt->fi.error_code = 0;
+   ctxt->fi.cr2        = 0;
+   return ES_EXCEPTION;
}
 
	if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, insn_bytes))
-- 
2.31.1



[PATCH v2 3/8] x86/sev-es: Use __put_user()/__get_user() for data accesses

2021-05-19 Thread Joerg Roedel
From: Joerg Roedel 

The put_user() and get_user() functions do checks on the address which
is passed to them. They check whether the address is actually a
user-space address and whether it's fine to access it. They also call
might_fault() to indicate that they could fault and possibly sleep.

All of these checks are neither wanted nor needed in the #VC exception
handler, which can be invoked from almost any context and also for MMIO
instructions from kernel space on kernel memory. All the #VC handler
wants to know is whether a fault happened when the access was tried.

This is provided by __put_user()/__get_user(), which just do the access
no matter what. Also add comments explaining why __get_user() and
__put_user() are the best choice here and why it is safe to use them
in this context. Also explain why copy_to/from_user can't be used.

In addition, also revert commit

	7024f60d6552 ("x86/sev-es: Handle string port IO to kernel memory properly")

because using __get_user()/__put_user() fixes the same problem while
the above commit introduced several problems:

1) It uses access_ok() which is only allowed in task context.

2) It uses memcpy() which has no fault handling at all and is
   thus unsafe to use here.

Fixes: f980f9c31a92 ("x86/sev-es: Compile early handler code into kernel image")
Cc: sta...@vger.kernel.org # v5.10+
Signed-off-by: Joerg Roedel 
---
 arch/x86/kernel/sev.c | 66 ++-
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1f428f401bed..651b81cd648e 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -315,31 +315,44 @@ static enum es_result vc_write_mem(struct es_em_ctxt *ctxt,
u16 d2;
u8  d1;
 
-   /* If instruction ran in kernel mode and the I/O buffer is in kernel space */
-   if (!user_mode(ctxt->regs) && !access_ok(target, size)) {
-   memcpy(dst, buf, size);
-   return ES_OK;
-   }
-
+   /*
+* This function uses __put_user() independent of whether kernel or user
+* memory is accessed. This works fine because __put_user() does no
+* sanity checks of the pointer being accessed. All that it does is
+* to report when the access failed.
+*
+* Also, this function runs in atomic context, so __put_user() is not
+* allowed to sleep. The page-fault handler detects that it is running
+* in atomic context and will not try to take mmap_sem and handle the
+* fault, so additional pagefault_enable()/disable() calls are not
+* needed.
+*
+* The access can't be done via copy_to_user() here because
+* vc_write_mem() must not use string instructions to access unsafe
+* memory. The reason is that MOVS is emulated by the #VC handler by
+* splitting the move up into a read and a write and taking a nested #VC
+* exception on whatever of them is the MMIO access. Using string
+* instructions here would cause infinite nesting.
+*/
switch (size) {
case 1:
		memcpy(&d1, buf, 1);
-   if (put_user(d1, target))
+   if (__put_user(d1, target))
goto fault;
break;
case 2:
		memcpy(&d2, buf, 2);
-   if (put_user(d2, target))
+   if (__put_user(d2, target))
goto fault;
break;
case 4:
		memcpy(&d4, buf, 4);
-   if (put_user(d4, target))
+   if (__put_user(d4, target))
goto fault;
break;
case 8:
		memcpy(&d8, buf, 8);
-   if (put_user(d8, target))
+   if (__put_user(d8, target))
goto fault;
break;
default:
@@ -370,30 +383,43 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
u16 d2;
u8  d1;
 
-   /* If instruction ran in kernel mode and the I/O buffer is in kernel space */
-   if (!user_mode(ctxt->regs) && !access_ok(s, size)) {
-   memcpy(buf, src, size);
-   return ES_OK;
-   }
-
+   /*
+* This function uses __get_user() independent of whether kernel or user
+* memory is accessed. This works fine because __get_user() does no
+* sanity checks of the pointer being accessed. All that it does is
+* to report when the access failed.
+*
+* Also, this function runs in atomic context, so __get_user() is not
+* allowed to sleep. The page-fault handler detects that it is running
+* in atomic context and will not try to take mmap_sem and handle the
+* fault, so additional pagefault_enable()/disable() calls are not
+* needed.
+*
+* The access can't be done via copy_from_user() 

[PATCH v2 4/8] x86/sev-es: Fix error message in runtime #VC handler

2021-05-19 Thread Joerg Roedel
From: Joerg Roedel 

The runtime #VC handler is not "early" anymore. Fix the copy&paste error
and remove that word from the error message.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kernel/sev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 651b81cd648e..4fd997bbf059 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1369,7 +1369,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
vc_finish_insn();
break;
case ES_UNSUPPORTED:
-   pr_err_ratelimited("Unsupported exit-code 0x%02lx in early #VC 
exception (IP: 0x%lx)\n",
+   pr_err_ratelimited("Unsupported exit-code 0x%02lx in #VC 
exception (IP: 0x%lx)\n",
   error_code, regs->ip);
goto fail;
case ES_VMM_ERROR:
-- 
2.31.1



[PATCH v2 0/8] x86/sev-es: Fixes for SEV-ES Guest Support

2021-05-19 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

here is the second version of my pending SEV-ES fixes. The most
important patches are patches 1 to 5, as they fix warnings and splats
that trigger when various debugging options are enabled.

Patches 6 to 8 fix a correctness issue in the instruction emulation
part of the #VC exception handler.

Please review.

Thanks,

Joerg

Link to v1: https://lore.kernel.org/lkml/20210512075445.18935-1-j...@8bytes.org/

Changes since v1:

- Documented why __get_user()/__put_user() are safe to use in
  the #VC handler's memory access path.

- Merged the revert into patch 3

- Refactored code in the instruction decoder and added #GP
  reporting when getting the instruction's linear address fails.

Joerg Roedel (8):
  x86/sev-es: Don't return NULL from sev_es_get_ghcb()
  x86/sev-es: Forward page-faults which happen during emulation
  x86/sev-es: Use __put_user()/__get_user() for data accesses
  x86/sev-es: Fix error message in runtime #VC handler
  x86/sev-es: Leave NMI-mode before sending signals
  x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip()
  x86/insn: Extend error reporting from
insn_fetch_from_user[_inatomic]()
  x86/sev-es: Propagate #GP if getting linear instruction address failed

 arch/x86/include/asm/insn-eval.h |   6 +-
 arch/x86/kernel/sev.c            | 127 +--
 arch/x86/kernel/umip.c   |  10 +--
 arch/x86/lib/insn-eval.c |  52 -
 4 files changed, 129 insertions(+), 66 deletions(-)


base-commit: a50c5bebc99c525e7fbc059988c6a5ab8680cb76
-- 
2.31.1



Re: [PATCH] virtio: Introduce a new kick interface virtqueue_kick_try()

2021-05-19 Thread Xuan Zhuo
On Wed, 19 May 2021 08:44:01 -0400, Michael S. Tsirkin  wrote:
> On Wed, May 19, 2021 at 07:47:57PM +0800, Xuan Zhuo wrote:
> > Unlike virtqueue_kick(), virtqueue_kick_try() returns true only when the
> > kick is successful. In virtio-net, we want to count the number of kicks.
> > So we need an interface that can tell whether the kick was actually
> > executed.
> >
> > Signed-off-by: Xuan Zhuo 
> > ---
> >  drivers/net/virtio_net.c |  8 
> >  drivers/virtio/virtio_ring.c | 20 
> >  include/linux/virtio.h   |  2 ++
> >  3 files changed, 26 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 9b6a4a875c55..167697030cb6 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -617,7 +617,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > ret = nxmit;
> >
> > if (flags & XDP_XMIT_FLUSH) {
> > -   if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq))
> > +   if (virtqueue_kick_try(sq->vq))
> > kicks = 1;
> > }
> >  out:
> > @@ -1325,7 +1325,7 @@ static bool try_fill_recv(struct virtnet_info *vi, 
> > struct receive_queue *rq,
> > if (err)
> > break;
> > } while (rq->vq->num_free);
> > -   if (virtqueue_kick_prepare(rq->vq) && virtqueue_notify(rq->vq)) {
> > +   if (virtqueue_kick_try(rq->vq)) {
> > unsigned long flags;
> >
> > > flags = u64_stats_update_begin_irqsave(&rq->stats.syncp);
> > @@ -1533,7 +1533,7 @@ static int virtnet_poll(struct napi_struct *napi, int 
> > budget)
> >
> > if (xdp_xmit & VIRTIO_XDP_TX) {
> > sq = virtnet_xdp_get_sq(vi);
> > > -   if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> > > +   if (virtqueue_kick_try(sq->vq)) {
> > > u64_stats_update_begin(&sq->stats.syncp);
> > > sq->stats.kicks++;
> > > u64_stats_update_end(&sq->stats.syncp);
> > @@ -1710,7 +1710,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
> > struct net_device *dev)
> > }
> >
> > if (kick || netif_xmit_stopped(txq)) {
> > > -   if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> > > +   if (virtqueue_kick_try(sq->vq)) {
> > > u64_stats_update_begin(&sq->stats.syncp);
> > > sq->stats.kicks++;
> > > u64_stats_update_end(&sq->stats.syncp);
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 71e16b53e9c1..1462be756875 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1874,6 +1874,26 @@ bool virtqueue_kick(struct virtqueue *vq)
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_kick);
> >
> > +/**
> > + * virtqueue_kick_try - try update after add_buf
> > + * @vq: the struct virtqueue
> > + *
> > + * After one or more virtqueue_add_* calls, invoke this to kick
> > + * the other side.
> > + *
> > + * Caller must ensure we don't call this with other virtqueue
> > + * operations at the same time (except where noted).
> > + *
> > + * Returns true if kick success, otherwise false.
>
> so what is the difference between this and virtqueue_kick
> which says
>
>  * Returns false if kick failed, otherwise true.

With virtqueue_kick() we can only know whether the kick failed; we can't
tell whether a kick was actually issued. It may return true even though
no kick was sent.

virtqueue_kick_try() returns true only if the kick actually happened, so
the caller can tell from the return value whether kick was called and
count the number of kick calls (see the sketch at the end of this
message).

Thanks.

>
>
>
>
>
>
>
> > + */
> > +bool virtqueue_kick_try(struct virtqueue *vq)
> > +{
> > +   if (virtqueue_kick_prepare(vq) && virtqueue_notify(vq))
> > +   return true;
> > +   return false;
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_kick_try);
> > +
> >  /**
> >   * virtqueue_get_buf - get the next used buffer
> >   * @_vq: the struct virtqueue we're talking about.
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index b1894e0323fa..45cd6a0af24d 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -59,6 +59,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
> >
> >  bool virtqueue_kick(struct virtqueue *vq);
> >
> > +bool virtqueue_kick_try(struct virtqueue *vq);
> > +
> >  bool virtqueue_kick_prepare(struct virtqueue *vq);
> >
> >  bool virtqueue_notify(struct virtqueue *vq);
> > --
> > 2.31.0
>
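
To restate the difference in code (a sketch assuming the proposed API;
the counter is illustrative):

	/* virtqueue_kick(): false only if notify failed. It also returns
	 * true when kick_prepare decides no notification is needed, so
	 * its return value cannot count notifications. */
	virtqueue_kick(vq);

	/* virtqueue_kick_try(): true only when the device was actually
	 * notified, so the return value can drive a kick counter. */
	if (virtqueue_kick_try(vq))
		stats_kicks++;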


Re: [PATCH 2/6] x86/sev-es: Forward page-faults which happen during emulation

2021-05-19 Thread Joerg Roedel
Hi Sean,

On Wed, May 12, 2021 at 05:31:03PM +, Sean Christopherson wrote:
> This got me looking at the flows that "inject" #PF, and I'm pretty sure there
> are bugs in __vc_decode_user_insn() + insn_get_effective_ip().
> 
> Problem #1: __vc_decode_user_insn() assumes a #PF if 
> insn_fetch_from_user_inatomic()
> fails, but the majority of failure cases in insn_get_seg_base() are #GPs, not 
> #PF.
> 
>   res = insn_fetch_from_user_inatomic(ctxt->regs, buffer);
>   if (!res) {
>   ctxt->fi.vector = X86_TRAP_PF;
>   ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
>   ctxt->fi.cr2= ctxt->regs->ip;
>   return ES_EXCEPTION;
>   }
> 
> Problem #2: Using '0' as an error code means a legitimate effective IP of '0'
> will be misinterpreted as a failure.  Practically speaking, I highly doubt 
> anyone
> will ever actually run code at address 0, but it's technically possible.  The
> most robust approach would be to pass a pointer to @ip and return an actual 
> error
> code.  Using a non-canonical magic value might also work, but that could run 
> afoul
> of future shenanigans like LAM.
> 
>   ip = insn_get_effective_ip(regs);
>   if (!ip)
>   return 0;

Your observations are all correct. I put some changes onto this
patch-set to fix these problems.

Regards,

Joerg


Re: [PATCH] virtio: Introduce a new kick interface virtqueue_kick_try()

2021-05-19 Thread Michael S. Tsirkin
On Wed, May 19, 2021 at 07:47:57PM +0800, Xuan Zhuo wrote:
> Unlike virtqueue_kick(), virtqueue_kick_try() returns true only when the
> kick is successful. In virtio-net, we want to count the number of kicks.
> So we need an interface that can tell whether the kick was actually
> executed.
> 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio_net.c |  8 
>  drivers/virtio/virtio_ring.c | 20 
>  include/linux/virtio.h   |  2 ++
>  3 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9b6a4a875c55..167697030cb6 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -617,7 +617,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>   ret = nxmit;
>  
>   if (flags & XDP_XMIT_FLUSH) {
> - if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq))
> + if (virtqueue_kick_try(sq->vq))
>   kicks = 1;
>   }
>  out:
> @@ -1325,7 +1325,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
>   if (err)
>   break;
>   } while (rq->vq->num_free);
> - if (virtqueue_kick_prepare(rq->vq) && virtqueue_notify(rq->vq)) {
> + if (virtqueue_kick_try(rq->vq)) {
>   unsigned long flags;
>  
>   flags = u64_stats_update_begin_irqsave(&rq->stats.syncp);
> @@ -1533,7 +1533,7 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>  
>   if (xdp_xmit & VIRTIO_XDP_TX) {
>   sq = virtnet_xdp_get_sq(vi);
> - if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> + if (virtqueue_kick_try(sq->vq)) {
>   u64_stats_update_begin(&sq->stats.syncp);
>   sq->stats.kicks++;
>   u64_stats_update_end(&sq->stats.syncp);
> @@ -1710,7 +1710,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>   }
>  
>   if (kick || netif_xmit_stopped(txq)) {
> - if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> + if (virtqueue_kick_try(sq->vq)) {
>   u64_stats_update_begin(&sq->stats.syncp);
>   sq->stats.kicks++;
>   u64_stats_update_end(&sq->stats.syncp);
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 71e16b53e9c1..1462be756875 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1874,6 +1874,26 @@ bool virtqueue_kick(struct virtqueue *vq)
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_kick);
>  
> +/**
> + * virtqueue_kick_try - try update after add_buf
> + * @vq: the struct virtqueue
> + *
> + * After one or more virtqueue_add_* calls, invoke this to kick
> + * the other side.
> + *
> + * Caller must ensure we don't call this with other virtqueue
> + * operations at the same time (except where noted).
> + *
> + * Returns true if kick success, otherwise false.

so what is the difference between this and virtqueue_kick
which says

 * Returns false if kick failed, otherwise true.







> + */
> +bool virtqueue_kick_try(struct virtqueue *vq)
> +{
> + if (virtqueue_kick_prepare(vq) && virtqueue_notify(vq))
> + return true;
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_kick_try);
> +
>  /**
>   * virtqueue_get_buf - get the next used buffer
>   * @_vq: the struct virtqueue we're talking about.
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index b1894e0323fa..45cd6a0af24d 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -59,6 +59,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
>  
>  bool virtqueue_kick(struct virtqueue *vq);
>  
> +bool virtqueue_kick_try(struct virtqueue *vq);
> +
>  bool virtqueue_kick_prepare(struct virtqueue *vq);
>  
>  bool virtqueue_notify(struct virtqueue *vq);
> -- 
> 2.31.0



Re: [PATCH 4/6] Revert "x86/sev-es: Handle string port IO to kernel memory properly"

2021-05-19 Thread Joerg Roedel
On Wed, May 12, 2021 at 05:38:29PM +, Sean Christopherson wrote:
> Alternatively, and probably even better, fold this revert into the switch to
> the unchecked version (sounds like those will use kernel-specific flavors?).

I folded this revert into the previous commit. But I kept the
__get_user()/__put_user() calls and just added a comment explaining why
they are used and why it is safe to use them.

After all, even the get_kernel*() functions call __get_user_size() under
the hood.

Regards,

Joerg
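
For comparison, a sketch of the two unchecked access flavours in play
here (the function and its error handling are illustrative):

	static enum es_result example_read_u64(void __user *uptr, u64 *kptr,
					       u64 *out)
	{
		u64 val;

		/* User-style unchecked access: no access_ok(), no
		 * might_fault(), just fault reporting. */
		if (__get_user(val, (u64 __user *)uptr))
			return ES_EXCEPTION;

		/* The kernel-pointer counterpart; it too ends up in
		 * __get_user_size() internally. */
		if (get_kernel_nofault(val, kptr))
			return ES_EXCEPTION;

		*out = val;
		return ES_OK;
	}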


[PATCH] virtio: Introduce a new kick interface virtqueue_kick_try()

2021-05-19 Thread Xuan Zhuo
Unlike virtqueue_kick(), virtqueue_kick_try() returns true only when the
kick is successful. In virtio-net, we want to count the number of kicks.
So we need an interface that can tell whether the kick was actually
executed.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c |  8 
 drivers/virtio/virtio_ring.c | 20 
 include/linux/virtio.h   |  2 ++
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9b6a4a875c55..167697030cb6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -617,7 +617,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
ret = nxmit;
 
if (flags & XDP_XMIT_FLUSH) {
-   if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq))
+   if (virtqueue_kick_try(sq->vq))
kicks = 1;
}
 out:
@@ -1325,7 +1325,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
if (err)
break;
} while (rq->vq->num_free);
-   if (virtqueue_kick_prepare(rq->vq) && virtqueue_notify(rq->vq)) {
+   if (virtqueue_kick_try(rq->vq)) {
unsigned long flags;
 
		flags = u64_stats_update_begin_irqsave(&rq->stats.syncp);
@@ -1533,7 +1533,7 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 
if (xdp_xmit & VIRTIO_XDP_TX) {
sq = virtnet_xdp_get_sq(vi);
-   if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
+   if (virtqueue_kick_try(sq->vq)) {
		u64_stats_update_begin(&sq->stats.syncp);
		sq->stats.kicks++;
		u64_stats_update_end(&sq->stats.syncp);
@@ -1710,7 +1710,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
}
 
if (kick || netif_xmit_stopped(txq)) {
-   if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
+   if (virtqueue_kick_try(sq->vq)) {
		u64_stats_update_begin(&sq->stats.syncp);
		sq->stats.kicks++;
		u64_stats_update_end(&sq->stats.syncp);
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..1462be756875 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1874,6 +1874,26 @@ bool virtqueue_kick(struct virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_kick);
 
+/**
+ * virtqueue_kick_try - try update after add_buf
+ * @vq: the struct virtqueue
+ *
+ * After one or more virtqueue_add_* calls, invoke this to kick
+ * the other side.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * Returns true if kick success, otherwise false.
+ */
+bool virtqueue_kick_try(struct virtqueue *vq)
+{
+   if (virtqueue_kick_prepare(vq) && virtqueue_notify(vq))
+   return true;
+   return false;
+}
+EXPORT_SYMBOL_GPL(virtqueue_kick_try);
+
 /**
  * virtqueue_get_buf - get the next used buffer
  * @_vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b1894e0323fa..45cd6a0af24d 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -59,6 +59,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
 
 bool virtqueue_kick(struct virtqueue *vq);
 
+bool virtqueue_kick_try(struct virtqueue *vq);
+
 bool virtqueue_kick_prepare(struct virtqueue *vq);
 
 bool virtqueue_notify(struct virtqueue *vq);
-- 
2.31.0



Re: [PATCH 3/6] x86/sev-es: Use __put_user()/__get_user

2021-05-19 Thread 'Joerg Roedel'
On Wed, May 12, 2021 at 11:32:35AM +0200, Joerg Roedel wrote:
> On Wed, May 12, 2021 at 10:58:20AM +0200, Juergen Gross wrote:
> > No, those were used before, but commit 9da3f2b7405440 broke Xen's use
> > case. That is why I did commit 1457d8cf7664f.
>
> [...]
>
> Having the distinction between user and kernel memory accesses
> explicitly in the code seems to be the most robust solution.

On the other hand, as I found out today, 9da3f2b7405440 had a short
lifetime and got reverted upstream. So using __get_user()/__put_user()
should be fine in this code path. It just deserves a comment explaining
its use here and why pagefault_enable()/disable() is not needed.
Even the get_kernel* helpers use __get_user_size() internally.

Regards,

Joerg


Re: virtio_net: BQL?

2021-05-19 Thread Michael S. Tsirkin
On Mon, May 17, 2021 at 11:43:43AM -0700, Dave Taht wrote:
> Not really related to this patch, but is there some reason why virtio
> has no support for BQL?

So just so you can try it out, I rebased my old patch.
XDP is handled incorrectly by it so we shouldn't apply it as is,
but should be good enough for you to see whether it helps.
Completely untested!

Signed-off-by: Michael S. Tsirkin 



diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7be93ca01650..4bfb682a20b2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -556,6 +556,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
kicks = 1;
}
 out:
+   /* TODO: netdev_tx_completed_queue? */
	u64_stats_update_begin(&sq->stats.syncp);
sq->stats.bytes += bytes;
sq->stats.packets += packets;
@@ -1376,7 +1377,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
return stats.packets;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
+static void free_old_xmit_skbs(struct netdev_queue *txq, struct send_queue *sq, bool in_napi)
 {
unsigned int len;
unsigned int packets = 0;
@@ -1406,6 +1407,8 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
if (!packets)
return;
 
+   netdev_tx_completed_queue(txq, packets, bytes);
+
	u64_stats_update_begin(&sq->stats.syncp);
sq->stats.bytes += bytes;
sq->stats.packets += packets;
@@ -1434,7 +1437,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 
if (__netif_tx_trylock(txq)) {
virtqueue_disable_cb(sq->vq);
-   free_old_xmit_skbs(sq, true);
+   free_old_xmit_skbs(txq, sq, true);
 
if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
netif_tx_wake_queue(txq);
@@ -1522,7 +1525,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
txq = netdev_get_tx_queue(vi->dev, index);
__netif_tx_lock(txq, raw_smp_processor_id());
virtqueue_disable_cb(sq->vq);
-   free_old_xmit_skbs(sq, true);
+   free_old_xmit_skbs(txq, sq, true);
 
if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
netif_tx_wake_queue(txq);
@@ -1606,10 +1609,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
bool kick = !netdev_xmit_more();
bool use_napi = sq->napi.weight;
+   unsigned int bytes = skb->len;
 
/* Free up any pending old buffers before queueing new ones. */
virtqueue_disable_cb(sq->vq);
-   free_old_xmit_skbs(sq, false);
+   free_old_xmit_skbs(txq, sq, false);
 
if (use_napi && kick)
virtqueue_enable_cb_delayed(sq->vq);
@@ -1638,6 +1642,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
nf_reset_ct(skb);
}
 
+   netdev_tx_sent_queue(txq, bytes);
+
/* If running out of space, stop queue to avoid getting packets that we
 * are then unable to transmit.
 * An alternative would be to force queuing layer to requeue the skb by
@@ -1653,7 +1659,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
if (!use_napi &&
unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
/* More just got used, free them then recheck. */
-   free_old_xmit_skbs(sq, false);
+   free_old_xmit_skbs(txq, sq, false);
if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
netif_start_subqueue(dev, qnum);
virtqueue_disable_cb(sq->vq);
