Re: [PATCH 0/3] recover hardware corrupted page by virtio balloon

2022-05-26 Thread zhenwei pi

Hi, Andrew & Naoya

I would appreciate it if you could give me any hint about the changes of 
memory/memory-failure!


On 5/20/22 15:06, zhenwei pi wrote:

Hi,

I'm trying to recover hardware corrupted page by virtio balloon, the
workflow of this feature like this:

Guest  5.MF -> 6.RVQ FE10.Unpoison page
 /   \/
---+-+--+---
| |  |
 4.MCE7.RVQ BE   9.RVQ Event
  QEMU /   \   /
  3.SIGBUS  8.Remap
 /
+
 |
 +--2.MF
  Host   /
1.HW error

1, HardWare page error occurs randomly.
2, host side handles corrupted page by Memory Failure mechanism, sends
SIGBUS to the user process if early-kill is enabled.
3, QEMU handles SIGBUS, if the address belongs to guest RAM, then:
4, QEMU tries to inject MCE into guest.
5, guest handles memory failure again.

1-5 is already supported for a long time, the next steps are supported
in this patch(also related driver patch):

6, guest balloon driver gets noticed of the corrupted PFN, and sends
request to host side by Recover VQ FrontEnd.
7, QEMU handles request from Recover VQ BackEnd, then:
8, QEMU remaps the corrupted HVA fo fix the memory failure, then:
9, QEMU acks the guest side the result by Recover VQ.
10, guest unpoisons the page if the corrupted page gets recoverd
 successfully.

Test:
This patch set can be tested with QEMU(also in developing):
https://github.com/pizhenwei/qemu/tree/balloon-recover

Emulate MCE by QEMU(guest RAM normal page only, hugepage is not supported):
virsh qemu-monitor-command vm --hmp mce 0 9 0xbdc0 0xd 0x61646678 
0x8c

The guest works fine(on Intel Platinum 8260):
  mce: [Hardware Error]: Machine check events logged
  Memory failure: 0x61646: recovery action for dirty LRU page: Recovered
  virtio_balloon virtio5: recovered pfn 0x61646
  Unpoison: Unpoisoned page 0x61646 by virtio-balloon
  MCE: Killing stress:24502 due to hardware memory corruption fault at 
7f5be2e5a010

And the 'HardwareCorrupted' in /proc/meminfo also shows 0 kB.

About the protocol of virtio balloon recover VQ, it's undefined and in
developing currently:
- 'struct virtio_balloon_recover' defines the structure which is used to
   exchange message between guest and host.
- '__le32 corrupted_pages' in struct virtio_balloon_config is used in the next
   step:
   1, a VM uses RAM of 2M huge page, once a MCE occurs, the 2M becomes
  unaccessible. Reporting 512 * 4K 'corrupted_pages' to the guest, the guest
  has a chance to isolate the 512 pages ahead of time.

   2, after migrating to another host, the corrupted pages are actually 
recovered,
  once the guest gets the 'corrupted_pages' with 0, then the guest could
  unpoison all the poisoned pages which are recorded in the balloon driver.

zhenwei pi (3):
   memory-failure: Introduce memory failure notifier
   mm/memory-failure.c: support reset PTE during unpoison
   virtio_balloon: Introduce memory recover

  drivers/virtio/virtio_balloon.c | 243 
  include/linux/mm.h  |   4 +-
  include/uapi/linux/virtio_balloon.h |  16 ++
  mm/hwpoison-inject.c|   2 +-
  mm/memory-failure.c |  59 ++-
  5 files changed, 315 insertions(+), 9 deletions(-)



--
zhenwei pi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/4] Implement vdpasim stop operation

2022-05-26 Thread Jason Wang
On Thu, May 26, 2022 at 8:54 PM Parav Pandit  wrote:
>
>
>
> > From: Eugenio Pérez 
> > Sent: Thursday, May 26, 2022 8:44 AM
>
> > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> >
> > that backend feature and userspace can effectively stop the device.
> >
> >
> >
> > This is a must before get virtqueue indexes (base) for live migration,
> >
> > since the device could modify them after userland gets them. There are
> >
> > individual ways to perform that action for some devices
> >
> > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there
> > was no
> >
> > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> >
> >
> >
> > After the return of ioctl with stop != 0, the device MUST finish any
> >
> > pending operations like in flight requests. It must also preserve all
> >
> > the necessary state (the virtqueue vring base plus the possible device
> >
> > specific states) that is required for restoring in the future. The
> >
> > device must not change its configuration after that point.
> >
> >
> >
> > After the return of ioctl with stop == 0, the device can continue
> >
> > processing buffers as long as typical conditions are met (vq is enabled,
> >
> > DRIVER_OK status bit is enabled, etc).
>
> Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to 
> any mechanism in the virtio spec.

We try to provide forward compatibility to VIRTIO_CONFIG_S_STOP. That
means it is expected to implement at least a subset of
VIRTIO_CONFIG_S_STOP.

>
> Why can't we use this ioctl() to indicate driver to start/stop the device 
> instead of driving it through the driver_ok?

So the idea is to add capability that does not exist in the spec. Then
came the stop/resume which can't be done via DRIVER_OK. I think we
should only allow the stop/resume to succeed after DRIVER_OK is set.

> This is in the context of other discussion we had in the LM series.

Do you see any issue that blocks the live migration?

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: Re: [PATCH 3/3] virtio_balloon: Introduce memory recover

2022-05-26 Thread zhenwei pi

On 5/27/22 03:18, Michael S. Tsirkin wrote:

On Fri, May 20, 2022 at 03:06:48PM +0800, zhenwei pi wrote:

Introduce a new queue 'recover VQ', this allows guest to recover
hardware corrupted page:

Guest  5.MF -> 6.RVQ FE10.Unpoison page
 /   \/
---+-+--+---
| |  |
 4.MCE7.RVQ BE   9.RVQ Event
  QEMU /   \   /
  3.SIGBUS  8.Remap
 /
+
 |
 +--2.MF
  Host   /
1.HW error

The workflow:
1, HardWare page error occurs randomly.
2, host side handles corrupted page by Memory Failure mechanism, sends
SIGBUS to the user process if early-kill is enabled.
3, QEMU handles SIGBUS, if the address belongs to guest RAM, then:
4, QEMU tries to inject MCE into guest.
5, guest handles memory failure again.

1-5 is already supported for a long time, the next steps are supported
in this patch(also related driver patch):
6, guest balloon driver gets noticed of the corrupted PFN, and sends
request to host side by Recover VQ FrontEnd.
7, QEMU handles request from Recover VQ BackEnd, then:
8, QEMU remaps the corrupted HVA fo fix the memory failure, then:
9, QEMU acks the guest side the result by Recover VQ.
10, guest unpoisons the page if the corrupted page gets recoverd
 successfully.

Then the guest fixes the HW page error dynamiclly without rebooting.

Emulate MCE by QEMU, the guest works fine:
  mce: [Hardware Error]: Machine check events logged
  Memory failure: 0x61646: recovery action for dirty LRU page: Recovered
  virtio_balloon virtio5: recovered pfn 0x61646
  Unpoison: Unpoisoned page 0x61646 by virtio-balloon
  MCE: Killing stress:24502 due to hardware memory corruption fault at 
7f5be2e5a010

The 'HardwareCorrupted' in /proc/meminfo also shows 0 kB.

Signed-off-by: zhenwei pi 
---
  drivers/virtio/virtio_balloon.c | 243 
  include/uapi/linux/virtio_balloon.h |  16 ++
  2 files changed, 259 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f4c34a2a6b8e..f9d95d1d8a4d 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -52,6 +52,7 @@ enum virtio_balloon_vq {
VIRTIO_BALLOON_VQ_STATS,
VIRTIO_BALLOON_VQ_FREE_PAGE,
VIRTIO_BALLOON_VQ_REPORTING,
+   VIRTIO_BALLOON_VQ_RECOVER,
VIRTIO_BALLOON_VQ_MAX
  };
  
@@ -59,6 +60,12 @@ enum virtio_balloon_config_read {

VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
  };
  
+/* the request body to commucate with host side */

+struct __virtio_balloon_recover {
+   struct virtio_balloon_recover vbr;
+   __virtio32 pfns[VIRTIO_BALLOON_PAGES_PER_PAGE];
+};
+



I don't think this idea of passing 32 bit pfns is going to fly.
What is wrong with just passing the pages normally as a s/g list?
this is what is done for the hints at the moment.

neither should you use __virtio types for new functionality
(should all be __le), nor use __virtio for the struct name.




Guest side sends GPA/PFN to host side by passing the pages normally as a 
s/g list, this is OK.


But in this scenario, guest also needs to get 
status(recovered?corrupted?failed to recover?) of page from the host side.


For a normal page(Ex, 4K), the host could return the status quite 
immediately. But for the 2M hugetlb of guest RAM, the host should be 
pending until the guest requests 512*4K to recover. Once the 2M hugepage 
gets recovered(or failed to recover), the host returns 512 PFNs with 
status to guest. There are at most 512 recover requests of a single 2M 
huge page.


For example, the guest tries to recover a corrupted page:
struct scatterlist status_sg, page_sg, *sgs[2];

sg_init_one(_sg, status, sizeof(*status));
sgs[0] = _sg;

p = page_address(page);
sg_init_one(_sg, p, PAGE_SIZE);
sgs[1] = _sg;

virtqueue_add_sgs(recover_vq, sgs, 1, 1, NULL, GFP_ATOMIC);

The host handles 4K recover request on 2M hugepage, this request is 
pending until the full 2M huge page gets recovered(or failed).


To avoid too many pending request in virt queue, I designed as this 
patch(should use __le), passing PFN in request body, using a single IN 
request only.



...

--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -37,6 +37,7 @@
  #define VIRTIO_BALLOON_F_FREE_PAGE_HINT   3 /* VQ to report free pages */
  #define VIRTIO_BALLOON_F_PAGE_POISON  4 /* Guest is using page poisoning */
  #define VIRTIO_BALLOON_F_REPORTING5 /* Page reporting virtqueue */
+#define VIRTIO_BALLOON_F_RECOVER   6 /* Memory recover virtqueue */
  
  /* Size of a PFN in the balloon interface. */

  #define VIRTIO_BALLOON_PFN_SHIFT 12


Please get this feature recorded in the spec with the virtio TC.
They will also ask you to supply 

Re: [PATCH] vhost-vdpa: Fix some error handling path in vhost_vdpa_process_iotlb_msg()

2022-05-26 Thread Michael S. Tsirkin
On Sun, May 22, 2022 at 03:59:01PM +0200, Christophe JAILLET wrote:
> In the error paths introduced by the commit in the Fixes tag, a mutex may
> be left locked.
> Add the correct goto instead of a direct return.
> 
> Fixes: a1468175bb17 ("vhost-vdpa: support ASID based IOTLB API")
> Signed-off-by: Christophe JAILLET 

Acked-by: Michael S. Tsirkin 

> ---
> WARNING: This patch only fixes the goto vs return mix-up in this function.
> However, the 2nd hunk looks really spurious to me. I think that the:
> - return -EINVAL;
> + r = -EINVAL;
> + goto unlock;
> should be done only in the 'if (!iotlb)' block.
> 
> As I don't know this code, I just leave it as-is but draw your attention
> in case this is another bug lurking.
> ---
>  drivers/vhost/vdpa.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 1f1d1c425573..3e86080041fc 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -1000,7 +1000,8 @@ static int vhost_vdpa_process_iotlb_msg(struct 
> vhost_dev *dev, u32 asid,
>   if (!as) {
>   dev_err(>dev, "can't find and alloc asid %d\n",
>   asid);
> - return -EINVAL;
> + r = -EINVAL;
> + goto unlock;
>   }
>   iotlb = >iotlb;
>   } else
> @@ -1013,7 +1014,8 @@ static int vhost_vdpa_process_iotlb_msg(struct 
> vhost_dev *dev, u32 asid,
>   }
>   if (!iotlb)
>   dev_err(>dev, "no iotlb for asid %d\n", asid);
> - return -EINVAL;
> + r = -EINVAL;
> + goto unlock;
>   }
>  
>   switch (msg->type) {
> -- 
> 2.34.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 3/3] virtio_balloon: Introduce memory recover

2022-05-26 Thread Michael S. Tsirkin
On Fri, May 20, 2022 at 03:06:48PM +0800, zhenwei pi wrote:
> Introduce a new queue 'recover VQ', this allows guest to recover
> hardware corrupted page:
> 
> Guest  5.MF -> 6.RVQ FE10.Unpoison page
> /   \/
> ---+-+--+---
>| |  |
> 4.MCE7.RVQ BE   9.RVQ Event
>  QEMU /   \   /
>  3.SIGBUS  8.Remap
> /
> +
> |
> +--2.MF
>  Host   /
>1.HW error
> 
> The workflow:
> 1, HardWare page error occurs randomly.
> 2, host side handles corrupted page by Memory Failure mechanism, sends
>SIGBUS to the user process if early-kill is enabled.
> 3, QEMU handles SIGBUS, if the address belongs to guest RAM, then:
> 4, QEMU tries to inject MCE into guest.
> 5, guest handles memory failure again.
> 
> 1-5 is already supported for a long time, the next steps are supported
> in this patch(also related driver patch):
> 6, guest balloon driver gets noticed of the corrupted PFN, and sends
>request to host side by Recover VQ FrontEnd.
> 7, QEMU handles request from Recover VQ BackEnd, then:
> 8, QEMU remaps the corrupted HVA fo fix the memory failure, then:
> 9, QEMU acks the guest side the result by Recover VQ.
> 10, guest unpoisons the page if the corrupted page gets recoverd
> successfully.
> 
> Then the guest fixes the HW page error dynamiclly without rebooting.
> 
> Emulate MCE by QEMU, the guest works fine:
>  mce: [Hardware Error]: Machine check events logged
>  Memory failure: 0x61646: recovery action for dirty LRU page: Recovered
>  virtio_balloon virtio5: recovered pfn 0x61646
>  Unpoison: Unpoisoned page 0x61646 by virtio-balloon
>  MCE: Killing stress:24502 due to hardware memory corruption fault at 
> 7f5be2e5a010
> 
> The 'HardwareCorrupted' in /proc/meminfo also shows 0 kB.
> 
> Signed-off-by: zhenwei pi 
> ---
>  drivers/virtio/virtio_balloon.c | 243 
>  include/uapi/linux/virtio_balloon.h |  16 ++
>  2 files changed, 259 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index f4c34a2a6b8e..f9d95d1d8a4d 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -52,6 +52,7 @@ enum virtio_balloon_vq {
>   VIRTIO_BALLOON_VQ_STATS,
>   VIRTIO_BALLOON_VQ_FREE_PAGE,
>   VIRTIO_BALLOON_VQ_REPORTING,
> + VIRTIO_BALLOON_VQ_RECOVER,
>   VIRTIO_BALLOON_VQ_MAX
>  };
>  
> @@ -59,6 +60,12 @@ enum virtio_balloon_config_read {
>   VIRTIO_BALLOON_CONFIG_READ_CMD_ID = 0,
>  };
>  
> +/* the request body to commucate with host side */
> +struct __virtio_balloon_recover {
> + struct virtio_balloon_recover vbr;
> + __virtio32 pfns[VIRTIO_BALLOON_PAGES_PER_PAGE];
> +};
> +


I don't think this idea of passing 32 bit pfns is going to fly.
What is wrong with just passing the pages normally as a s/g list?
this is what is done for the hints at the moment.

neither should you use __virtio types for new functionality
(should all be __le), nor use __virtio for the struct name.



>  struct virtio_balloon {
>   struct virtio_device *vdev;
>   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> @@ -126,6 +133,16 @@ struct virtio_balloon {
>   /* Free page reporting device */
>   struct virtqueue *reporting_vq;
>   struct page_reporting_dev_info pr_dev_info;
> +
> + /* Memory recover VQ - VIRTIO_BALLOON_F_RECOVER */
> + struct virtqueue *recover_vq;
> + spinlock_t recover_vq_lock;
> + struct notifier_block memory_failure_nb;
> + struct list_head corrupted_page_list;
> + struct list_head recovered_page_list;
> + spinlock_t recover_page_list_lock;
> + struct __virtio_balloon_recover in_vbr;
> + struct work_struct unpoison_memory_work;
>  };
>  
>  static const struct virtio_device_id id_table[] = {
> @@ -494,6 +511,198 @@ static void update_balloon_size_func(struct work_struct 
> *work)
>   queue_work(system_freezable_wq, work);
>  }
>  
> +/*
> + * virtballoon_memory_failure - notified by memory failure, try to fix the
> + *  corrupted page.
> + * The memory failure notifier is designed to call back when the kernel 
> handled
> + * successfully only, WARN_ON_ONCE on the unlikely condition to find out any
> + * error(memory error handling is a best effort, not 100% coverd).

covered

> + */
> +static int virtballoon_memory_failure(struct notifier_block *notifier,
> +   unsigned long pfn, void *parm)
> +{
> + struct virtio_balloon *vb = container_of(notifier, struct 
> virtio_balloon,
> +  memory_failure_nb);
> + struct page *page;
> + struct __virtio_balloon_recover 

Re: [PATCH v3 2/4] vhost-vdpa: introduce STOP backend feature bit

2022-05-26 Thread Dan Carpenter
On Thu, May 26, 2022 at 07:00:06PM +0200, Eugenio Perez Martin wrote:
> > It feels like returning any literal that isn't 1 or 0 should trigger a
> > warning...  I've written that and will check it out tonight.
> >
> 
> I'm not sure this should be so strict, or "literal" does not include pointers?
> 

What I mean in exact terms, is that if you're returning a known value
and the function returns bool then the known value should be 0 or 1.
Don't "return 3;".  This new warning will complain if you return a known
pointer as in "return ".  It won't complain if you return an
unknown pointer "return p;".

> As an experiment, can Smatch be used to count how many times a
> returned pointer is converted to int / bool before returning vs not
> converted?

I'm not super excited to write that code...  :/

> 
> I find Smatch interesting, especially when switching between projects
> frequently. Does it support changing the code like clang-format? To
> offload cognitive load to tools is usually good :).

No.  Coccinelle does that really well though.

regards,
dan carpenter

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/3] recover hardware corrupted page by virtio balloon

2022-05-26 Thread Peter Xu
On Wed, May 25, 2022 at 01:16:34PM -0700, Jue Wang wrote:
> The hypervisor _must_ emulate poisons identified in guest physical
> address space (could be transported from the source VM), this is to
> prevent silent data corruption in the guest. With a paravirtual
> approach like this patch series, the hypervisor can clear some of the
> poisoned HVAs knowing for certain that the guest OS has isolated the
> poisoned page. I wonder how much value it provides to the guest if the
> guest and workload are _not_ in a pressing need for the extra KB/MB
> worth of memory.

I'm curious the same on how unpoisoning could help here.  The reasoning
behind would be great material to be mentioned in the next cover letter.

Shouldn't we consider migrating serious workloads off the host already
where there's a sign of more severe hardware issues, instead?

Thanks,

-- 
Peter Xu

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [linux-next:master] BUILD REGRESSION 8cb8311e95e3bb58bd84d6350365f14a718faa6d

2022-05-26 Thread Dan Carpenter
On Thu, May 26, 2022 at 03:28:25PM +0100, Matthew Wilcox wrote:
> On Thu, May 26, 2022 at 11:48:32AM +0300, Dan Carpenter wrote:
> > On Thu, May 26, 2022 at 02:16:34AM +0100, Matthew Wilcox wrote:
> > > Bizarre this started showing up now.  The recent patch was:
> > > 
> > > -   info->alloced += compound_nr(page);
> > > -   inode->i_blocks += BLOCKS_PER_PAGE << compound_order(page);
> > > +   info->alloced += folio_nr_pages(folio);
> > > +   inode->i_blocks += BLOCKS_PER_PAGE << folio_order(folio);
> > > 
> > > so it could tell that compound_order() was small, but folio_order()
> > > might be large?
> > 
> > The old code also generates a warning on my test system.  Smatch thinks
> > both compound_order() and folio_order() are 0-255.  I guess because of
> > the "unsigned char compound_order;" in the struct page.
> 
> It'd be nice if we could annotate that as "contains a value between
> 1 and BITS_PER_LONG - PAGE_SHIFT".  Then be able to optionally enable
> a checker that ensures that's true on loads/stores.  Maybe we need a
> language that isn't C :-P  Ada can do this ... I don't think Rust can.

Machine Parsable Comments.  It's a matter of figuring out the best
format and writing the code.

In Smatch, I have table of hard coded return values in the format:
  
https://github.com/error27/smatch/blob/master/smatch_data/db/kernel.return_fixes
I don't have code to handle something like BITS_PER_LONG or PAGE_SHIFT.
To be honest, Smatch code always assumes that PAGE_SIZE is 4096 but I
should actually look it up...  It's not impossible to do.  The GFP_KERNEL
values changed enough so that I eventually made that look up the actual
defines.

I also have a table in the database where I could edit the values of
(struct page)->compound_order.

regards,
dan carpenter
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [linux-next:master] BUILD REGRESSION 8cb8311e95e3bb58bd84d6350365f14a718faa6d

2022-05-26 Thread Matthew Wilcox
On Thu, May 26, 2022 at 11:48:32AM +0300, Dan Carpenter wrote:
> On Thu, May 26, 2022 at 02:16:34AM +0100, Matthew Wilcox wrote:
> > Bizarre this started showing up now.  The recent patch was:
> > 
> > -   info->alloced += compound_nr(page);
> > -   inode->i_blocks += BLOCKS_PER_PAGE << compound_order(page);
> > +   info->alloced += folio_nr_pages(folio);
> > +   inode->i_blocks += BLOCKS_PER_PAGE << folio_order(folio);
> > 
> > so it could tell that compound_order() was small, but folio_order()
> > might be large?
> 
> The old code also generates a warning on my test system.  Smatch thinks
> both compound_order() and folio_order() are 0-255.  I guess because of
> the "unsigned char compound_order;" in the struct page.

It'd be nice if we could annotate that as "contains a value between
1 and BITS_PER_LONG - PAGE_SHIFT".  Then be able to optionally enable
a checker that ensures that's true on loads/stores.  Maybe we need a
language that isn't C :-P  Ada can do this ... I don't think Rust can.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 4/4] vdpa_sim: Implement stop vdpa op

2022-05-26 Thread Stefano Garzarella

On Thu, May 26, 2022 at 02:43:38PM +0200, Eugenio Pérez wrote:

Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
that backend feature and userspace can effectively stop the device.

This is a must before get virtqueue indexes (base) for live migration,
since the device could modify them after userland gets them. There are
individual ways to perform that action for some devices
(VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
way to perform it for any vhost device (and, in particular, vhost-vdpa).

Signed-off-by: Eugenio Pérez 
---
drivers/vdpa/vdpa_sim/vdpa_sim.c | 21 +
drivers/vdpa/vdpa_sim/vdpa_sim.h |  1 +
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  3 +++
drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  3 +++
4 files changed, 28 insertions(+)


Reviewed-by: Stefano Garzarella 



diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 50d721072beb..0515cf314bed 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -107,6 +107,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
for (i = 0; i < vdpasim->dev_attr.nas; i++)
vhost_iotlb_reset(>iommu[i]);

+   vdpasim->running = true;
spin_unlock(>iommu_lock);

vdpasim->features = 0;
@@ -505,6 +506,24 @@ static int vdpasim_reset(struct vdpa_device *vdpa)
return 0;
}

+static int vdpasim_stop(struct vdpa_device *vdpa, bool stop)
+{
+   struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+   int i;
+
+   spin_lock(>lock);
+   vdpasim->running = !stop;
+   if (vdpasim->running) {
+   /* Check for missed buffers */
+   for (i = 0; i < vdpasim->dev_attr.nvqs; ++i)
+   vdpasim_kick_vq(vdpa, i);
+
+   }
+   spin_unlock(>lock);
+
+   return 0;
+}
+
static size_t vdpasim_get_config_size(struct vdpa_device *vdpa)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
@@ -694,6 +713,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
.get_status = vdpasim_get_status,
.set_status = vdpasim_set_status,
.reset  = vdpasim_reset,
+   .stop   = vdpasim_stop,
.get_config_size= vdpasim_get_config_size,
.get_config = vdpasim_get_config,
.set_config = vdpasim_set_config,
@@ -726,6 +746,7 @@ static const struct vdpa_config_ops 
vdpasim_batch_config_ops = {
.get_status = vdpasim_get_status,
.set_status = vdpasim_set_status,
.reset  = vdpasim_reset,
+   .stop   = vdpasim_stop,
.get_config_size= vdpasim_get_config_size,
.get_config = vdpasim_get_config,
.set_config = vdpasim_set_config,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 622782e92239..061986f30911 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -66,6 +66,7 @@ struct vdpasim {
u32 generation;
u64 features;
u32 groups;
+   bool running;
/* spinlock to synchronize iommu table */
spinlock_t iommu_lock;
};
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c 
b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index 42d401d43911..bcdb1982c378 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -204,6 +204,9 @@ static void vdpasim_blk_work(struct work_struct *work)
if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
goto out;

+   if (!vdpasim->running)
+   goto out;
+
for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
struct vdpasim_virtqueue *vq = >vqs[i];

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c 
b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index 5125976a4df8..886449e88502 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -154,6 +154,9 @@ static void vdpasim_net_work(struct work_struct *work)

spin_lock(>lock);

+   if (!vdpasim->running)
+   goto out;
+
if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
goto out;

--
2.31.1



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 1/4] vdpa: Add stop operation

2022-05-26 Thread Stefano Garzarella

On Thu, May 26, 2022 at 02:43:35PM +0200, Eugenio Pérez wrote:

This operation is optional: It it's not implemented, backend feature bit
will not be exposed.

Signed-off-by: Eugenio Pérez 
---
include/linux/vdpa.h | 6 ++
1 file changed, 6 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 15af802d41c4..ddfebc4e1e01 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -215,6 +215,11 @@ struct vdpa_map_file {
 * @reset:  Reset device
 *  @vdev: vdpa device
 *  Returns integer: success (0) or error (< 0)
+ * @stop:  Stop or resume the device (optional, but it must
+ * be implemented if require device stop)
+ * @vdev: vdpa device
+ * @stop: stop (true), not stop (false)


Sorry for just seeing this now, but if you have to send a v5, maybe we 
could use "resume" here instead of "not stop".


Thanks,
Stefano


+ * Returns integer: success (0) or error (< 0)
 * @get_config_size:Get the size of the configuration space includes
 *  fields that are conditional on feature bits.
 *  @vdev: vdpa device
@@ -316,6 +321,7 @@ struct vdpa_config_ops {
u8 (*get_status)(struct vdpa_device *vdev);
void (*set_status)(struct vdpa_device *vdev, u8 status);
int (*reset)(struct vdpa_device *vdev);
+   int (*stop)(struct vdpa_device *vdev, bool stop);
size_t (*get_config_size)(struct vdpa_device *vdev);
void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
   void *buf, unsigned int len);
--
2.31.1



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH 9/9] crypto: Introduce RSA algorithm

2022-05-26 Thread Gonglei (Arei) via Virtualization



> -Original Message-
> From: Lei He [mailto:helei.si...@bytedance.com]
> Sent: Wednesday, May 25, 2022 5:01 PM
> To: m...@redhat.com; Gonglei (Arei) ;
> berra...@redhat.com
> Cc: qemu-de...@nongnu.org; virtualization@lists.linux-foundation.org;
> linux-cry...@vger.kernel.org; jasow...@redhat.com; coh...@redhat.com;
> pizhen...@bytedance.com; helei.si...@bytedance.com
> Subject: [PATCH 9/9] crypto: Introduce RSA algorithm
> 
> From: zhenwei pi 
> 
> There are two parts in this patch:
> 1, support akcipher service by cryptodev-builtin driver 2, virtio-crypto 
> driver
> supports akcipher service
> 
> In principle, we should separate this into two patches, to avoid compiling 
> error,
> merge them into one.
> 
> Then virtio-crypto gets request from guest side, and forwards the request to
> builtin driver to handle it.
> 
> Test with a guest linux:
> 1, The self-test framework of crypto layer works fine in guest kernel 2, Test
> with Linux guest(with asym support), the following script test(note that
> pkey_XXX is supported only in a newer version of keyutils):
>   - both public key & private key
>   - create/close session
>   - encrypt/decrypt/sign/verify basic driver operation
>   - also test with kernel crypto layer(pkey add/query)
> 
> All the cases work fine.
> 
> Run script in guest:
> rm -rf *.der *.pem *.pfx
> modprobe pkcs8_key_parser # if CONFIG_PKCS8_PRIVATE_KEY_PARSER=m rm
> -rf /tmp/data dd if=/dev/random of=/tmp/data count=1 bs=20
> 
> openssl req -nodes -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -subj
> "/C=CN/ST=BJ/L=HD/O=qemu/OU=dev/CN=qemu/emailAddress=qemu@qemu
> .org"
> openssl pkcs8 -in key.pem -topk8 -nocrypt -outform DER -out key.der openssl
> x509 -in cert.pem -inform PEM -outform DER -out cert.der
> 
> PRIV_KEY_ID=`cat key.der | keyctl padd asymmetric test_priv_key @s` echo
> "priv key id = "$PRIV_KEY_ID PUB_KEY_ID=`cat cert.der | keyctl padd
> asymmetric test_pub_key @s` echo "pub key id = "$PUB_KEY_ID
> 
> keyctl pkey_query $PRIV_KEY_ID 0
> keyctl pkey_query $PUB_KEY_ID 0
> 
> echo "Enc with priv key..."
> keyctl pkey_encrypt $PRIV_KEY_ID 0 /tmp/data enc=pkcs1 >/tmp/enc.priv echo
> "Dec with pub key..."
> keyctl pkey_decrypt $PRIV_KEY_ID 0 /tmp/enc.priv enc=pkcs1 >/tmp/dec cmp
> /tmp/data /tmp/dec
> 
> echo "Sign with priv key..."
> keyctl pkey_sign $PRIV_KEY_ID 0 /tmp/data enc=pkcs1 hash=sha1 > /tmp/sig
> echo "Verify with pub key..."
> keyctl pkey_verify $PRIV_KEY_ID 0 /tmp/data /tmp/sig enc=pkcs1 hash=sha1
> 
> echo "Enc with pub key..."
> keyctl pkey_encrypt $PUB_KEY_ID 0 /tmp/data enc=pkcs1 >/tmp/enc.pub echo
> "Dec with priv key..."
> keyctl pkey_decrypt $PRIV_KEY_ID 0 /tmp/enc.pub enc=pkcs1 >/tmp/dec cmp
> /tmp/data /tmp/dec
> 
> echo "Verify with pub key..."
> keyctl pkey_verify $PUB_KEY_ID 0 /tmp/data /tmp/sig enc=pkcs1 hash=sha1
> 
> Signed-off-by: zhenwei pi 
> Signed-off-by: lei he  ---
>  backends/cryptodev-builtin.c  | 272
> +++-
>  backends/cryptodev-vhost-user.c   |  34 +++-
>  backends/cryptodev.c  |  32 ++--
>  hw/virtio/virtio-crypto.c | 323
> ++
>  include/hw/virtio/virtio-crypto.h |   5 +-
>  include/sysemu/cryptodev.h|  83 --
>  6 files changed, 604 insertions(+), 145 deletions(-)
> 
> diff --git a/backends/cryptodev-builtin.c b/backends/cryptodev-builtin.c index
> 0671bf9f3e..388aedd8df 100644
> --- a/backends/cryptodev-builtin.c
> +++ b/backends/cryptodev-builtin.c
> @@ -26,6 +26,7 @@
>  #include "qapi/error.h"
>  #include "standard-headers/linux/virtio_crypto.h"
>  #include "crypto/cipher.h"
> +#include "crypto/akcipher.h"
>  #include "qom/object.h"
> 
> 
> @@ -41,11 +42,12 @@
> OBJECT_DECLARE_SIMPLE_TYPE(CryptoDevBackendBuiltin,
> CRYPTODEV_BACKEND_BUILTIN)  typedef struct
> CryptoDevBackendBuiltinSession {
>  QCryptoCipher *cipher;
>  uint8_t direction; /* encryption or decryption */
> -uint8_t type; /* cipher? hash? aead? */
> +uint8_t type; /* cipher? hash? aead? akcipher? */

Do you actually use the type for akcipher?

> +QCryptoAkCipher *akcipher;
>  QTAILQ_ENTRY(CryptoDevBackendBuiltinSession) next;  }
> CryptoDevBackendBuiltinSession;
> 
> -/* Max number of symmetric sessions */
> +/* Max number of symmetric/asymmetric sessions */
>  #define MAX_NUM_SESSIONS 256
> 
>  #define CRYPTODEV_BUITLIN_MAX_AUTH_KEY_LEN512
> @@ -80,15 +82,17 @@ static void cryptodev_builtin_init(
>  backend->conf.crypto_services =
>   1u << VIRTIO_CRYPTO_SERVICE_CIPHER |
>   1u << VIRTIO_CRYPTO_SERVICE_HASH |
> - 1u << VIRTIO_CRYPTO_SERVICE_MAC;
> + 1u << VIRTIO_CRYPTO_SERVICE_MAC |
> + 1u << VIRTIO_CRYPTO_SERVICE_AKCIPHER;
>  backend->conf.cipher_algo_l = 1u << VIRTIO_CRYPTO_CIPHER_AES_CBC;
>  backend->conf.hash_algo = 1u << VIRTIO_CRYPTO_HASH_SHA1;
> +backend->conf.akcipher_algo = 1u << 

Re: [PATCH v3 2/4] vhost-vdpa: introduce STOP backend feature bit

2022-05-26 Thread Dan Carpenter
On Thu, May 26, 2022 at 02:44:02PM +0200, Eugenio Perez Martin wrote:
> > >> +static bool vhost_vdpa_can_stop(const struct vhost_vdpa *v) {
> > >> +   struct vdpa_device *vdpa = v->vdpa;
> > >> +   const struct vdpa_config_ops *ops = vdpa->config;
> > >> +
> > >> +   return ops->stop;
> > >> [GD>>] Would it be better to explicitly return a bool to match the 
> > >> return type?
> > >
> > >I'm not sure about the kernel code style regarding that casting. Maybe
> > >it's better to return !!ops->stop here. The macros likely and unlikely
> > >do that.
> >
> > IIUC `ops->stop` is a function pointer, so what about
> >
> >  return ops->stop != NULL;
> >
> 
> I'm ok with any method proposed. Both three ways can be found in the
> kernel so I think they are all valid (although the double negation is
> from bool to integer in (0,1) set actually).
> 
> Maybe Jason or Michael (as maintainers) can state the preferred method here.

Always just do whatever the person who responded feels like because
they're likely the person who cares the most.  ;)

I don't think there are any static analysis tools which will complain
about this.  Smatch will complain if you return a negative literal.
It feels like returning any literal that isn't 1 or 0 should trigger a
warning...  I've written that and will check it out tonight.

Really anything negative should trigger a warning.  See new Smatch stuff
below.

regards,
dan carpenter

 TEST CASE =

int x;
_Bool one(int *p)
{
if (p)
x = -2;
return x;
}
_Bool two(int *p)
{
return -4;  // returning 2 triggers a warning now
}

=== OUTPUT =

test.c:10 one() warn: potential negative cast to bool 'x'
test.c:14 two() warn: signedness bug returning '(-4)'
test.c:14 two() warn: '(-4)' is not bool

=== CODE ===

#include "smatch.h"
#include "smatch_extra.h"
#include "smatch_slist.h"

static int my_id;

static void match_literals(struct expression *ret_value)
{
struct symbol *type;
sval_t sval;

type = cur_func_return_type();
if (!type || sval_type_max(type).value != 1)
return;

if (!get_implied_value(ret_value, ))
return;

if (sval.value == 0 || sval.value == 1)
return;

sm_warning("'%s' is not bool", sval_to_str(sval));
}

static void match_any_negative(struct expression *ret_value)
{
struct symbol *type;
struct sm_state *extra, *sm;
sval_t sval;
char *name;

type = cur_func_return_type();
if (!type || sval_type_max(type).value != 1)
return;

extra = get_extra_sm_state(ret_value);
if (!extra)
return;
FOR_EACH_PTR(extra->possible, sm) {
if (estate_get_single_value(sm->state, ) &&
sval_is_negative(sval)) {
name = expr_to_str(ret_value);
sm_warning("potential negative cast to bool '%s'", 
name);
free_string(name);
return;
}
} END_FOR_EACH_PTR(sm);
}

void check_bool_return(int id)
{
my_id = id;

add_hook(_literals, RETURN_HOOK);
add_hook(_any_negative, RETURN_HOOK);
}

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v4 0/4] Implement vdpasim stop operation

2022-05-26 Thread Parav Pandit via Virtualization


> From: Eugenio Pérez 
> Sent: Thursday, May 26, 2022 8:44 AM

> Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> 
> that backend feature and userspace can effectively stop the device.
> 
> 
> 
> This is a must before get virtqueue indexes (base) for live migration,
> 
> since the device could modify them after userland gets them. There are
> 
> individual ways to perform that action for some devices
> 
> (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there
> was no
> 
> way to perform it for any vhost device (and, in particular, vhost-vdpa).
> 
> 
> 
> After the return of ioctl with stop != 0, the device MUST finish any
> 
> pending operations like in flight requests. It must also preserve all
> 
> the necessary state (the virtqueue vring base plus the possible device
> 
> specific states) that is required for restoring in the future. The
> 
> device must not change its configuration after that point.
> 
> 
> 
> After the return of ioctl with stop == 0, the device can continue
> 
> processing buffers as long as typical conditions are met (vq is enabled,
> 
> DRIVER_OK status bit is enabled, etc).

Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any 
mechanism in the virtio spec.

Why can't we use this ioctl() to indicate driver to start/stop the device 
instead of driving it through the driver_ok?
This is in the context of other discussion we had in the LM series.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH v2 0/5] TUN/VirtioNet USO features support.

2022-05-26 Thread Andrew Melnichenko
I'll check it, thank you!

On Thu, May 26, 2022 at 9:56 AM Jason Wang  wrote:
>
> On Tue, May 24, 2022 at 7:07 PM Andrew Melnichenko  wrote:
> >
> > Hi all,
> >
> > The issue is that host segments packets between guests on the same host.
> > Tests show that it happens because SKB_GSO_DODGY skb offload in
> > virtio_net_hdr_from_skb().
> > To do segmentation you need to remove SKB_GSO_DODGY or add SKB_GSO_PARTIAL
> > The solution with DODGY/PARTIAL offload looks like a dirty hack, so
> > for now, I've lived it as it is for further investigation.
>
> Ok, I managed to find the previous discussion. It looks to me the
> reason is that __udp_gso_segment will segment dodgy packets
> unconditionally.
>
> I wonder if the attached patch works? (compile test only).
>
> Thanks
>
> >
> >
> > On Tue, May 17, 2022 at 9:32 AM Jason Wang  wrote:
> > >
> > > On Thu, May 12, 2022 at 7:33 PM Andrew Melnychenko  
> > > wrote:
> > > >
> > > > Added new offloads for TUN devices TUN_F_USO4 and TUN_F_USO6.
> > > > Technically they enable NETIF_F_GSO_UDP_L4
> > > > (and only if USO4 & USO6 are set simultaneously).
> > > > It allows to transmission of large UDP packets.
> > > >
> > > > Different features USO4 and USO6 are required for qemu where Windows 
> > > > guests can
> > > > enable disable USO receives for IPv4 and IPv6 separately.
> > > > On the other side, Linux can't really differentiate USO4 and USO6, for 
> > > > now.
> > > > For now, to enable USO for TUN it requires enabling USO4 and USO6 
> > > > together.
> > > > In the future, there would be a mechanism to control UDP_L4 GSO 
> > > > separately.
> > > >
> > > > Test it WIP Qemu https://github.com/daynix/qemu/tree/Dev_USOv2
> > > >
> > > > New types for VirtioNet already on mailing:
> > > > https://lists.oasis-open.org/archives/virtio-comment/202110/msg00010.html
> > > >
> > > > Also, there is a known issue with transmitting packages between two 
> > > > guests.
> > >
> > > Could you explain this more? It looks like a bug. (Or any pointer to
> > > the discussion)
> > >
> > > Thanks
> > >
> > > > Without hacks with skb's GSO - packages are still segmented on the 
> > > > host's postrouting.
> > > >
> > > > Andrew (5):
> > > >   uapi/linux/if_tun.h: Added new offload types for USO4/6.
> > > >   driver/net/tun: Added features for USO.
> > > >   uapi/linux/virtio_net.h: Added USO types.
> > > >   linux/virtio_net.h: Support USO offload in vnet header.
> > > >   drivers/net/virtio_net.c: Added USO support.
> > > >
> > > >  drivers/net/tap.c   | 10 --
> > > >  drivers/net/tun.c   |  8 +++-
> > > >  drivers/net/virtio_net.c| 19 +++
> > > >  include/linux/virtio_net.h  |  9 +
> > > >  include/uapi/linux/if_tun.h |  2 ++
> > > >  include/uapi/linux/virtio_net.h |  4 
> > > >  6 files changed, 45 insertions(+), 7 deletions(-)
> > > >
> > > > --
> > > > 2.35.1
> > > >
> > >
> >
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: Re: [PATCH v7 0/9] Introduce akcipher service for virtio-crypto

2022-05-26 Thread zhenwei pi

Hi, Daniel

Thanks a lot for your review!

On 5/26/22 18:48, Daniel P. Berrangé wrote:

I've sent a pull request containing all the crypto/ changes,
as that covers stuff I maintain. ie patches 2-8

Patches 1 and 9, I'll leave for MST to review & queue since the
virtual hardware is not my area of knowledge.

On Wed, May 25, 2022 at 05:01:09PM +0800, Lei He wrote:

v6 -> v7:
- Fix serval build errors for some specific platforms/configurations.
- Use '%zu' instead of '%lu' for size_t parameters.
- AkCipher-gcrypt: avoid setting wrong error messages when parsing RSA
   keys.
- AkCipher-benchmark: process constant amount of sign/verify instead
  of running sign/verify for a constant duration.

v5 -> v6:
- Fix build errors and codestyles.
- Add parameter 'Error **errp' for qcrypto_akcipher_rsakey_parse.
- Report more detailed errors.
- Fix buffer length check and return values of akcipher-nettle, allows caller to
  pass a buffer with larger size than actual needed.

A million thanks to Daniel!

v4 -> v5:
- Move QCryptoAkCipher into akcipherpriv.h, and modify the related comments.
- Rename asn1_decoder.c to der.c.
- Code style fix: use 'cleanup' & 'error' lables.
- Allow autoptr type to auto-free.
- Add test cases for rsakey to handle DER error.
- Other minor fixes.

v3 -> v4:
- Coding style fix: Akcipher -> AkCipher, struct XXX -> XXX, Rsa -> RSA,
XXX-alg -> XXX-algo.
- Change version info in qapi/crypto.json, from 7.0 -> 7.1.
- Remove ecdsa from qapi/crypto.json, it would be introduced with the 
implemetion later.
- Use QCryptoHashAlgothrim instead of QCryptoRSAHashAlgorithm(removed) in 
qapi/crypto.json.
- Rename arguments of qcrypto_akcipher_XXX to keep aligned with 
qcrypto_cipher_XXX(dec/enc/sign/vefiry -> in/out/in2), and add 
qcrypto_akcipher_max_XXX APIs.
- Add new API: qcrypto_akcipher_supports.
- Change the return value of qcrypto_akcipher_enc/dec/sign, these functions 
return the actual length of result.
- Separate ASN.1 source code and test case clean.
- Disable RSA raw encoding for akcipher-nettle.
- Separate RSA key parser into rsakey.{hc}, and implememts it with 
builtin-asn1-decoder and nettle respectivly.
- Implement RSA(pkcs1 and raw encoding) algorithm by gcrypt. This has higher 
priority than nettle.
- For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the length of 
returned result maybe less than the dst buffer size, return the actual length 
of result instead of the buffer length to the guest side. (in function 
virtio_crypto_akcipher_input_data_helper)
- Other minor changes.

Thanks to Daniel!

Eric pointed out this missing part of use case, send it here again.

In our plan, the feature is designed for HTTPS offloading case and other 
applications which use kernel RSA/ecdsa by keyctl syscall. The full picture 
shows bellow:


  Nginx/openssl[1] ... Apps
Guest   -
   virtio-crypto driver[2]
-
   virtio-crypto backend[3]
Host-
  /  |  \
  builtin[4]   vhost keyctl[5] ...


[1] User applications can offload RSA calculation to kernel by keyctl syscall. 
There is no keyctl engine in openssl currently, we developed a engine and tried 
to contribute it to openssl upstream, but openssl 1.x does not accept new 
feature. Link:
https://github.com/openssl/openssl/pull/16689

This branch is available and maintained by Lei 
https://github.com/TousakaRin/openssl/tree/OpenSSL_1_1_1-kctl_engine

We tested nginx(change config file only) with openssl keyctl engine, it works 
fine.

[2] virtio-crypto driver is used to communicate with host side, send requests 
to host side to do asymmetric calculation.
https://lkml.org/lkml/2022/3/1/1425

[3] virtio-crypto backend handles requests from guest side, and forwards 
request to crypto backend driver of QEMU.

[4] Currently RSA is supported only in builtin driver. This driver is supposed 
to test the full feature without other software(Ex vhost process) and hardware 
dependence. ecdsa is introduced into qapi type without implementation, this may 
be implemented in Q3-2022 or later. If ecdsa type definition should be added 
with the implementation together, I'll remove this in next version.

[5] keyctl backend is in development, we will post this feature in Q2-2022. 
keyctl backend can use hardware acceleration(Ex, Intel QAT).

Setup the full environment, tested with Intel QAT on host side, the QPS of 
HTTPS increase to ~200% in a guest.

VS PCI passthrough: the most important benefit of this solution makes the VM 
migratable.

v2 -> v3:
- Introduce akcipher types to qapi
- Add test/benchmark suite for akcipher class
- Seperate 'virtio_crypto: Support virtio crypto asym operation' into:
  - crypto: Introduce akcipher crypto class
  - virtio-crypto: Introduce RSA algorithm

v1 -> v2:
- Update virtio_crypto.h from v2 

RE: [PATCH v7 0/9] Introduce akcipher service for virtio-crypto

2022-05-26 Thread Gonglei (Arei) via Virtualization


> -Original Message-
> From: Daniel P. Berrangé [mailto:berra...@redhat.com]
> Sent: Thursday, May 26, 2022 6:48 PM
> To: Lei He 
> Cc: m...@redhat.com; Gonglei (Arei) ;
> qemu-de...@nongnu.org; virtualization@lists.linux-foundation.org;
> linux-cry...@vger.kernel.org; jasow...@redhat.com; coh...@redhat.com;
> pizhen...@bytedance.com
> Subject: Re: [PATCH v7 0/9] Introduce akcipher service for virtio-crypto
> 
> I've sent a pull request containing all the crypto/ changes, as that covers 
> stuff I
> maintain. ie patches 2-8
> 
> Patches 1 and 9, I'll leave for MST to review & queue since the virtual 
> hardware
> is not my area of knowledge.
> 

Thanks for your work, Daniel.

Regards,
-Gonglei

> On Wed, May 25, 2022 at 05:01:09PM +0800, Lei He wrote:
> > v6 -> v7:
> > - Fix serval build errors for some specific platforms/configurations.
> > - Use '%zu' instead of '%lu' for size_t parameters.
> > - AkCipher-gcrypt: avoid setting wrong error messages when parsing RSA
> >   keys.
> > - AkCipher-benchmark: process constant amount of sign/verify instead
> > of running sign/verify for a constant duration.
> >
> > v5 -> v6:
> > - Fix build errors and codestyles.
> > - Add parameter 'Error **errp' for qcrypto_akcipher_rsakey_parse.
> > - Report more detailed errors.
> > - Fix buffer length check and return values of akcipher-nettle, allows
> > caller to  pass a buffer with larger size than actual needed.
> >
> > A million thanks to Daniel!
> >
> > v4 -> v5:
> > - Move QCryptoAkCipher into akcipherpriv.h, and modify the related
> comments.
> > - Rename asn1_decoder.c to der.c.
> > - Code style fix: use 'cleanup' & 'error' lables.
> > - Allow autoptr type to auto-free.
> > - Add test cases for rsakey to handle DER error.
> > - Other minor fixes.
> >
> > v3 -> v4:
> > - Coding style fix: Akcipher -> AkCipher, struct XXX -> XXX, Rsa ->
> > RSA, XXX-alg -> XXX-algo.
> > - Change version info in qapi/crypto.json, from 7.0 -> 7.1.
> > - Remove ecdsa from qapi/crypto.json, it would be introduced with the
> implemetion later.
> > - Use QCryptoHashAlgothrim instead of QCryptoRSAHashAlgorithm(removed)
> in qapi/crypto.json.
> > - Rename arguments of qcrypto_akcipher_XXX to keep aligned with
> qcrypto_cipher_XXX(dec/enc/sign/vefiry -> in/out/in2), and add
> qcrypto_akcipher_max_XXX APIs.
> > - Add new API: qcrypto_akcipher_supports.
> > - Change the return value of qcrypto_akcipher_enc/dec/sign, these functions
> return the actual length of result.
> > - Separate ASN.1 source code and test case clean.
> > - Disable RSA raw encoding for akcipher-nettle.
> > - Separate RSA key parser into rsakey.{hc}, and implememts it with
> builtin-asn1-decoder and nettle respectivly.
> > - Implement RSA(pkcs1 and raw encoding) algorithm by gcrypt. This has
> higher priority than nettle.
> > - For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the
> > length of returned result maybe less than the dst buffer size, return
> > the actual length of result instead of the buffer length to the guest
> > side. (in function virtio_crypto_akcipher_input_data_helper)
> > - Other minor changes.
> >
> > Thanks to Daniel!
> >
> > Eric pointed out this missing part of use case, send it here again.
> >
> > In our plan, the feature is designed for HTTPS offloading case and other
> applications which use kernel RSA/ecdsa by keyctl syscall. The full picture
> shows bellow:
> >
> >
> >  Nginx/openssl[1] ... Apps
> > Guest   -
> >   virtio-crypto driver[2]
> > -
> >   virtio-crypto backend[3]
> > Host-
> >  /  |  \
> >  builtin[4]   vhost keyctl[5] ...
> >
> >
> > [1] User applications can offload RSA calculation to kernel by keyctl 
> > syscall.
> There is no keyctl engine in openssl currently, we developed a engine and 
> tried
> to contribute it to openssl upstream, but openssl 1.x does not accept new
> feature. Link:
> >https://github.com/openssl/openssl/pull/16689
> >
> > This branch is available and maintained by Lei 
> >
> > https://github.com/TousakaRin/openssl/tree/OpenSSL_1_1_1-kctl_engine
> >
> > We tested nginx(change config file only) with openssl keyctl engine, it 
> > works
> fine.
> >
> > [2] virtio-crypto driver is used to communicate with host side, send 
> > requests
> to host side to do asymmetric calculation.
> >https://lkml.org/lkml/2022/3/1/1425
> >
> > [3] virtio-crypto backend handles requests from guest side, and forwards
> request to crypto backend driver of QEMU.
> >
> > [4] Currently RSA is supported only in builtin driver. This driver is 
> > supposed to
> test the full feature without other software(Ex vhost process) and hardware
> dependence. ecdsa is introduced into qapi type without implementation, this
> may be implemented in Q3-2022 or later. If ecdsa type definition 

Re: [PATCH v7 0/9] Introduce akcipher service for virtio-crypto

2022-05-26 Thread Daniel P . Berrangé
I've sent a pull request containing all the crypto/ changes,
as that covers stuff I maintain. ie patches 2-8

Patches 1 and 9, I'll leave for MST to review & queue since the
virtual hardware is not my area of knowledge.

On Wed, May 25, 2022 at 05:01:09PM +0800, Lei He wrote:
> v6 -> v7:
> - Fix serval build errors for some specific platforms/configurations.
> - Use '%zu' instead of '%lu' for size_t parameters.
> - AkCipher-gcrypt: avoid setting wrong error messages when parsing RSA
>   keys.
> - AkCipher-benchmark: process constant amount of sign/verify instead
>  of running sign/verify for a constant duration.
> 
> v5 -> v6:
> - Fix build errors and codestyles.
> - Add parameter 'Error **errp' for qcrypto_akcipher_rsakey_parse.
> - Report more detailed errors.
> - Fix buffer length check and return values of akcipher-nettle, allows caller 
> to
>  pass a buffer with larger size than actual needed.
> 
> A million thanks to Daniel!
> 
> v4 -> v5:
> - Move QCryptoAkCipher into akcipherpriv.h, and modify the related comments.
> - Rename asn1_decoder.c to der.c.
> - Code style fix: use 'cleanup' & 'error' lables.
> - Allow autoptr type to auto-free.
> - Add test cases for rsakey to handle DER error.
> - Other minor fixes.
> 
> v3 -> v4:
> - Coding style fix: Akcipher -> AkCipher, struct XXX -> XXX, Rsa -> RSA,
> XXX-alg -> XXX-algo.
> - Change version info in qapi/crypto.json, from 7.0 -> 7.1.
> - Remove ecdsa from qapi/crypto.json, it would be introduced with the 
> implemetion later.
> - Use QCryptoHashAlgothrim instead of QCryptoRSAHashAlgorithm(removed) in 
> qapi/crypto.json.
> - Rename arguments of qcrypto_akcipher_XXX to keep aligned with 
> qcrypto_cipher_XXX(dec/enc/sign/vefiry -> in/out/in2), and add 
> qcrypto_akcipher_max_XXX APIs.
> - Add new API: qcrypto_akcipher_supports.
> - Change the return value of qcrypto_akcipher_enc/dec/sign, these functions 
> return the actual length of result.
> - Separate ASN.1 source code and test case clean.
> - Disable RSA raw encoding for akcipher-nettle.
> - Separate RSA key parser into rsakey.{hc}, and implememts it with 
> builtin-asn1-decoder and nettle respectivly.
> - Implement RSA(pkcs1 and raw encoding) algorithm by gcrypt. This has higher 
> priority than nettle.
> - For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the length 
> of returned result maybe less than the dst buffer size, return the actual 
> length of result instead of the buffer length to the guest side. (in function 
> virtio_crypto_akcipher_input_data_helper)
> - Other minor changes.
> 
> Thanks to Daniel!
> 
> Eric pointed out this missing part of use case, send it here again.
> 
> In our plan, the feature is designed for HTTPS offloading case and other 
> applications which use kernel RSA/ecdsa by keyctl syscall. The full picture 
> shows bellow:
> 
> 
>  Nginx/openssl[1] ... Apps
> Guest   -
>   virtio-crypto driver[2]
> -
>   virtio-crypto backend[3]
> Host-
>  /  |  \
>  builtin[4]   vhost keyctl[5] ...
> 
> 
> [1] User applications can offload RSA calculation to kernel by keyctl 
> syscall. There is no keyctl engine in openssl currently, we developed a 
> engine and tried to contribute it to openssl upstream, but openssl 1.x does 
> not accept new feature. Link:
>https://github.com/openssl/openssl/pull/16689
> 
> This branch is available and maintained by Lei 
>https://github.com/TousakaRin/openssl/tree/OpenSSL_1_1_1-kctl_engine
> 
> We tested nginx(change config file only) with openssl keyctl engine, it works 
> fine.
> 
> [2] virtio-crypto driver is used to communicate with host side, send requests 
> to host side to do asymmetric calculation.
>https://lkml.org/lkml/2022/3/1/1425
> 
> [3] virtio-crypto backend handles requests from guest side, and forwards 
> request to crypto backend driver of QEMU.
> 
> [4] Currently RSA is supported only in builtin driver. This driver is 
> supposed to test the full feature without other software(Ex vhost process) 
> and hardware dependence. ecdsa is introduced into qapi type without 
> implementation, this may be implemented in Q3-2022 or later. If ecdsa type 
> definition should be added with the implementation together, I'll remove this 
> in next version.
> 
> [5] keyctl backend is in development, we will post this feature in Q2-2022. 
> keyctl backend can use hardware acceleration(Ex, Intel QAT).
> 
> Setup the full environment, tested with Intel QAT on host side, the QPS of 
> HTTPS increase to ~200% in a guest.
> 
> VS PCI passthrough: the most important benefit of this solution makes the VM 
> migratable.
> 
> v2 -> v3:
> - Introduce akcipher types to qapi
> - Add test/benchmark suite for akcipher class
> - Seperate 'virtio_crypto: Support virtio crypto asym operation' into:
>  - 

Re: [PATCH v3 2/4] vhost-vdpa: introduce STOP backend feature bit

2022-05-26 Thread Stefano Garzarella

On Thu, May 26, 2022 at 10:57:03AM +0200, Eugenio Perez Martin wrote:

On Wed, May 25, 2022 at 1:23 PM Dawar, Gautam  wrote:


[AMD Official Use Only - General]

-Original Message-
From: Eugenio Pérez 
Sent: Wednesday, May 25, 2022 4:29 PM
To: Michael S. Tsirkin ; net...@vger.kernel.org; 
linux-ker...@vger.kernel.org; k...@vger.kernel.org; 
virtualization@lists.linux-foundation.org; Jason Wang 
Cc: Zhu Lingshan ; mart...@xilinx.com; Stefano Garzarella ; ecree.xil...@gmail.com; Eli Cohen 
; Dan Carpenter ; Parav Pandit ; Wu Zongyong 
; din...@xilinx.com; Christophe JAILLET ; Xie Yongji 
; Dawar, Gautam ; l...@redhat.com; marti...@xilinx.com; pab...@xilinx.com; Longpeng 
; piotr.umin...@intel.com; Kamde, Tanuj ; Si-Wei Liu ; 
habetsm.xil...@gmail.com; lviv...@redhat.com; Zhang Min ; han...@xilinx.com
Subject: [PATCH v3 2/4] vhost-vdpa: introduce STOP backend feature bit

[CAUTION: External Email]

Userland knows if it can stop the device or not by checking this feature bit.

It's only offered if the vdpa driver backend implements the stop() operation 
callback, and try to set it if the backend does not offer that callback is an 
error.

Signed-off-by: Eugenio Pérez 
---
 drivers/vhost/vdpa.c | 16 +++-
 include/uapi/linux/vhost_types.h |  2 ++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 
1f1d1c425573..32713db5831d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -347,6 +347,14 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v,
return 0;
 }

+static bool vhost_vdpa_can_stop(const struct vhost_vdpa *v) {
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   return ops->stop;
[GD>>] Would it be better to explicitly return a bool to match the return type?


I'm not sure about the kernel code style regarding that casting. Maybe
it's better to return !!ops->stop here. The macros likely and unlikely
do that.


IIUC `ops->stop` is a function pointer, so what about

return ops->stop != NULL;

Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [linux-next:master] BUILD REGRESSION 8cb8311e95e3bb58bd84d6350365f14a718faa6d

2022-05-26 Thread Dan Carpenter
On Thu, May 26, 2022 at 02:16:34AM +0100, Matthew Wilcox wrote:
> Bizarre this started showing up now.  The recent patch was:
> 
> -   info->alloced += compound_nr(page);
> -   inode->i_blocks += BLOCKS_PER_PAGE << compound_order(page);
> +   info->alloced += folio_nr_pages(folio);
> +   inode->i_blocks += BLOCKS_PER_PAGE << folio_order(folio);
> 
> so it could tell that compound_order() was small, but folio_order()
> might be large?

The old code also generates a warning on my test system.  Smatch thinks
both compound_order() and folio_order() are 0-255.  I guess because of
the "unsigned char compound_order;" in the struct page.

regards,
dan carpenter

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [linux-next:master] BUILD REGRESSION 8cb8311e95e3bb58bd84d6350365f14a718faa6d

2022-05-26 Thread Arnd Bergmann
On Wed, May 25, 2022 at 11:35 PM kernel test robot  wrote:
> .__mulsi3.o.cmd: No such file or directory
> Makefile:686: arch/h8300/Makefile: No such file or directory
> Makefile:765: arch/h8300/Makefile: No such file or directory
> arch/Kconfig:10: can't open file "arch/h8300/Kconfig"

Please stop building h8300  after the asm-generic tree is merged, the
architecture is getting removed.

Arnd
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [linux-next:master] BUILD REGRESSION 8cb8311e95e3bb58bd84d6350365f14a718faa6d

2022-05-26 Thread Dan Carpenter
On Wed, May 25, 2022 at 02:50:56PM -0700, Andrew Morton wrote:
> On Thu, 26 May 2022 05:35:20 +0800 kernel test robot  wrote:
> 
> > tree/branch: 
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > branch HEAD: 8cb8311e95e3bb58bd84d6350365f14a718faa6d  Add linux-next 
> > specific files for 20220525
> > 
> > Error/Warning reports:
> > 
> > ...
> >
> > Unverified Error/Warning (likely false positive, please contact us if 
> > interested):
> 
> Could be so.
> 
> > mm/shmem.c:1948 shmem_getpage_gfp() warn: should '(((1) << 12) / 512) << 
> > folio_order(folio)' be a 64 bit type?
> 
> I've been seeing this one for a while.  And from this report I can't
> figure out what tool emitted it.  Clang?

This is a Smatch warning.

I normally look over Smatch warnings before forwarding kbuild-bot emails
but this email is a grab bag of static checker warnings from different
tools.

This warning has a high rate of false positives so I'm going to disable
it by default.

> 
> >
> > ...
> >
> > |-- i386-randconfig-m021
> > |   `-- 
> > mm-shmem.c-shmem_getpage_gfp()-warn:should-((()-)-)-folio_order(folio)-be-a-bit-type
> 
> If you're going to use randconfig then shouldn't you make the config
> available?  Or maybe quote the KCONFIG_SEED - presumably there's a way
> for others to regenerate.
> 
> Anyway, the warning seems wrong to me.
> 
> 
> #define PAGE_SIZE   (_AC(1,UL) << PAGE_SHIFT)
> 
> #define BLOCKS_PER_PAGE  (PAGE_SIZE/512)
> 
>   inode->i_blocks += BLOCKS_PER_PAGE << folio_order(folio);
> 
> so the RHS here should have unsigned long type.  Being able to generate
> the cpp output would be helpful.  That requires the .config.

The heuristic is that "inode->i_blocks" is a u64 but this .config must
be for a 32bit CPU.

I'm just going to turn off all these warnings until I can figure out a
better heuristic.

regards,
dan carpenter

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH v2 0/5] TUN/VirtioNet USO features support.

2022-05-26 Thread Jason Wang
On Tue, May 24, 2022 at 7:07 PM Andrew Melnichenko  wrote:
>
> Hi all,
>
> The issue is that host segments packets between guests on the same host.
> Tests show that it happens because SKB_GSO_DODGY skb offload in
> virtio_net_hdr_from_skb().
> To do segmentation you need to remove SKB_GSO_DODGY or add SKB_GSO_PARTIAL
> The solution with DODGY/PARTIAL offload looks like a dirty hack, so
> for now, I've lived it as it is for further investigation.

Ok, I managed to find the previous discussion. It looks to me the
reason is that __udp_gso_segment will segment dodgy packets
unconditionally.

I wonder if the attached patch works? (compile test only).

Thanks

>
>
> On Tue, May 17, 2022 at 9:32 AM Jason Wang  wrote:
> >
> > On Thu, May 12, 2022 at 7:33 PM Andrew Melnychenko  
> > wrote:
> > >
> > > Added new offloads for TUN devices TUN_F_USO4 and TUN_F_USO6.
> > > Technically they enable NETIF_F_GSO_UDP_L4
> > > (and only if USO4 & USO6 are set simultaneously).
> > > It allows to transmission of large UDP packets.
> > >
> > > Different features USO4 and USO6 are required for qemu where Windows 
> > > guests can
> > > enable disable USO receives for IPv4 and IPv6 separately.
> > > On the other side, Linux can't really differentiate USO4 and USO6, for 
> > > now.
> > > For now, to enable USO for TUN it requires enabling USO4 and USO6 
> > > together.
> > > In the future, there would be a mechanism to control UDP_L4 GSO 
> > > separately.
> > >
> > > Test it WIP Qemu https://github.com/daynix/qemu/tree/Dev_USOv2
> > >
> > > New types for VirtioNet already on mailing:
> > > https://lists.oasis-open.org/archives/virtio-comment/202110/msg00010.html
> > >
> > > Also, there is a known issue with transmitting packages between two 
> > > guests.
> >
> > Could you explain this more? It looks like a bug. (Or any pointer to
> > the discussion)
> >
> > Thanks
> >
> > > Without hacks with skb's GSO - packages are still segmented on the host's 
> > > postrouting.
> > >
> > > Andrew (5):
> > >   uapi/linux/if_tun.h: Added new offload types for USO4/6.
> > >   driver/net/tun: Added features for USO.
> > >   uapi/linux/virtio_net.h: Added USO types.
> > >   linux/virtio_net.h: Support USO offload in vnet header.
> > >   drivers/net/virtio_net.c: Added USO support.
> > >
> > >  drivers/net/tap.c   | 10 --
> > >  drivers/net/tun.c   |  8 +++-
> > >  drivers/net/virtio_net.c| 19 +++
> > >  include/linux/virtio_net.h  |  9 +
> > >  include/uapi/linux/if_tun.h |  2 ++
> > >  include/uapi/linux/virtio_net.h |  4 
> > >  6 files changed, 45 insertions(+), 7 deletions(-)
> > >
> > > --
> > > 2.35.1
> > >
> >
>


0001-udp-allow-header-check-for-dodgy-GSO_UDP_L4-packets.patch
Description: Binary data
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization