Re: [PATCH v1] vdpa/vp_vdpa : add vdpa tool support in vp_vdpa
On Fri, Apr 22, 2022 at 1:57 PM Cindy Lu wrote: > > On Wed, Apr 20, 2022 at 2:27 PM Cindy Lu wrote: > > > > > > > > On Wed, Apr 20, 2022 at 11:21 AM Jason Wang wrote: > >> > >> > >> 在 2022/4/19 09:40, Cindy Lu 写道: > >> > this patch is to add the support for vdpa tool in vp_vdpa > >> > here is the example steps > >> > modprobe vp_vdpa > >> > modprobe vhost_vdpa > >> > > >> > echo :00:06.0>/sys/bus/pci/drivers/virtio-pci/unbind > >> > echo 1af4 1041 > /sys/bus/pci/drivers/vp-vdpa/new_id > >> > > >> > vdpa dev add name vdpa1 mgmtdev pci/:00:06.0 > >> > > >> > Signed-off-by: Cindy Lu > >> > --- > >> > drivers/vdpa/virtio_pci/vp_vdpa.c | 86 --- > >> > 1 file changed, 78 insertions(+), 8 deletions(-) > >> > > >> > diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c > >> > b/drivers/vdpa/virtio_pci/vp_vdpa.c > >> > index cce101e6a940..d8a1d2f8e9bb 100644 > >> > --- a/drivers/vdpa/virtio_pci/vp_vdpa.c > >> > +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c > >> > @@ -17,6 +17,8 @@ > >> > #include > >> > #include > >> > #include > >> > +#include > >> > +#include > >> > > >> > #define VP_VDPA_QUEUE_MAX 256 > >> > #define VP_VDPA_DRIVER_NAME "vp_vdpa" > >> > @@ -41,6 +43,18 @@ struct vp_vdpa { > >> > int vectors; > >> > }; > >> > > >> > +struct vp_vdpa_mgmtdev { > >> > + struct vdpa_mgmt_dev mgtdev; > >> > + struct vp_vdpa *vp_vdpa; > >> > + struct pci_dev *pdev; > >> > +}; > >> > + > >> > +#define VP_VDPA_NET_FEATURES > >> >\ > >> > + ((1ULL << VIRTIO_F_ANY_LAYOUT) | (1ULL << VIRTIO_F_VERSION_1) | > >> > \ > >> > + (1ULL << VIRTIO_F_ACCESS_PLATFORM)) > >> > + > >> > +#define VP_VDPA_NET_VQ_NUM 2 > >> > >> > >> Let's not go backwards, e.g we've already support any kind of virtio > >> device with any kind of features. see the comment in vp_vdpa_probe(). 
> >> > >> > > got it, I will fix this > >> > >> > + > >> > static struct vp_vdpa *vdpa_to_vp(struct vdpa_device *vdpa) > >> > { > >> > return container_of(vdpa, struct vp_vdpa, vdpa); > >> > @@ -454,9 +468,14 @@ static void vp_vdpa_free_irq_vectors(void *data) > >> > pci_free_irq_vectors(data); > >> > } > >> > > >> > -static int vp_vdpa_probe(struct pci_dev *pdev, const struct > >> > pci_device_id *id) > >> > +static int vp_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char > >> > *name, > >> > +const struct vdpa_dev_set_config *add_config) > >> > { > >> > + struct vp_vdpa_mgmtdev *vp_vdpa_mgtdev = > >> > + container_of(v_mdev, struct vp_vdpa_mgmtdev, mgtdev); > >> > + > >> > struct virtio_pci_modern_device *mdev; > >> > + struct pci_dev *pdev = vp_vdpa_mgtdev->pdev; > >> > struct device *dev = &pdev->dev; > >> > struct vp_vdpa *vp_vdpa; > >> > int ret, i; > >> > @@ -465,8 +484,9 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const > >> > struct pci_device_id *id) > >> > if (ret) > >> > return ret; > >> > > >> > - vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa, > >> > - dev, &vp_vdpa_ops, NULL, false); > >> > + vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa, dev, > >> > &vp_vdpa_ops, > >> > + name, false); > >> > >> > >> Nit: let's keep the line of vdpa_alloc_device() unchanged to reduce the > >> changeset. 
> >> > >> > > will fix this > >> > >> > + > >> > if (IS_ERR(vp_vdpa)) { > >> > dev_err(dev, "vp_vdpa: Failed to allocate vDPA > >> > structure\n"); > >> > return PTR_ERR(vp_vdpa); > >> > @@ -480,9 +500,10 @@ static int vp_vdpa_probe(struct pci_dev *pdev, > >> > const struct pci_device_id *id) > >> > dev_err(&pdev->dev, "Failed to probe modern PCI device\n"); > >> > goto err; > >> > } > >> > + vp_vdpa_mgtdev->vp_vdpa = vp_vdpa; > >> > > >> > pci_set_master(pdev); > >> > - pci_set_drvdata(pdev, vp_vdpa); > >> > + pci_set_drvdata(pdev, vp_vdpa_mgtdev); > >> > > >> > vp_vdpa->vdpa.dma_dev = &pdev->dev; > >> > vp_vdpa->queues = vp_modern_get_num_queues(mdev); > >> > @@ -516,7 +537,8 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const > >> > struct pci_device_id *id) > >> > } > >> > vp_vdpa->config_irq = VIRTIO_MSI_NO_VECTOR; > >> > > >> > - ret = vdpa_register_device(&vp_vdpa->vdpa, vp_vdpa->queues); > >> > + vp_vdpa->vdpa.mdev = &vp_vdpa_mgtdev->mgtdev; > >> > + ret = _vdpa_register_device(&vp_vdpa->vdpa, vp_vdpa->queues); > >> > if (ret) { > >> > dev_err(&pdev->dev, "Failed to register to vdpa bus\n"); > >> > goto err; > >> > @@ -529,12 +551,60 @@ static int vp_vdpa_probe(struct pci_dev *pdev, > >> > const struct pci_device_id *id) > >> > return ret; > >> > } > >> > > >> > -static void vp_vdpa_remove(struct pci_dev *pdev) > >> > +static void vp_vdpa_dev_del(struct vdpa_mgmt_dev
Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
On Thu, Apr 21, 2022 at 11:27:42AM +0200, Niklas Schnelle wrote: > On Wed, 2022-04-20 at 21:14 -0500, Bjorn Helgaas wrote: > > On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote: > > > While determining the next PCI function is factored out of > > > pci_scan_slot() into next_fn() the former still handles the first > > > function as a special case duplicating the code from the scan loop and > > > splitting the condition that the first function exits from it being > > > multifunction which is tested in next_fn(). > > > > > > Furthermore the non ARI branch of next_fn() mixes the case that > > > multifunction devices may have non-contiguous function ranges and dev > > > may thus be NULL with the multifunction requirement. It also signals > > > that no further functions need to be scanned by returning 0 which is > > > a valid function number. > > > > > > Improve upon this by moving all conditions for having to scan for more > > > functions into next_fn() and make them obvious and commented. > > > > > > By changing next_fn() to return -ENODEV instead of 0 when there is no > > > next function we can then handle the initial function inside the loop > > > and deduplicate the shared handling. > > > > > > No functional change is intended. 
> > > > > > Signed-off-by: Niklas Schnelle > > > --- > > > drivers/pci/probe.c | 41 +++-- > > > 1 file changed, 19 insertions(+), 22 deletions(-) > > > > > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > > > index 17a969942d37..389aa1f9cb2c 100644 > > > --- a/drivers/pci/probe.c > > > +++ b/drivers/pci/probe.c > > > @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct > > > pci_bus *bus, int devfn) > > > } > > > EXPORT_SYMBOL(pci_scan_single_device); > > > > > > -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev, > > > - unsigned int fn) > > > +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn) > > > { > > > int pos; > > > u16 cap = 0; > > > unsigned int next_fn; > > > > > > - if (pci_ari_enabled(bus)) { > > > - if (!dev) > > > - return 0; > > > + if (dev && pci_ari_enabled(bus)) { > > > > I think this would be easier to verify if we kept the explicit error > > return, e.g., > > > > if (pci_ari_enabled(bus)) { > > if (!dev) > > return -ENODEV; > > pos = pci_find_ext_capability(...); > > > > Otherwise we have to sort through the !dev cases below. I guess > > -ENODEV would come from either the "!fn && !dev" case or the "fn > 6" > > case, but it's not obvious to me that those are equivalent to the > > previous code. > > We could keep this the same for this patch but I think for jailhouse > (patch 2) we need the "!dev" case not to fail here such that we can > handle the missing function 0 below even if ARI is enabled. For s390 > this doesn't currently matter because pci_ari_enabled(bus) is always > false but I assumed that this isn't necessarily so for jailhouse. I > sent a follow up mail on a slight behavior change I can think of for > this case for v2 but forgot to send it also for v3. Quoted below: I think it would be good to make the first patch change as little as possible to make it easier to analyze, then possibly test for hypervisor when changing this behavior. 
> > > - /* dev may be NULL for non-contiguous multifunction devices */ > > > - if (!dev || dev->multifunction) > > > - return (fn + 1) % 8; > > > - > > > - return 0; > > > + /* only multifunction devices may have more functions */ > > > + if (dev && !dev->multifunction) > > > + return -ENODEV; > > > > I don't understand why the "!dev || dev->multifunction" test needs to > > change. Isn't that valid even in the hypervisor case? IIUC, you want > > to return success in some cases that currently return failure, so this > > case that was already success should be fine as it was. > > This isn't a change to the test. It's the negation of the logical > condition *and* a switch of the branches i.e. keeps the overall > behavior exactly the same. The equivalence is !(!A || B) == (A && !B). I see the Boolean equivalence, but it's difficult to verify that the consequences are equivalent because the new code has the extra "!fn && !dev" test in the middle. > There are two reasons I did this. > > 1. I find (!dev || dev->multifunction) to be much harder to grasp than > (dev && !dev->multifunction). > > 2. The whole next_fn() in my opinion becomes easier to read if it bails > for all bad cases early and the "this is the next fn" is the final > return if we didn't bail. This becomes even more true as another > condition is added in patch 2. Fair enough, and I agree that "this is the next fn" is a nice final return. In general I think it's good to return either an error or the next fn as soon as it is known. It makes it harder to analyze if the return value has already been determined but we have to mentally pass o
Re: [PATCH 3/5] hv_sock: Add validation for untrusted Hyper-V values
On Thu, Apr 21, 2022 at 5:30 PM Andrea Parri wrote: > > > > @@ -577,12 +577,19 @@ static bool hvs_dgram_allow(u32 cid, u32 port) > > > static int hvs_update_recv_data(struct hvsock *hvs) > > > { > > > struct hvs_recv_buf *recv_buf; > > > - u32 payload_len; > > > + u32 pkt_len, payload_len; > > > + > > > + pkt_len = hv_pkt_len(hvs->recv_desc); > > > + > > > + /* Ensure the packet is big enough to read its header */ > > > + if (pkt_len < HVS_HEADER_LEN) > > > + return -EIO; > > > > > > recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1); > > > payload_len = recv_buf->hdr.data_size; > > > > > > - if (payload_len > HVS_MTU_SIZE) > > > + /* Ensure the packet is big enough to read its payload */ > > > + if (payload_len > pkt_len - HVS_HEADER_LEN || payload_len > > > > HVS_MTU_SIZE) > > > > checkpatch warns that we exceed 80 characters, I do not have a strong > > opinion on this, but if you have to resend better break the condition into 2 > > lines. > > Will break if preferred. (but does it really warn?? I understand that > the warning was deprecated and the "limit" increased to 100 chars...) I see the warn here: https://patchwork.kernel.org/project/netdevbpf/patch/20220420200720.434717-4-parri.and...@gmail.com/ in the kernel doc [1] we still say we prefer 80 columns, so I try to follow, especially when it doesn't make things worse. [1] https://docs.kernel.org/process/coding-style.html#breaking-long-lines-and-strings > > > > Maybe even update or remove the comment? (it only describes the first > > condition, but the conditions are pretty clear, so I don't think it adds > > much). > > Works for me. (taking it as this applies to the previous comment too.) Yep. Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next] net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO
On Wed, Apr 20, 2022 at 10:32 PM Hangbin Liu wrote: > > On Wed, Apr 20, 2022 at 09:45:15AM -0400, Willem de Bruijn wrote: > > On Wed, Apr 20, 2022 at 4:28 AM Hangbin Liu wrote: > > > > > > Currently, the kernel drops GSO VLAN tagged packet if it's created with > > > socket(AF_PACKET, SOCK_RAW, 0) plus virtio_net_hdr. > > > > > > The reason is AF_PACKET doesn't adjust the skb network header if there is > > > a VLAN tag. Then after virtio_net_hdr_set_proto() called, the > > > skb->protocol > > > will be set to ETH_P_IP/IPv6. And in later inet/ipv6_gso_segment() the skb > > > is dropped as network header position is invalid. > > > > > > Let's handle VLAN packets by adjusting network header position in > > > packet_parse_headers(), and move the function in packet_snd() before > > > calling virtio_net_hdr_set_proto(). > > > > The network header is set in > > > > skb_reset_network_header(skb); > > > > err = -EINVAL; > > if (sock->type == SOCK_DGRAM) { > > offset = dev_hard_header(skb, dev, ntohs(proto), addr, > > NULL, len); > > if (unlikely(offset < 0)) > > goto out_free; > > } else if (reserve) { > > skb_reserve(skb, -reserve); > > if (len < reserve + sizeof(struct ipv6hdr) && > > dev->min_header_len != dev->hard_header_len) > > skb_reset_network_header(skb); > > } > > > > If all that is needed is to move the network header beyond an optional > > VLAN tag in the case of SOCK_RAW, then this can be done in the else > > for Ethernet packets. > > Before we set network position, we need to check the skb->protocol to make > sure it's a VLAN packet. > > If we set skb->protocol and adjust network header here, like > packet_parse_headers() does. How should we do with later > > skb->protocol = proto; > skb->dev = dev; > > settings? > > If you are afraid that skb_probe_transport_header(skb) would affect the > virtio_net_hdr operation. How about split the skb_probe_transport_header() > from packet_parse_headers()? 
Something like > > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -1924,13 +1924,19 @@ static int packet_rcv_spkt(struct sk_buff *skb, > struct net_device *dev, > > static void packet_parse_headers(struct sk_buff *skb, struct socket *sock) > { > + int depth; > + > if ((!skb->protocol || skb->protocol == htons(ETH_P_ALL)) && > sock->type == SOCK_RAW) { > skb_reset_mac_header(skb); > skb->protocol = dev_parse_header_protocol(skb); > } > > - skb_probe_transport_header(skb); > + /* Move network header to the right position for VLAN tagged packets > */ > + if (likely(skb->dev->type == ARPHRD_ETHER) && > + eth_type_vlan(skb->protocol) && > + __vlan_get_protocol(skb, skb->protocol, &depth) != 0) > + skb_set_network_header(skb, depth); > } > > /* > @@ -3047,6 +3055,8 @@ static int packet_snd(struct socket *sock, struct > msghdr *msg, size_t len) > skb->mark = sockc.mark; > skb->tstamp = sockc.transmit_time; > > + packet_parse_headers(skb, sock); > + > if (has_vnet_hdr) { > err = virtio_net_hdr_to_skb(skb, &vnet_hdr, vio_le()); > if (err) > @@ -3055,7 +3065,7 @@ static int packet_snd(struct socket *sock, struct > msghdr *msg, size_t len) > virtio_net_hdr_set_proto(skb, &vnet_hdr); > } > > - packet_parse_headers(skb, sock); > + skb_probe_transport_header(skb); > > if (unlikely(extra_len == 4)) > skb->no_fcs = 1; > > > > > > It is not safe to increase reserve, as that would eat into the > > reserved hlen LL_RESERVED_SPACE(dev), which does not account for > > optional VLAN headers. > > > > Instead of setting here first, then patching up again later in > > packet_parse_headers. > > > > This change affects all packets with VLAN headers, not just those with > > GSO. I imagine that moving the network header is safe for all, but > > don't know that code well enough to verify that it does not have > > unintended side effects. Does dev_queue_xmit expect the network header > > to point to the start of the VLAN headers or after, for instance? 
> > I think adjust the network position should be safe, as tap device also did > that. Oh, it's great to be able to point to such prior art. Can you mention that in the commit message? Your approach does sound simpler than the above. Thanks for looking into that alternative, though. > > Thanks > Hangbin
Re: [PATCH 3/5] hv_sock: Add validation for untrusted Hyper-V values
On Wed, Apr 20, 2022 at 10:07:18PM +0200, Andrea Parri (Microsoft) wrote: For additional robustness in the face of Hyper-V errors or malicious behavior, validate all values that originate from packets that Hyper-V has sent to the guest in the host-to-guest ring buffer. Ensure that invalid values cannot cause data being copied out of the bounds of the source buffer in hvs_stream_dequeue(). Signed-off-by: Andrea Parri (Microsoft) --- include/linux/hyperv.h | 5 + net/vmw_vsock/hyperv_transport.c | 11 +-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index fe2e0179ed51e..55478a6810b60 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1663,6 +1663,11 @@ static inline u32 hv_pkt_datalen(const struct vmpacket_descriptor *desc) return (desc->len8 << 3) - (desc->offset8 << 3); } +/* Get packet length associated with descriptor */ +static inline u32 hv_pkt_len(const struct vmpacket_descriptor *desc) +{ + return desc->len8 << 3; +} struct vmpacket_descriptor * hv_pkt_iter_first_raw(struct vmbus_channel *channel); diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 8c37d07017fc4..092cadc2c866d 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -577,12 +577,19 @@ static bool hvs_dgram_allow(u32 cid, u32 port) static int hvs_update_recv_data(struct hvsock *hvs) { struct hvs_recv_buf *recv_buf; - u32 payload_len; + u32 pkt_len, payload_len; + + pkt_len = hv_pkt_len(hvs->recv_desc); + + /* Ensure the packet is big enough to read its header */ + if (pkt_len < HVS_HEADER_LEN) + return -EIO; recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1); payload_len = recv_buf->hdr.data_size; - if (payload_len > HVS_MTU_SIZE) + /* Ensure the packet is big enough to read its payload */ + if (payload_len > pkt_len - HVS_HEADER_LEN || payload_len > HVS_MTU_SIZE) checkpatch warns that we exceed 80 characters, I do not have a strong opinion on 
this, but if you have to resend better break the condition into 2 lines. Maybe even update or remove the comment? (it only describes the first condition, but the conditions are pretty clear, so I don't think it adds much). Thanks, Stefano
Re: [PATCH 2/5] hv_sock: Copy packets sent by Hyper-V out of the ring buffer
On Wed, Apr 20, 2022 at 10:07:17PM +0200, Andrea Parri (Microsoft) wrote: Pointers to VMbus packets sent by Hyper-V are used by the hv_sock driver within the guest VM. Hyper-V can send packets with erroneous values or modify packet fields after they are processed by the guest. To defend against these scenarios, copy the incoming packet after validating its length and offset fields using hv_pkt_iter_{first,next}(). In this way, the packet can no longer be modified by the host. Signed-off-by: Andrea Parri (Microsoft) --- net/vmw_vsock/hyperv_transport.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 943352530936e..8c37d07017fc4 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -78,6 +78,9 @@ struct hvs_send_buf { ALIGN((payload_len), 8) + \ VMBUS_PKT_TRAILER_SIZE) +/* Upper bound on the size of a VMbus packet for hv_sock */ +#define HVS_MAX_PKT_SIZE HVS_PKT_LEN(HVS_MTU_SIZE) + union hvs_service_id { guid_t srv_id; @@ -378,6 +381,8 @@ static void hvs_open_connection(struct vmbus_channel *chan) rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE); } + chan->max_pkt_size = HVS_MAX_PKT_SIZE; + premise, I don't know HyperV channels :-( Is this change necessary to use hv_pkt_iter_first() instead of hv_pkt_iter_first_raw()? If yes, then please mention that you set this value in the commit message, otherwise maybe better to have a separate patch. Thanks, Stefano ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb, conn_from_host ? 
new : sk); if (ret != 0) { @@ -602,7 +607,7 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg, return -EOPNOTSUPP; if (need_refill) { - hvs->recv_desc = hv_pkt_iter_first_raw(hvs->chan); + hvs->recv_desc = hv_pkt_iter_first(hvs->chan); if (!hvs->recv_desc) return -ENOBUFS; ret = hvs_update_recv_data(hvs); @@ -618,7 +623,7 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg, hvs->recv_data_len -= to_read; if (hvs->recv_data_len == 0) { - hvs->recv_desc = hv_pkt_iter_next_raw(hvs->chan, hvs->recv_desc); + hvs->recv_desc = hv_pkt_iter_next(hvs->chan, hvs->recv_desc); if (hvs->recv_desc) { ret = hvs_update_recv_data(hvs); if (ret) -- 2.25.1
Re: [PATCH 1/5] hv_sock: Check hv_pkt_iter_first_raw()'s return value
On Wed, Apr 20, 2022 at 10:07:16PM +0200, Andrea Parri (Microsoft) wrote: The function returns NULL if the ring buffer doesn't contain enough readable bytes to constitute a packet descriptor. The ring buffer's write_index is in memory which is shared with the Hyper-V host, an erroneous or malicious host could thus change its value and overturn the result of hvs_stream_has_data(). Signed-off-by: Andrea Parri (Microsoft) --- net/vmw_vsock/hyperv_transport.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index e111e13b66604..943352530936e 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -603,6 +603,8 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg, if (need_refill) { hvs->recv_desc = hv_pkt_iter_first_raw(hvs->chan); + if (!hvs->recv_desc) + return -ENOBUFS; ret = hvs_update_recv_data(hvs); if (ret) return ret; -- 2.25.1 Reviewed-by: Stefano Garzarella
RE: [PATCH v3 4/5] virtio-crypto: adjust dst_len at ops callback
> -Original Message- > From: zhenwei pi [mailto:pizhen...@bytedance.com] > Sent: Thursday, April 21, 2022 6:40 PM > To: Gonglei (Arei) ; m...@redhat.com > Cc: jasow...@redhat.com; herb...@gondor.apana.org.au; > linux-ker...@vger.kernel.org; virtualization@lists.linux-foundation.org; > linux-cry...@vger.kernel.org; helei.si...@bytedance.com; > da...@davemloft.net; zhenwei pi > Subject: [PATCH v3 4/5] virtio-crypto: adjust dst_len at ops callback > > From: lei he > > For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the length of > returned result maybe less than akcipher_req->dst_len, we need to recalculate > the actual dst_len through the virt-queue protocol. > OK ... > Cc: Michael S. Tsirkin > Cc: Jason Wang > Cc: Gonglei > Signed-off-by: lei he > Signed-off-by: zhenwei pi > --- > drivers/crypto/virtio/virtio_crypto_akcipher_algs.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > index 9561bc2df62b..82db86e088c2 100644 > --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > @@ -90,9 +90,12 @@ static void > virtio_crypto_dataq_akcipher_callback(struct virtio_crypto_request * > } > > akcipher_req = vc_akcipher_req->akcipher_req; > - if (vc_akcipher_req->opcode != VIRTIO_CRYPTO_AKCIPHER_VERIFY) > + if (vc_akcipher_req->opcode != VIRTIO_CRYPTO_AKCIPHER_VERIFY) { > + /* actuall length maybe less than dst buffer */ > + akcipher_req->dst_len = len - sizeof(vc_req->status); ...but why minus sizeof(vc_req->status)? Regards, -Gonglei
[PATCH v3 5/5] virtio-crypto: enable retry for virtio-crypto-dev
From: lei he Enable retry for virtio-crypto-dev, so that crypto-engine can process cipher-requests parallelly. Cc: Michael S. Tsirkin Cc: Jason Wang Cc: Gonglei Signed-off-by: lei he Signed-off-by: zhenwei pi --- drivers/crypto/virtio/virtio_crypto_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c index d8edefcb966c..5c0d68c9e894 100644 --- a/drivers/crypto/virtio/virtio_crypto_core.c +++ b/drivers/crypto/virtio/virtio_crypto_core.c @@ -62,7 +62,8 @@ static int virtcrypto_find_vqs(struct virtio_crypto *vi) spin_lock_init(&vi->data_vq[i].lock); vi->data_vq[i].vq = vqs[i]; /* Initialize crypto engine */ - vi->data_vq[i].engine = crypto_engine_alloc_init(dev, 1); + vi->data_vq[i].engine = crypto_engine_alloc_init_and_set(dev, true, NULL, 1, + virtqueue_get_vring_size(vqs[i])); if (!vi->data_vq[i].engine) { ret = -ENOMEM; goto err_engine; -- 2.20.1
[PATCH v3 4/5] virtio-crypto: adjust dst_len at ops callback
From: lei he For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the length of returned result maybe less than akcipher_req->dst_len, we need to recalculate the actual dst_len through the virt-queue protocol. Cc: Michael S. Tsirkin Cc: Jason Wang Cc: Gonglei Signed-off-by: lei he Signed-off-by: zhenwei pi --- drivers/crypto/virtio/virtio_crypto_akcipher_algs.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c index 9561bc2df62b..82db86e088c2 100644 --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c @@ -90,9 +90,12 @@ static void virtio_crypto_dataq_akcipher_callback(struct virtio_crypto_request * } akcipher_req = vc_akcipher_req->akcipher_req; - if (vc_akcipher_req->opcode != VIRTIO_CRYPTO_AKCIPHER_VERIFY) + if (vc_akcipher_req->opcode != VIRTIO_CRYPTO_AKCIPHER_VERIFY) { + /* actuall length maybe less than dst buffer */ + akcipher_req->dst_len = len - sizeof(vc_req->status); sg_copy_from_buffer(akcipher_req->dst, sg_nents(akcipher_req->dst), vc_akcipher_req->dst_buf, akcipher_req->dst_len); + } virtio_crypto_akcipher_finalize_req(vc_akcipher_req, akcipher_req, error); } -- 2.20.1
[PATCH v3 3/5] virtio-crypto: move helpers into virtio_crypto_common.c
Move virtcrypto_clear_request and virtcrypto_dataq_callback into virtio_crypto_common.c to make code clear. Then the xx_core.c supports: - probe/remove/irq affinity seting for a virtio device - basic virtio related operations xx_common.c supports: - common helpers/functions for algos Cc: Michael S. Tsirkin Cc: Jason Wang Cc: Gonglei Signed-off-by: zhenwei pi --- drivers/crypto/virtio/virtio_crypto_common.c | 31 +++ drivers/crypto/virtio/virtio_crypto_common.h | 2 ++ drivers/crypto/virtio/virtio_crypto_core.c | 32 3 files changed, 33 insertions(+), 32 deletions(-) diff --git a/drivers/crypto/virtio/virtio_crypto_common.c b/drivers/crypto/virtio/virtio_crypto_common.c index 93df73c40dd3..4a23524896fe 100644 --- a/drivers/crypto/virtio/virtio_crypto_common.c +++ b/drivers/crypto/virtio/virtio_crypto_common.c @@ -8,6 +8,14 @@ #include "virtio_crypto_common.h" +void virtcrypto_clear_request(struct virtio_crypto_request *vc_req) +{ + if (vc_req) { + kfree_sensitive(vc_req->req_data); + kfree(vc_req->sgs); + } +} + static void virtio_crypto_ctrlq_callback(struct virtio_crypto_ctrl_request *vc_ctrl_req) { complete(&vc_ctrl_req->compl); @@ -59,3 +67,26 @@ void virtcrypto_ctrlq_callback(struct virtqueue *vq) } while (!virtqueue_enable_cb(vq)); spin_unlock_irqrestore(&vcrypto->ctrl_lock, flags); } + +void virtcrypto_dataq_callback(struct virtqueue *vq) +{ + struct virtio_crypto *vcrypto = vq->vdev->priv; + struct virtio_crypto_request *vc_req; + unsigned long flags; + unsigned int len; + unsigned int qid = vq->index; + + spin_lock_irqsave(&vcrypto->data_vq[qid].lock, flags); + do { + virtqueue_disable_cb(vq); + while ((vc_req = virtqueue_get_buf(vq, &len)) != NULL) { + spin_unlock_irqrestore( + &vcrypto->data_vq[qid].lock, flags); + if (vc_req->alg_cb) + vc_req->alg_cb(vc_req, len); + spin_lock_irqsave( + &vcrypto->data_vq[qid].lock, flags); + } + } while (!virtqueue_enable_cb(vq)); + spin_unlock_irqrestore(&vcrypto->data_vq[qid].lock, flags); +} diff --git 
a/drivers/crypto/virtio/virtio_crypto_common.h b/drivers/crypto/virtio/virtio_crypto_common.h index 25b4f22e8605..4d33ed5593d4 100644 --- a/drivers/crypto/virtio/virtio_crypto_common.h +++ b/drivers/crypto/virtio/virtio_crypto_common.h @@ -152,4 +152,6 @@ int virtio_crypto_ctrl_vq_request(struct virtio_crypto *vcrypto, struct scatterl unsigned int out_sgs, unsigned int in_sgs, struct virtio_crypto_ctrl_request *vc_ctrl_req); +void virtcrypto_dataq_callback(struct virtqueue *vq); + #endif /* _VIRTIO_CRYPTO_COMMON_H */ diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c index e668d4b1bc6a..d8edefcb966c 100644 --- a/drivers/crypto/virtio/virtio_crypto_core.c +++ b/drivers/crypto/virtio/virtio_crypto_core.c @@ -13,38 +13,6 @@ #include "virtio_crypto_common.h" -void -virtcrypto_clear_request(struct virtio_crypto_request *vc_req) -{ - if (vc_req) { - kfree_sensitive(vc_req->req_data); - kfree(vc_req->sgs); - } -} - -static void virtcrypto_dataq_callback(struct virtqueue *vq) -{ - struct virtio_crypto *vcrypto = vq->vdev->priv; - struct virtio_crypto_request *vc_req; - unsigned long flags; - unsigned int len; - unsigned int qid = vq->index; - - spin_lock_irqsave(&vcrypto->data_vq[qid].lock, flags); - do { - virtqueue_disable_cb(vq); - while ((vc_req = virtqueue_get_buf(vq, &len)) != NULL) { - spin_unlock_irqrestore( - &vcrypto->data_vq[qid].lock, flags); - if (vc_req->alg_cb) - vc_req->alg_cb(vc_req, len); - spin_lock_irqsave( - &vcrypto->data_vq[qid].lock, flags); - } - } while (!virtqueue_enable_cb(vq)); - spin_unlock_irqrestore(&vcrypto->data_vq[qid].lock, flags); -} - static int virtcrypto_find_vqs(struct virtio_crypto *vi) { vq_callback_t **callbacks; -- 2.20.1
[PATCH v3 2/5] virtio-crypto: wait ctrl queue instead of busy polling
Originally, after submitting a request to the virtio crypto control queue, the guest side busy polls the virtqueue for the result. This works like the following:

       CPU0  CPU1  ...  CPUx  CPUy
        |     |          |     |
         \     \        /     /
          \--- spin_lock(&vcrypto->ctrl_lock) ---/
                        |
                virtqueue add & kick
                        |
                busy poll virtqueue
                        |
          spin_unlock(&vcrypto->ctrl_lock)
                       ...

There are two problems:

1. The queue depth is always 1, so the performance of a virtio crypto device is limited. Multiple user processes share a single control queue and contend for its spin lock. Tested on Intel Platinum 8260: a single worker gets ~35K/s create/close session operations, and 8 workers get only ~40K/s operations at 800% CPU utilization.

2. The control request is supposed to be handled immediately, but in the current implementation of QEMU (v6.2) the vCPU thread kicks another thread to do this work, so the latency is also unstable. Tracking the latency of virtio_crypto_alg_akcipher_close_session over 5s:

       usecs          : count
       0 -> 1         : 0
       2 -> 3         : 7
       4 -> 7         : 72
       8 -> 15        : 186485
       16 -> 31       : 687
       32 -> 63       : 5
       64 -> 127      : 3
       128 -> 255     : 1
       256 -> 511     : 0
       512 -> 1023    : 0
       1024 -> 2047   : 0
       2048 -> 4095   : 0
       4096 -> 8191   : 0
       8192 -> 16383  : 2

This means a CPU may hold vcrypto->ctrl_lock for as long as 8192~16383 us.

To improve the performance of the control queue, a request on the control queue now waits for completion instead of busy polling, which reduces lock contention; the request is completed by the control queue callback.

       CPU0  CPU1  ...  CPUx  CPUy
        |     |          |     |
         \     \        /     /
          \--- spin_lock(&vcrypto->ctrl_lock) ---/
                        |
                virtqueue add & kick
                        |
          --- spin_unlock(&vcrypto->ctrl_lock) ---
         /     /                        \     \
        |     |                          |     |
       wait  wait                       wait  wait

With this patch, the guest side gets ~200K/s operations at 300% CPU utilization.

Cc: Michael S. Tsirkin
Cc: Jason Wang
Cc: Gonglei
Signed-off-by: zhenwei pi
---
 drivers/crypto/virtio/virtio_crypto_common.c | 42 +++-
 drivers/crypto/virtio/virtio_crypto_common.h |  8
 drivers/crypto/virtio/virtio_crypto_core.c   |  2 +-
 3 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/virtio/virtio_crypto_common.c b/drivers/crypto/virtio/virtio_crypto_common.c
index e65125a74db2..93df73c40dd3 100644
--- a/drivers/crypto/virtio/virtio_crypto_common.c
+++ b/drivers/crypto/virtio/virtio_crypto_common.c
@@ -8,14 +8,21 @@
 #include "virtio_crypto_common.h"
 
+static void virtio_crypto_ctrlq_callback(struct virtio_crypto_ctrl_request *vc_ctrl_req)
+{
+	complete(&vc_ctrl_req->compl);
+}
+
 int virtio_crypto_ctrl_vq_request(struct virtio_crypto *vcrypto, struct scatterlist *sgs[],
 				  unsigned int out_sgs, unsigned int in_sgs,
 				  struct virtio_crypto_ctrl_request *vc_ctrl_req)
 {
 	int err;
-	unsigned int inlen;
 	unsigned long flags;
 
+	init_completion(&vc_ctrl_req->compl);
+	vc_ctrl_req->ctrl_cb = virtio_crypto_ctrlq_callback;
+
 	spin_lock_irqsave(&vcrypto->ctrl_lock, flags);
 	err = virtqueue_add_sgs(vcrypto->ctrl_vq, sgs, out_sgs, in_sgs, vc_ctrl_req, GFP_ATOMIC);
 	if (err < 0) {
@@ -24,16 +31,31 @@ int virtio_crypto_ctrl_vq_request(struct virtio_crypto *vcrypto, struct scatterl
 	}
 
 	virtqueue_kick(vcrypto->ctrl_vq);
-
-	/*
-	 * Trapping into the hypervisor, so the request should be
-	 * handled immediately.
-	 */
-	while (!virtqueue_get_buf(vcrypto->ctrl_vq, &inlen) &&
-	       !virtqueue_is_broken(vcrypto->ctrl_vq))
-		cpu_relax();
-
 	spin_unlock_irqrestore(&vcrypto->ctrl_lock, flags);
+
+	wait_for_completion(&vc_ctrl_req->comp
[PATCH v3 1/5] virtio-crypto: use private buffer for control request
Originally, all of the control requests share a single buffer (the ctrl, input and ctrl_status fields in struct virtio_crypto); this allows a queue depth of 1 only, so the performance of the control queue is limited by this design.

In this patch, each request allocates its request buffer dynamically and frees the buffer after the request, which makes it possible to optimize the control queue depth in the next step.

A necessary comment is already in the code; to describe it again:

/*
 * Note: there are padding fields in request, clear them to zero before
 * sending to host,
 * Ex, virtio_crypto_ctrl_request::ctrl::u::destroy_session::padding[48]
 */

So use kzalloc to allocate the buffer of struct virtio_crypto_ctrl_request.

Cc: Michael S. Tsirkin
Cc: Jason Wang
Cc: Gonglei
Signed-off-by: zhenwei pi
---
 drivers/crypto/virtio/Makefile                |   1 +
 .../virtio/virtio_crypto_akcipher_algs.c      |  90 ++--
 drivers/crypto/virtio/virtio_crypto_common.c  |  39 +
 drivers/crypto/virtio/virtio_crypto_common.h  |  19 ++-
 .../virtio/virtio_crypto_skcipher_algs.c      | 133 --
 5 files changed, 156 insertions(+), 126 deletions(-)
 create mode 100644 drivers/crypto/virtio/virtio_crypto_common.c

diff --git a/drivers/crypto/virtio/Makefile b/drivers/crypto/virtio/Makefile
index bfa6cbae342e..49c1fa80e465 100644
--- a/drivers/crypto/virtio/Makefile
+++ b/drivers/crypto/virtio/Makefile
@@ -3,5 +3,6 @@ obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio_crypto.o
 virtio_crypto-objs := \
 	virtio_crypto_skcipher_algs.o \
 	virtio_crypto_akcipher_algs.o \
+	virtio_crypto_common.o \
 	virtio_crypto_mgr.o \
 	virtio_crypto_core.o
diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
index f3ec9420215e..9561bc2df62b 100644
--- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
@@ -102,8 +102,8 @@ static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
 {
 	struct scatterlist outhdr_sg, key_sg, inhdr_sg, *sgs[3];
 	struct virtio_crypto *vcrypto = ctx->vcrypto;
+	struct virtio_crypto_ctrl_request *vc_ctrl_req;
 	uint8_t *pkey;
-	unsigned int inlen;
 	int err;
 	unsigned int num_out = 0, num_in = 0;
 
@@ -111,98 +111,91 @@ static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
 	if (!pkey)
 		return -ENOMEM;
 
-	spin_lock(&vcrypto->ctrl_lock);
-	memcpy(&vcrypto->ctrl.header, header, sizeof(vcrypto->ctrl.header));
-	memcpy(&vcrypto->ctrl.u, para, sizeof(vcrypto->ctrl.u));
-	vcrypto->input.status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
+	vc_ctrl_req = kzalloc(sizeof(*vc_ctrl_req), GFP_KERNEL);
+	if (!vc_ctrl_req) {
+		err = -ENOMEM;
+		goto out;
+	}
 
-	sg_init_one(&outhdr_sg, &vcrypto->ctrl, sizeof(vcrypto->ctrl));
+	memcpy(&vc_ctrl_req->ctrl.header, header, sizeof(vc_ctrl_req->ctrl.header));
+	memcpy(&vc_ctrl_req->ctrl.u, para, sizeof(vc_ctrl_req->ctrl.u));
+	sg_init_one(&outhdr_sg, &vc_ctrl_req->ctrl, sizeof(vc_ctrl_req->ctrl));
 	sgs[num_out++] = &outhdr_sg;
 
 	sg_init_one(&key_sg, pkey, keylen);
 	sgs[num_out++] = &key_sg;
 
-	sg_init_one(&inhdr_sg, &vcrypto->input, sizeof(vcrypto->input));
+	vc_ctrl_req->input.status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
+	sg_init_one(&inhdr_sg, &vc_ctrl_req->input, sizeof(vc_ctrl_req->input));
 	sgs[num_out + num_in++] = &inhdr_sg;
 
-	err = virtqueue_add_sgs(vcrypto->ctrl_vq, sgs, num_out, num_in, vcrypto, GFP_ATOMIC);
+	err = virtio_crypto_ctrl_vq_request(vcrypto, sgs, num_out, num_in, vc_ctrl_req);
 	if (err < 0)
 		goto out;
 
-	virtqueue_kick(vcrypto->ctrl_vq);
-	while (!virtqueue_get_buf(vcrypto->ctrl_vq, &inlen) &&
-	       !virtqueue_is_broken(vcrypto->ctrl_vq))
-		cpu_relax();
-
-	if (le32_to_cpu(vcrypto->input.status) != VIRTIO_CRYPTO_OK) {
+	if (le32_to_cpu(vc_ctrl_req->input.status) != VIRTIO_CRYPTO_OK) {
+		pr_err("virtio_crypto: Create session failed status: %u\n",
+		       le32_to_cpu(vc_ctrl_req->input.status));
 		err = -EINVAL;
 		goto out;
 	}
 
-	ctx->session_id = le64_to_cpu(vcrypto->input.session_id);
+	ctx->session_id = le64_to_cpu(vc_ctrl_req->input.session_id);
 	ctx->session_valid = true;
 	err = 0;
 
 out:
-	spin_unlock(&vcrypto->ctrl_lock);
+	kfree(vc_ctrl_req);
 	kfree_sensitive(pkey);
 
-	if (err < 0)
-		pr_err("virtio_crypto: Create session failed status: %u\n",
-		       le32_to_cpu(vcrypto->input.status));
-
 	return err;
 }
 
 static int virtio_crypto_alg_akcipher_close_session(struct virtio_crypto_akcipher_ctx *ctx)
 {
 	struct scatterlist outhdr_sg, inhdr_s
[PATCH v3 0/5] virtio-crypto: Improve performance
v2 -> v3:
- Jason suggested splitting the first patch into two parts:
  1. use a private buffer
  2. remove the busy polling
  Reworked as Jason suggested; this makes each change smaller and clearer.

v1 -> v2:
- Use kfree instead of kfree_sensitive for insensitive buffers.
- Several coding style fixes.
- Use memory from the current node, instead of memory close to the device.
- Add more detail to the commit messages, and explain why the per-device
  request buffer is removed.
- Add a comment in the code to explain why kzalloc is used to allocate
  struct virtio_crypto_ctrl_request.

v1:
The main point of this series is to improve the performance of virtio crypto:
- Use a wait mechanism instead of busy polling for the ctrl queue; this
  reduces CPU usage and lock contention, and makes it possible to
  create/destroy sessions in parallel. QPS increases from ~40K/s to ~200K/s.
- Enable retry on the crypto engine to improve performance for the data
  queue; this allows a depth larger than 1.
- Fix dst data length in the akcipher service.
- Other style fixes.

lei he (2):
  virtio-crypto: adjust dst_len at ops callback
  virtio-crypto: enable retry for virtio-crypto-dev

zhenwei pi (3):
  virtio-crypto: use private buffer for control request
  virtio-crypto: wait ctrl queue instead of busy polling
  virtio-crypto: move helpers into virtio_crypto_common.c

 drivers/crypto/virtio/Makefile                |   1 +
 .../virtio/virtio_crypto_akcipher_algs.c      |  95 ++---
 drivers/crypto/virtio/virtio_crypto_common.c  |  92 
 drivers/crypto/virtio/virtio_crypto_common.h  |  29 +++-
 drivers/crypto/virtio/virtio_crypto_core.c    |  37 +
 .../virtio/virtio_crypto_skcipher_algs.c      | 133 --
 6 files changed, 226 insertions(+), 161 deletions(-)
 create mode 100644 drivers/crypto/virtio/virtio_crypto_common.c
-- 
2.20.1
Re: [PATCH] arm64: paravirt: Disable IRQs during stolen_time_cpu_down_prepare
On Thu, Apr 21, 2022 at 09:44:28AM +0200, Juergen Gross wrote:
> On 20.04.22 22:44, Elliot Berman wrote:
> > From: Prakruthi Deepak Heragu
> >
> > During hotplug, the stolen time data structure is unmapped and memset.
> > There is a possibility of the timer IRQ being triggered before memset
> > and stolen time is getting updated as part of this timer IRQ handler. This
> > causes the below crash in timer handler -
> >
> >   [ 3457.473139][C5] Unable to handle kernel paging request at virtual address ffc03df05148
> >   ...
> >   [ 3458.154398][C5] Call trace:
> >   [ 3458.157648][C5]  para_steal_clock+0x30/0x50
> >   [ 3458.162319][C5]  irqtime_account_process_tick+0x30/0x194
> >   [ 3458.168148][C5]  account_process_tick+0x3c/0x280
> >   [ 3458.173274][C5]  update_process_times+0x5c/0xf4
> >   [ 3458.178311][C5]  tick_sched_timer+0x180/0x384
> >   [ 3458.183164][C5]  __run_hrtimer+0x160/0x57c
> >   [ 3458.187744][C5]  hrtimer_interrupt+0x258/0x684
> >   [ 3458.192698][C5]  arch_timer_handler_virt+0x5c/0xa0
> >   [ 3458.198002][C5]  handle_percpu_devid_irq+0xdc/0x414
> >   [ 3458.203385][C5]  handle_domain_irq+0xa8/0x168
> >   [ 3458.208241][C5]  gic_handle_irq.34493+0x54/0x244
> >   [ 3458.213359][C5]  call_on_irq_stack+0x40/0x70
> >   [ 3458.218125][C5]  do_interrupt_handler+0x60/0x9c
> >   [ 3458.223156][C5]  el1_interrupt+0x34/0x64
> >   [ 3458.227560][C5]  el1h_64_irq_handler+0x1c/0x2c
> >   [ 3458.232503][C5]  el1h_64_irq+0x7c/0x80
> >   [ 3458.236736][C5]  free_vmap_area_noflush+0x108/0x39c
> >   [ 3458.242126][C5]  remove_vm_area+0xbc/0x118
> >   [ 3458.246714][C5]  vm_remove_mappings+0x48/0x2a4
> >   [ 3458.251656][C5]  __vunmap+0x154/0x278
> >   [ 3458.255796][C5]  stolen_time_cpu_down_prepare+0xc0/0xd8
> >   [ 3458.261542][C5]  cpuhp_invoke_callback+0x248/0xc34
> >   [ 3458.266842][C5]  cpuhp_thread_fun+0x1c4/0x248
> >   [ 3458.271696][C5]  smpboot_thread_fn+0x1b0/0x400
> >   [ 3458.276638][C5]  kthread+0x17c/0x1e0
> >   [ 3458.280691][C5]  ret_from_fork+0x10/0x20
> >
> > As a fix, disable the IRQs during hotplug until we unmap and memset the
> > stolen time structure.
>
> This will work for the call chain of your observed crash, but are
> you sure that para_steal_clock() can't be called from another cpu
> concurrently?

Agreed, this needs checking as update_rq_clock() is called from all over
the place.

> In case you verified this can't happen, you can add my:
>
> Reviewed-by: Juergen Gross
>
> Otherwise you either need to use RCU for doing the memunmap(), or a
> lock to protect stolen_time_region.

Yes, I think RCU would make a lot of sense here, deferring the memunmap
until there are no longer any readers. The reader is currently doing:

	if (!reg->kaddr)
		return 0;

	return le64_to_cpu(READ_ONCE(reg->kaddr->stolen_time));

so we'd also want an rcu_dereference() on reg->kaddr to avoid reading it
twice.

Will
Re: PING: [PATCH v4 0/8] Introduce akcipher service for virtio-crypto
On Thu, Apr 21, 2022 at 09:41:40AM +0800, zhenwei pi wrote:
> Hi Daniel,
> Could you please review this series?

Yes, it's on my to-do list. I've been on holiday for 2 weeks, so I'm still
catching up on the backlog of reviews.

> On 4/11/22 18:43, zhenwei pi wrote:
> > v3 -> v4:
> > - Coding style fixes: Akcipher -> AkCipher, struct XXX -> XXX, Rsa -> RSA,
> >   XXX-alg -> XXX-algo.
> > - Change version info in qapi/crypto.json, from 7.0 -> 7.1.
> > - Remove ecdsa from qapi/crypto.json; it would be introduced with the
> >   implementation later.
> > - Use QCryptoHashAlgorithm instead of QCryptoRSAHashAlgorithm (removed) in
> >   qapi/crypto.json.
> > - Rename arguments of qcrypto_akcipher_XXX to keep aligned with
> >   qcrypto_cipher_XXX (dec/enc/sign/verify -> in/out/in2), and add
> >   qcrypto_akcipher_max_XXX APIs.
> > - Add new API: qcrypto_akcipher_supports.
> > - Change the return value of qcrypto_akcipher_enc/dec/sign; these functions
> >   return the actual length of the result.
> > - Separate ASN.1 source code and test case clean-up.
> > - Disable RSA raw encoding for akcipher-nettle.
> > - Separate the RSA key parser into rsakey.{h,c}, and implement it with the
> >   builtin ASN.1 decoder and nettle respectively.
> > - Implement the RSA (pkcs1 and raw encoding) algorithm with gcrypt. This has
> >   higher priority than nettle.
> > - For some akcipher operations (e.g. decryption of pkcs1pad(rsa)), the
> >   length of the returned result may be less than the dst buffer size;
> >   return the actual length of the result instead of the buffer length to
> >   the guest side (in function virtio_crypto_akcipher_input_data_helper).
> > - Other minor changes.
> >
> > Thanks to Daniel!
> >
> > Eric pointed out this missing part of the use case, so sending it here
> > again.
> >
> > In our plan, the feature is designed for the HTTPS offloading case and
> > other applications which use kernel RSA/ecdsa via the keyctl syscall. The
> > full picture is shown below:
> >
> >              Nginx/openssl[1] ... Apps
> > Guest  ------------------------------------
> >              virtio-crypto driver[2]
> >        ------------------------------------
> >              virtio-crypto backend[3]
> > Host   ------------------------------------
> >            /            |           \
> >        builtin[4]     vhost     keyctl[5] ...
> >
> > [1] User applications can offload RSA calculation to the kernel via the
> > keyctl syscall. There is no keyctl engine in openssl currently; we
> > developed an engine and tried to contribute it to openssl upstream, but
> > openssl 1.x does not accept new features. Link:
> > https://github.com/openssl/openssl/pull/16689
> >
> > This branch is available and maintained by Lei:
> > https://github.com/TousakaRin/openssl/tree/OpenSSL_1_1_1-kctl_engine
> >
> > We tested nginx (config file change only) with the openssl keyctl engine;
> > it works fine.
> >
> > [2] The virtio-crypto driver is used to communicate with the host side
> > and send requests to the host side to do asymmetric calculation.
> > https://lkml.org/lkml/2022/3/1/1425
> >
> > [3] The virtio-crypto backend handles requests from the guest side, and
> > forwards requests to the crypto backend driver of QEMU.
> >
> > [4] Currently RSA is supported only in the builtin driver. This driver is
> > supposed to test the full feature without other software (e.g. a vhost
> > process) or hardware dependencies. ecdsa is introduced into the qapi type
> > without an implementation; this may be implemented in Q3-2022 or later. If
> > the ecdsa type definition should be added together with the
> > implementation, I'll remove this in the next version.
> >
> > [5] The keyctl backend is in development; we will post this feature in
> > Q2-2022. The keyctl backend can use hardware acceleration (e.g. Intel
> > QAT).
> >
> > Setting up the full environment, tested with Intel QAT on the host side:
> > the QPS of HTTPS increases to ~200% in a guest.
> >
> > vs PCI passthrough: the most important benefit of this solution is that
> > it keeps the VM migratable.
> >
> > v2 -> v3:
> > - Introduce akcipher types to qapi
> > - Add test/benchmark suite for the akcipher class
> > - Separate 'virtio_crypto: Support virtio crypto asym operation' into:
> >   - crypto: Introduce akcipher crypto class
> >   - virtio-crypto: Introduce RSA algorithm
> >
> > v1 -> v2:
> > - Update virtio_crypto.h from the v2 version of the related kernel patch.
> >
> > v1:
> > - Support akcipher for virtio-crypto.
> > - Introduce akcipher class.
> > - Introduce ASN.1 decoder into QEMU.
> > - Implement RSA backend with nettle/hogweed.
> >
> > Lei He (4):
> >   crypto-akcipher: Introduce akcipher types to qapi
> >   crypto: add ASN.1 decoder
> >   crypto: Implement RSA algorithm by hogweed
> >   crypto: Implement RSA algorithm by gcrypt
> >
> > Zhenwei Pi (3):
> >   virtio-crypto: header update
> >   crypto: Introduce akcipher crypto class
> >   crypto: Introduce RSA algorithm
> >
> > lei he (1):
> >   tests/crypto: Add test suite for crypto akcipher
> >
> > backends/
Re: [PATCH] arm64: paravirt: Disable IRQs during stolen_time_cpu_down_prepare
On 20.04.22 22:44, Elliot Berman wrote:
> From: Prakruthi Deepak Heragu
>
> During hotplug, the stolen time data structure is unmapped and memset.
> There is a possibility of the timer IRQ being triggered before memset
> and stolen time is getting updated as part of this timer IRQ handler. This
> causes the below crash in timer handler -
>
>   [ 3457.473139][C5] Unable to handle kernel paging request at virtual address ffc03df05148
>   ...
>   [ 3458.154398][C5] Call trace:
>   [ 3458.157648][C5]  para_steal_clock+0x30/0x50
>   [ 3458.162319][C5]  irqtime_account_process_tick+0x30/0x194
>   [ 3458.168148][C5]  account_process_tick+0x3c/0x280
>   [ 3458.173274][C5]  update_process_times+0x5c/0xf4
>   [ 3458.178311][C5]  tick_sched_timer+0x180/0x384
>   [ 3458.183164][C5]  __run_hrtimer+0x160/0x57c
>   [ 3458.187744][C5]  hrtimer_interrupt+0x258/0x684
>   [ 3458.192698][C5]  arch_timer_handler_virt+0x5c/0xa0
>   [ 3458.198002][C5]  handle_percpu_devid_irq+0xdc/0x414
>   [ 3458.203385][C5]  handle_domain_irq+0xa8/0x168
>   [ 3458.208241][C5]  gic_handle_irq.34493+0x54/0x244
>   [ 3458.213359][C5]  call_on_irq_stack+0x40/0x70
>   [ 3458.218125][C5]  do_interrupt_handler+0x60/0x9c
>   [ 3458.223156][C5]  el1_interrupt+0x34/0x64
>   [ 3458.227560][C5]  el1h_64_irq_handler+0x1c/0x2c
>   [ 3458.232503][C5]  el1h_64_irq+0x7c/0x80
>   [ 3458.236736][C5]  free_vmap_area_noflush+0x108/0x39c
>   [ 3458.242126][C5]  remove_vm_area+0xbc/0x118
>   [ 3458.246714][C5]  vm_remove_mappings+0x48/0x2a4
>   [ 3458.251656][C5]  __vunmap+0x154/0x278
>   [ 3458.255796][C5]  stolen_time_cpu_down_prepare+0xc0/0xd8
>   [ 3458.261542][C5]  cpuhp_invoke_callback+0x248/0xc34
>   [ 3458.266842][C5]  cpuhp_thread_fun+0x1c4/0x248
>   [ 3458.271696][C5]  smpboot_thread_fn+0x1b0/0x400
>   [ 3458.276638][C5]  kthread+0x17c/0x1e0
>   [ 3458.280691][C5]  ret_from_fork+0x10/0x20
>
> As a fix, disable the IRQs during hotplug until we unmap and memset the
> stolen time structure.

This will work for the call chain of your observed crash, but are
you sure that para_steal_clock() can't be called from another cpu
concurrently?

In case you verified this can't happen, you can add my:

Reviewed-by: Juergen Gross

Otherwise you either need to use RCU for doing the memunmap(), or a
lock to protect stolen_time_region.


Juergen
Re: [PATCH 1/3] sched/headers: Fix compilation error with GCC 12
On Wed, Apr 20, 2022 at 11:45:05AM -0700, Kees Cook wrote:
> > 	-Wno-array-bounds
>
> Please no; we just spent two years fixing all the old non-flexible array
> definitions and so many other things for this to be enabled, because
> it finds actual flaws (but we turned it off when it was introduced
> because of how much sloppy old code we had).
>
> > Is the obvious fix-all cure. The thing is, I want to hear if this new
> > warning has any actual use or is just crack induced madness like many of
> > the warnings we turn off.
>
> Yes, it finds real flaws. And also yes, it is rather opinionated about
> some "tricks" that have worked in C, but frankly, most of those tricks
> end up being weird/accidentally-correct and aren't great for long-term
> readability or robustness. Though I'm not speaking specifically to this
> proposed patch; I haven't looked closely at it yet.

So the whole access-outside-object-is-UB thing in C is complete rubbish
from an OS perspective. The memory is there and there are genuine uses
for it.

And so far, the patches I've seen for it make the code actively worse. So
we need a sane annotation to tell the compiler to shut up already without
making the code an unreadable mess.