Re: [PATCH v1] vdpa/vp_vdpa : add vdpa tool support in vp_vdpa
On Fri, Apr 22, 2022 at 1:57 PM Cindy Lu wrote: > > On Wed, Apr 20, 2022 at 2:27 PM Cindy Lu wrote: > > > > > > > > On Wed, Apr 20, 2022 at 11:21 AM Jason Wang wrote: > >> > >> > >> 在 2022/4/19 09:40, Cindy Lu 写道: > >> > this patch is to add the support for vdpa tool in vp_vdpa > >> > here is the example steps > >> > modprobe vp_vdpa > >> > modprobe vhost_vdpa > >> > > >> > echo :00:06.0>/sys/bus/pci/drivers/virtio-pci/unbind > >> > echo 1af4 1041 > /sys/bus/pci/drivers/vp-vdpa/new_id > >> > > >> > vdpa dev add name vdpa1 mgmtdev pci/:00:06.0 > >> > > >> > Signed-off-by: Cindy Lu > >> > --- > >> > drivers/vdpa/virtio_pci/vp_vdpa.c | 86 --- > >> > 1 file changed, 78 insertions(+), 8 deletions(-) > >> > > >> > diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c > >> > b/drivers/vdpa/virtio_pci/vp_vdpa.c > >> > index cce101e6a940..d8a1d2f8e9bb 100644 > >> > --- a/drivers/vdpa/virtio_pci/vp_vdpa.c > >> > +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c > >> > @@ -17,6 +17,8 @@ > >> > #include > >> > #include > >> > #include > >> > +#include > >> > +#include > >> > > >> > #define VP_VDPA_QUEUE_MAX 256 > >> > #define VP_VDPA_DRIVER_NAME "vp_vdpa" > >> > @@ -41,6 +43,18 @@ struct vp_vdpa { > >> > int vectors; > >> > }; > >> > > >> > +struct vp_vdpa_mgmtdev { > >> > + struct vdpa_mgmt_dev mgtdev; > >> > + struct vp_vdpa *vp_vdpa; > >> > + struct pci_dev *pdev; > >> > +}; > >> > + > >> > +#define VP_VDPA_NET_FEATURES > >> >\ > >> > + ((1ULL << VIRTIO_F_ANY_LAYOUT) | (1ULL << VIRTIO_F_VERSION_1) | > >> > \ > >> > + (1ULL << VIRTIO_F_ACCESS_PLATFORM)) > >> > + > >> > +#define VP_VDPA_NET_VQ_NUM 2 > >> > >> > >> Let's not go backwards, e.g we've already support any kind of virtio > >> device with any kind of features. see the comment in vp_vdpa_probe(). 
> >> > >> > > got it, I will fix this > >> > >> > + > >> > static struct vp_vdpa *vdpa_to_vp(struct vdpa_device *vdpa) > >> > { > >> > return container_of(vdpa, struct vp_vdpa, vdpa); > >> > @@ -454,9 +468,14 @@ static void vp_vdpa_free_irq_vectors(void *data) > >> > pci_free_irq_vectors(data); > >> > } > >> > > >> > -static int vp_vdpa_probe(struct pci_dev *pdev, const struct > >> > pci_device_id *id) > >> > +static int vp_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char > >> > *name, > >> > +const struct vdpa_dev_set_config *add_config) > >> > { > >> > + struct vp_vdpa_mgmtdev *vp_vdpa_mgtdev = > >> > + container_of(v_mdev, struct vp_vdpa_mgmtdev, mgtdev); > >> > + > >> > struct virtio_pci_modern_device *mdev; > >> > + struct pci_dev *pdev = vp_vdpa_mgtdev->pdev; > >> > struct device *dev = &pdev->dev; > >> > struct vp_vdpa *vp_vdpa; > >> > int ret, i; > >> > @@ -465,8 +484,9 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const > >> > struct pci_device_id *id) > >> > if (ret) > >> > return ret; > >> > > >> > - vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa, > >> > - dev, &vp_vdpa_ops, NULL, false); > >> > + vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa, dev, > >> > &vp_vdpa_ops, > >> > + name, false); > >> > >> > >> Nit: let's keep the line of vdpa_alloc_device() unchanged to reduce the > >> changeset. 
> >> > >> > > will fix this > >> > >> > + > >> > if (IS_ERR(vp_vdpa)) { > >> > dev_err(dev, "vp_vdpa: Failed to allocate vDPA > >> > structure\n"); > >> > return PTR_ERR(vp_vdpa); > >> > @@ -480,9 +500,10 @@ static int vp_vdpa_probe(struct pci_dev *pdev, > >> > const struct pci_device_id *id) > >> > dev_err(&pdev->dev, "Failed to probe modern PCI device\n"); > >> > goto err; > >> > } > >> > + vp_vdpa_mgtdev->vp_vdpa = vp_vdpa; > >> > > >> > pci_set_master(pdev); > >> > - pci_set_drvdata(pdev, vp_vdpa); > >> > + pci_set_drvdata(pdev, vp_vdpa_mgtdev); > >> > > >> > vp_vdpa->vdpa.dma_dev = &pdev->dev; > >> > vp_vdpa->queues = vp_modern_get_num_queues(mdev); > >> > @@ -516,7 +537,8 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const > >> > struct pci_device_id *id) > >> > } > >> > vp_vdpa->config_irq = VIRTIO_MSI_NO_VECTOR; > >> > > >> > - ret = vdpa_register_device(&vp_vdpa->vdpa, vp_vdpa->queues); > >> > + vp_vdpa->vdpa.mdev = &vp_vdpa_mgtdev->mgtdev; > >> > + ret = _vdpa_register_device(&vp_vdpa->vdpa, vp_vdpa->queues); > >> > if (ret) { > >> > dev_err(&pdev->dev, "Failed to register to vdpa bus\n"); > >> > goto err; > >> > @@ -529,12 +551,60 @@ static int vp_vdpa_probe(struct pci_dev *pdev, > >> > const struct pci_device_id *id) > >> > return ret; > >> > } > >> > > >> > -static void vp_vdpa_remove(struct pci_dev *pdev) > >> > +static void vp_vdpa_dev_del(struct vdpa_mgmt_dev
Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
On Thu, Apr 21, 2022 at 11:27:42AM +0200, Niklas Schnelle wrote: > On Wed, 2022-04-20 at 21:14 -0500, Bjorn Helgaas wrote: > > On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote: > > > While determining the next PCI function is factored out of > > > pci_scan_slot() into next_fn() the former still handles the first > > > function as a special case duplicating the code from the scan loop and > > > splitting the condition that the first function exits from it being > > > multifunction which is tested in next_fn(). > > > > > > Furthermore the non ARI branch of next_fn() mixes the case that > > > multifunction devices may have non-contiguous function ranges and dev > > > may thus be NULL with the multifunction requirement. It also signals > > > that no further functions need to be scanned by returning 0 which is > > > a valid function number. > > > > > > Improve upon this by moving all conditions for having to scan for more > > > functions into next_fn() and make them obvious and commented. > > > > > > By changing next_fn() to return -ENODEV instead of 0 when there is no > > > next function we can then handle the initial function inside the loop > > > and deduplicate the shared handling. > > > > > > No functional change is intended. 
> > > > > > Signed-off-by: Niklas Schnelle > > > --- > > > drivers/pci/probe.c | 41 +++-- > > > 1 file changed, 19 insertions(+), 22 deletions(-) > > > > > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > > > index 17a969942d37..389aa1f9cb2c 100644 > > > --- a/drivers/pci/probe.c > > > +++ b/drivers/pci/probe.c > > > @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct > > > pci_bus *bus, int devfn) > > > } > > > EXPORT_SYMBOL(pci_scan_single_device); > > > > > > -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev, > > > - unsigned int fn) > > > +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn) > > > { > > > int pos; > > > u16 cap = 0; > > > unsigned int next_fn; > > > > > > - if (pci_ari_enabled(bus)) { > > > - if (!dev) > > > - return 0; > > > + if (dev && pci_ari_enabled(bus)) { > > > > I think this would be easier to verify if we kept the explicit error > > return, e.g., > > > > if (pci_ari_enabled(bus)) { > > if (!dev) > > return -ENODEV; > > pos = pci_find_ext_capability(...); > > > > Otherwise we have to sort through the !dev cases below. I guess > > -ENODEV would come from either the "!fn && !dev" case or the "fn > 6" > > case, but it's not obvious to me that those are equivalent to the > > previous code. > > We could keep this the same for this patch but I think for jailhouse > (patch 2) we need the "!dev" case not to fail here such that we can > handle the missing function 0 below even if ARI is enabled. For s390 > this doesn't currently matter because pci_ari_enabled(bus) is always > false but I assumed that this isn't necessarily so for jailhouse. I > sent a follow up mail on a slight behavior change I can think of for > this case for v2 but forgot to send it also for v3. Quoted below: I think it would be good to make the first patch change as little as possible to make it easier to analyze, then possibly test for hypervisor when changing this behavior. 
> > > - /* dev may be NULL for non-contiguous multifunction devices */ > > > - if (!dev || dev->multifunction) > > > - return (fn + 1) % 8; > > > - > > > - return 0; > > > + /* only multifunction devices may have more functions */ > > > + if (dev && !dev->multifunction) > > > + return -ENODEV; > > > > I don't understand why the "!dev || dev->multifunction" test needs to > > change. Isn't that valid even in the hypervisor case? IIUC, you want > > to return success in some cases that currently return failure, so this > > case that was already success should be fine as it was. > > This isn't a change to the test. It's the negation of the logical > condition *and* a switch of the branches i.e. keeps the overall > behavior exactly the same. The equivalence is !(!A || B) == (A && !B). I see the Boolean equivalence, but it's difficult to verify that the consequences are equivalent because the new code has the extra "!fn && !dev" test in the middle. > There are two reasons I did this. > > 1. I find (!dev || dev->multifunction) to be much harder to grasp than > (dev && !dev->multifunction). > > 2. The whole next_fn() in my opinion becomes easier to read if it bails > for all bad cases early and the "this is the next fn" is the final > return if we didn't bail. This becomes even more true as another > condition is added in patch 2. Fair enough, and I agree that "this is the next fn" is a nice final return. In general I think it's good to return either an error or the next fn as soon as it is known. It makes it harder to analyze if the return value has already been determined but we have to mentally pass o
Re: [PATCH 3/5] hv_sock: Add validation for untrusted Hyper-V values
On Thu, Apr 21, 2022 at 5:30 PM Andrea Parri wrote: > > > > @@ -577,12 +577,19 @@ static bool hvs_dgram_allow(u32 cid, u32 port) > > > static int hvs_update_recv_data(struct hvsock *hvs) > > > { > > > struct hvs_recv_buf *recv_buf; > > > - u32 payload_len; > > > + u32 pkt_len, payload_len; > > > + > > > + pkt_len = hv_pkt_len(hvs->recv_desc); > > > + > > > + /* Ensure the packet is big enough to read its header */ > > > + if (pkt_len < HVS_HEADER_LEN) > > > + return -EIO; > > > > > > recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1); > > > payload_len = recv_buf->hdr.data_size; > > > > > > - if (payload_len > HVS_MTU_SIZE) > > > + /* Ensure the packet is big enough to read its payload */ > > > + if (payload_len > pkt_len - HVS_HEADER_LEN || payload_len > > > > HVS_MTU_SIZE) > > > > checkpatch warns that we exceed 80 characters, I do not have a strong > > opinion on this, but if you have to resend better break the condition into 2 > > lines. > > Will break if preferred. (but does it really warn?? I understand that > the warning was deprecated and the "limit" increased to 100 chars...) I see the warn here: https://patchwork.kernel.org/project/netdevbpf/patch/20220420200720.434717-4-parri.and...@gmail.com/ in the kernel doc [1] we still say we prefer 80 columns, so I try to follow, especially when it doesn't make things worse. [1] https://docs.kernel.org/process/coding-style.html#breaking-long-lines-and-strings > > > > Maybe even update or remove the comment? (it only describes the first > > condition, but the conditions are pretty clear, so I don't think it adds > > much). > > Works for me. (taking it as this applies to the previous comment too.) Yep. Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next] net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO
On Wed, Apr 20, 2022 at 10:32 PM Hangbin Liu wrote: > > On Wed, Apr 20, 2022 at 09:45:15AM -0400, Willem de Bruijn wrote: > > On Wed, Apr 20, 2022 at 4:28 AM Hangbin Liu wrote: > > > > > > Currently, the kernel drops GSO VLAN tagged packet if it's created with > > > socket(AF_PACKET, SOCK_RAW, 0) plus virtio_net_hdr. > > > > > > The reason is AF_PACKET doesn't adjust the skb network header if there is > > > a VLAN tag. Then after virtio_net_hdr_set_proto() called, the > > > skb->protocol > > > will be set to ETH_P_IP/IPv6. And in later inet/ipv6_gso_segment() the skb > > > is dropped as network header position is invalid. > > > > > > Let's handle VLAN packets by adjusting network header position in > > > packet_parse_headers(), and move the function in packet_snd() before > > > calling virtio_net_hdr_set_proto(). > > > > The network header is set in > > > > skb_reset_network_header(skb); > > > > err = -EINVAL; > > if (sock->type == SOCK_DGRAM) { > > offset = dev_hard_header(skb, dev, ntohs(proto), addr, > > NULL, len); > > if (unlikely(offset < 0)) > > goto out_free; > > } else if (reserve) { > > skb_reserve(skb, -reserve); > > if (len < reserve + sizeof(struct ipv6hdr) && > > dev->min_header_len != dev->hard_header_len) > > skb_reset_network_header(skb); > > } > > > > If all that is needed is to move the network header beyond an optional > > VLAN tag in the case of SOCK_RAW, then this can be done in the else > > for Ethernet packets. > > Before we set network position, we need to check the skb->protocol to make > sure it's a VLAN packet. > > If we set skb->protocol and adjust network header here, like > packet_parse_headers() does. How should we do with later > > skb->protocol = proto; > skb->dev = dev; > > settings? > > If you are afraid that skb_probe_transport_header(skb) would affect the > virtio_net_hdr operation. How about split the skb_probe_transport_header() > from packet_parse_headers()? 
Something like > > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -1924,13 +1924,19 @@ static int packet_rcv_spkt(struct sk_buff *skb, > struct net_device *dev, > > static void packet_parse_headers(struct sk_buff *skb, struct socket *sock) > { > + int depth; > + > if ((!skb->protocol || skb->protocol == htons(ETH_P_ALL)) && > sock->type == SOCK_RAW) { > skb_reset_mac_header(skb); > skb->protocol = dev_parse_header_protocol(skb); > } > > - skb_probe_transport_header(skb); > + /* Move network header to the right position for VLAN tagged packets > */ > + if (likely(skb->dev->type == ARPHRD_ETHER) && > + eth_type_vlan(skb->protocol) && > + __vlan_get_protocol(skb, skb->protocol, &depth) != 0) > + skb_set_network_header(skb, depth); > } > > /* > @@ -3047,6 +3055,8 @@ static int packet_snd(struct socket *sock, struct > msghdr *msg, size_t len) > skb->mark = sockc.mark; > skb->tstamp = sockc.transmit_time; > > + packet_parse_headers(skb, sock); > + > if (has_vnet_hdr) { > err = virtio_net_hdr_to_skb(skb, &vnet_hdr, vio_le()); > if (err) > @@ -3055,7 +3065,7 @@ static int packet_snd(struct socket *sock, struct > msghdr *msg, size_t len) > virtio_net_hdr_set_proto(skb, &vnet_hdr); > } > > - packet_parse_headers(skb, sock); > + skb_probe_transport_header(skb); > > if (unlikely(extra_len == 4)) > skb->no_fcs = 1; > > > > > > It is not safe to increase reserve, as that would eat into the > > reserved hlen LL_RESERVED_SPACE(dev), which does not account for > > optional VLAN headers. > > > > Instead of setting here first, then patching up again later in > > packet_parse_headers. > > > > This change affects all packets with VLAN headers, not just those with > > GSO. I imagine that moving the network header is safe for all, but > > don't know that code well enough to verify that it does not have > > unintended side effects. Does dev_queue_xmit expect the network header > > to point to the start of the VLAN headers or after, for instance? 
> > I think adjust the network position should be safe, as tap device also did > that. Oh, it's great to be able to point to such prior art. Can you mention that in the commit message? Your approach does sound simpler than the above. Thanks for looking into that alternative, though. > > Thanks > Hangbin
Re: [PATCH 3/5] hv_sock: Add validation for untrusted Hyper-V values
On Wed, Apr 20, 2022 at 10:07:18PM +0200, Andrea Parri (Microsoft) wrote: For additional robustness in the face of Hyper-V errors or malicious behavior, validate all values that originate from packets that Hyper-V has sent to the guest in the host-to-guest ring buffer. Ensure that invalid values cannot cause data being copied out of the bounds of the source buffer in hvs_stream_dequeue(). Signed-off-by: Andrea Parri (Microsoft) --- include/linux/hyperv.h | 5 + net/vmw_vsock/hyperv_transport.c | 11 +-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index fe2e0179ed51e..55478a6810b60 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1663,6 +1663,11 @@ static inline u32 hv_pkt_datalen(const struct vmpacket_descriptor *desc) return (desc->len8 << 3) - (desc->offset8 << 3); } +/* Get packet length associated with descriptor */ +static inline u32 hv_pkt_len(const struct vmpacket_descriptor *desc) +{ + return desc->len8 << 3; +} struct vmpacket_descriptor * hv_pkt_iter_first_raw(struct vmbus_channel *channel); diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 8c37d07017fc4..092cadc2c866d 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -577,12 +577,19 @@ static bool hvs_dgram_allow(u32 cid, u32 port) static int hvs_update_recv_data(struct hvsock *hvs) { struct hvs_recv_buf *recv_buf; - u32 payload_len; + u32 pkt_len, payload_len; + + pkt_len = hv_pkt_len(hvs->recv_desc); + + /* Ensure the packet is big enough to read its header */ + if (pkt_len < HVS_HEADER_LEN) + return -EIO; recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1); payload_len = recv_buf->hdr.data_size; - if (payload_len > HVS_MTU_SIZE) + /* Ensure the packet is big enough to read its payload */ + if (payload_len > pkt_len - HVS_HEADER_LEN || payload_len > HVS_MTU_SIZE) checkpatch warns that we exceed 80 characters, I do not have a strong opinion on 
this, but if you have to resend better break the condition into 2 lines. Maybe even update or remove the comment? (it only describes the first condition, but the conditions are pretty clear, so I don't think it adds much). Thanks, Stefano
Re: [PATCH 2/5] hv_sock: Copy packets sent by Hyper-V out of the ring buffer
On Wed, Apr 20, 2022 at 10:07:17PM +0200, Andrea Parri (Microsoft) wrote: Pointers to VMbus packets sent by Hyper-V are used by the hv_sock driver within the guest VM. Hyper-V can send packets with erroneous values or modify packet fields after they are processed by the guest. To defend against these scenarios, copy the incoming packet after validating its length and offset fields using hv_pkt_iter_{first,next}(). In this way, the packet can no longer be modified by the host. Signed-off-by: Andrea Parri (Microsoft) --- net/vmw_vsock/hyperv_transport.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 943352530936e..8c37d07017fc4 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -78,6 +78,9 @@ struct hvs_send_buf { ALIGN((payload_len), 8) + \ VMBUS_PKT_TRAILER_SIZE) +/* Upper bound on the size of a VMbus packet for hv_sock */ +#define HVS_MAX_PKT_SIZE HVS_PKT_LEN(HVS_MTU_SIZE) + union hvs_service_id { guid_t srv_id; @@ -378,6 +381,8 @@ static void hvs_open_connection(struct vmbus_channel *chan) rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE); } + chan->max_pkt_size = HVS_MAX_PKT_SIZE; + premise, I don't know HyperV channels :-( Is this change necessary to use hv_pkt_iter_first() instead of hv_pkt_iter_first_raw()? If yes, then please mention that you set this value in the commit message, otherwise maybe better to have a separate patch. Thanks, Stefano ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb, conn_from_host ? 
new : sk); if (ret != 0) { @@ -602,7 +607,7 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg, return -EOPNOTSUPP; if (need_refill) { - hvs->recv_desc = hv_pkt_iter_first_raw(hvs->chan); + hvs->recv_desc = hv_pkt_iter_first(hvs->chan); if (!hvs->recv_desc) return -ENOBUFS; ret = hvs_update_recv_data(hvs); @@ -618,7 +623,7 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg, hvs->recv_data_len -= to_read; if (hvs->recv_data_len == 0) { - hvs->recv_desc = hv_pkt_iter_next_raw(hvs->chan, hvs->recv_desc); + hvs->recv_desc = hv_pkt_iter_next(hvs->chan, hvs->recv_desc); if (hvs->recv_desc) { ret = hvs_update_recv_data(hvs); if (ret) -- 2.25.1
Re: [PATCH 1/5] hv_sock: Check hv_pkt_iter_first_raw()'s return value
On Wed, Apr 20, 2022 at 10:07:16PM +0200, Andrea Parri (Microsoft) wrote: The function returns NULL if the ring buffer doesn't contain enough readable bytes to constitute a packet descriptor. The ring buffer's write_index is in memory which is shared with the Hyper-V host, an erroneous or malicious host could thus change its value and overturn the result of hvs_stream_has_data(). Signed-off-by: Andrea Parri (Microsoft) --- net/vmw_vsock/hyperv_transport.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index e111e13b66604..943352530936e 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -603,6 +603,8 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg, if (need_refill) { hvs->recv_desc = hv_pkt_iter_first_raw(hvs->chan); + if (!hvs->recv_desc) + return -ENOBUFS; ret = hvs_update_recv_data(hvs); if (ret) return ret; -- 2.25.1 Reviewed-by: Stefano Garzarella
RE: [PATCH v3 4/5] virtio-crypto: adjust dst_len at ops callback
> -Original Message- > From: zhenwei pi [mailto:pizhen...@bytedance.com] > Sent: Thursday, April 21, 2022 6:40 PM > To: Gonglei (Arei) ; m...@redhat.com > Cc: jasow...@redhat.com; herb...@gondor.apana.org.au; > linux-ker...@vger.kernel.org; virtualization@lists.linux-foundation.org; > linux-cry...@vger.kernel.org; helei.si...@bytedance.com; > da...@davemloft.net; zhenwei pi > Subject: [PATCH v3 4/5] virtio-crypto: adjust dst_len at ops callback > > From: lei he > > For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the length of > returned result maybe less than akcipher_req->dst_len, we need to recalculate > the actual dst_len through the virt-queue protocol. > OK ... > Cc: Michael S. Tsirkin > Cc: Jason Wang > Cc: Gonglei > Signed-off-by: lei he > Signed-off-by: zhenwei pi > --- > drivers/crypto/virtio/virtio_crypto_akcipher_algs.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > index 9561bc2df62b..82db86e088c2 100644 > --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > @@ -90,9 +90,12 @@ static void > virtio_crypto_dataq_akcipher_callback(struct virtio_crypto_request * > } > > akcipher_req = vc_akcipher_req->akcipher_req; > - if (vc_akcipher_req->opcode != VIRTIO_CRYPTO_AKCIPHER_VERIFY) > + if (vc_akcipher_req->opcode != VIRTIO_CRYPTO_AKCIPHER_VERIFY) { > + /* actuall length maybe less than dst buffer */ > + akcipher_req->dst_len = len - sizeof(vc_req->status); ...but why minus sizeof(vc_req->status)? Regards, -Gonglei
[PATCH v3 5/5] virtio-crypto: enable retry for virtio-crypto-dev
From: lei he Enable retry for virtio-crypto-dev, so that crypto-engine can process cipher-requests parallelly. Cc: Michael S. Tsirkin Cc: Jason Wang Cc: Gonglei Signed-off-by: lei he Signed-off-by: zhenwei pi --- drivers/crypto/virtio/virtio_crypto_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c index d8edefcb966c..5c0d68c9e894 100644 --- a/drivers/crypto/virtio/virtio_crypto_core.c +++ b/drivers/crypto/virtio/virtio_crypto_core.c @@ -62,7 +62,8 @@ static int virtcrypto_find_vqs(struct virtio_crypto *vi) spin_lock_init(&vi->data_vq[i].lock); vi->data_vq[i].vq = vqs[i]; /* Initialize crypto engine */ - vi->data_vq[i].engine = crypto_engine_alloc_init(dev, 1); + vi->data_vq[i].engine = crypto_engine_alloc_init_and_set(dev, true, NULL, 1, + virtqueue_get_vring_size(vqs[i])); if (!vi->data_vq[i].engine) { ret = -ENOMEM; goto err_engine; -- 2.20.1
[PATCH v3 4/5] virtio-crypto: adjust dst_len at ops callback
From: lei he For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the length of returned result maybe less than akcipher_req->dst_len, we need to recalculate the actual dst_len through the virt-queue protocol. Cc: Michael S. Tsirkin Cc: Jason Wang Cc: Gonglei Signed-off-by: lei he Signed-off-by: zhenwei pi --- drivers/crypto/virtio/virtio_crypto_akcipher_algs.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c index 9561bc2df62b..82db86e088c2 100644 --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c @@ -90,9 +90,12 @@ static void virtio_crypto_dataq_akcipher_callback(struct virtio_crypto_request * } akcipher_req = vc_akcipher_req->akcipher_req; - if (vc_akcipher_req->opcode != VIRTIO_CRYPTO_AKCIPHER_VERIFY) + if (vc_akcipher_req->opcode != VIRTIO_CRYPTO_AKCIPHER_VERIFY) { + /* actuall length maybe less than dst buffer */ + akcipher_req->dst_len = len - sizeof(vc_req->status); sg_copy_from_buffer(akcipher_req->dst, sg_nents(akcipher_req->dst), vc_akcipher_req->dst_buf, akcipher_req->dst_len); + } virtio_crypto_akcipher_finalize_req(vc_akcipher_req, akcipher_req, error); } -- 2.20.1
[PATCH v3 3/5] virtio-crypto: move helpers into virtio_crypto_common.c
Move virtcrypto_clear_request and virtcrypto_dataq_callback into virtio_crypto_common.c to make code clear. Then the xx_core.c supports: - probe/remove/irq affinity seting for a virtio device - basic virtio related operations xx_common.c supports: - common helpers/functions for algos Cc: Michael S. Tsirkin Cc: Jason Wang Cc: Gonglei Signed-off-by: zhenwei pi --- drivers/crypto/virtio/virtio_crypto_common.c | 31 +++ drivers/crypto/virtio/virtio_crypto_common.h | 2 ++ drivers/crypto/virtio/virtio_crypto_core.c | 32 3 files changed, 33 insertions(+), 32 deletions(-) diff --git a/drivers/crypto/virtio/virtio_crypto_common.c b/drivers/crypto/virtio/virtio_crypto_common.c index 93df73c40dd3..4a23524896fe 100644 --- a/drivers/crypto/virtio/virtio_crypto_common.c +++ b/drivers/crypto/virtio/virtio_crypto_common.c @@ -8,6 +8,14 @@ #include "virtio_crypto_common.h" +void virtcrypto_clear_request(struct virtio_crypto_request *vc_req) +{ + if (vc_req) { + kfree_sensitive(vc_req->req_data); + kfree(vc_req->sgs); + } +} + static void virtio_crypto_ctrlq_callback(struct virtio_crypto_ctrl_request *vc_ctrl_req) { complete(&vc_ctrl_req->compl); @@ -59,3 +67,26 @@ void virtcrypto_ctrlq_callback(struct virtqueue *vq) } while (!virtqueue_enable_cb(vq)); spin_unlock_irqrestore(&vcrypto->ctrl_lock, flags); } + +void virtcrypto_dataq_callback(struct virtqueue *vq) +{ + struct virtio_crypto *vcrypto = vq->vdev->priv; + struct virtio_crypto_request *vc_req; + unsigned long flags; + unsigned int len; + unsigned int qid = vq->index; + + spin_lock_irqsave(&vcrypto->data_vq[qid].lock, flags); + do { + virtqueue_disable_cb(vq); + while ((vc_req = virtqueue_get_buf(vq, &len)) != NULL) { + spin_unlock_irqrestore( + &vcrypto->data_vq[qid].lock, flags); + if (vc_req->alg_cb) + vc_req->alg_cb(vc_req, len); + spin_lock_irqsave( + &vcrypto->data_vq[qid].lock, flags); + } + } while (!virtqueue_enable_cb(vq)); + spin_unlock_irqrestore(&vcrypto->data_vq[qid].lock, flags); +} diff --git 
a/drivers/crypto/virtio/virtio_crypto_common.h b/drivers/crypto/virtio/virtio_crypto_common.h index 25b4f22e8605..4d33ed5593d4 100644 --- a/drivers/crypto/virtio/virtio_crypto_common.h +++ b/drivers/crypto/virtio/virtio_crypto_common.h @@ -152,4 +152,6 @@ int virtio_crypto_ctrl_vq_request(struct virtio_crypto *vcrypto, struct scatterl unsigned int out_sgs, unsigned int in_sgs, struct virtio_crypto_ctrl_request *vc_ctrl_req); +void virtcrypto_dataq_callback(struct virtqueue *vq); + #endif /* _VIRTIO_CRYPTO_COMMON_H */ diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c index e668d4b1bc6a..d8edefcb966c 100644 --- a/drivers/crypto/virtio/virtio_crypto_core.c +++ b/drivers/crypto/virtio/virtio_crypto_core.c @@ -13,38 +13,6 @@ #include "virtio_crypto_common.h" -void -virtcrypto_clear_request(struct virtio_crypto_request *vc_req) -{ - if (vc_req) { - kfree_sensitive(vc_req->req_data); - kfree(vc_req->sgs); - } -} - -static void virtcrypto_dataq_callback(struct virtqueue *vq) -{ - struct virtio_crypto *vcrypto = vq->vdev->priv; - struct virtio_crypto_request *vc_req; - unsigned long flags; - unsigned int len; - unsigned int qid = vq->index; - - spin_lock_irqsave(&vcrypto->data_vq[qid].lock, flags); - do { - virtqueue_disable_cb(vq); - while ((vc_req = virtqueue_get_buf(vq, &len)) != NULL) { - spin_unlock_irqrestore( - &vcrypto->data_vq[qid].lock, flags); - if (vc_req->alg_cb) - vc_req->alg_cb(vc_req, len); - spin_lock_irqsave( - &vcrypto->data_vq[qid].lock, flags); - } - } while (!virtqueue_enable_cb(vq)); - spin_unlock_irqrestore(&vcrypto->data_vq[qid].lock, flags); -} - static int virtcrypto_find_vqs(struct virtio_crypto *vi) { vq_callback_t **callbacks; -- 2.20.1
[PATCH v3 2/5] virtio-crypto: wait ctrl queue instead of busy polling
Originally, after submitting a request to the virtio crypto control queue, the guest side busy polls the virtqueue for the result. This works like the following:

       CPU0  CPU1  ...  CPUx  CPUy
        |     |          |     |
         \     \        /     /
          \--- spin_lock(&vcrypto->ctrl_lock) ---/
                        |
                virtqueue add & kick
                        |
                busy poll virtqueue
                        |
          spin_unlock(&vcrypto->ctrl_lock)
                       ...

There are two problems:

1. The queue depth is always 1, so the performance of a virtio crypto device is limited. Multiple user processes share a single control queue and contend for its spin lock. Tested on Intel Platinum 8260: a single worker gets ~35K/s create/close session operations, and 8 workers get only ~40K/s operations at 800% CPU utilization.

2. The control request is supposed to be handled immediately, but in the current implementation of QEMU (v6.2) the vCPU thread kicks another thread to do this work, so the latency is also unstable. Tracking the latency of virtio_crypto_alg_akcipher_close_session over 5s:

       usecs          : count
       0 -> 1         : 0
       2 -> 3         : 7
       4 -> 7         : 72
       8 -> 15        : 186485
       16 -> 31       : 687
       32 -> 63       : 5
       64 -> 127      : 3
       128 -> 255     : 1
       256 -> 511     : 0
       512 -> 1023    : 0
       1024 -> 2047   : 0
       2048 -> 4095   : 0
       4096 -> 8191   : 0
       8192 -> 16383  : 2

This means a CPU may hold vcrypto->ctrl_lock for as long as 8192~16383 us.

To improve the performance of the control queue, a request on the control queue now waits for completion instead of busy polling, which reduces lock contention; the request is completed by the control queue callback.

       CPU0  CPU1  ...  CPUx  CPUy
        |     |          |     |
         \     \        /     /
          \--- spin_lock(&vcrypto->ctrl_lock) ---/
                        |
                virtqueue add & kick
                        |
          --- spin_unlock(&vcrypto->ctrl_lock) ---
         /     /                        \     \
        |     |                          |     |
       wait  wait                       wait  wait

With this patch, the guest side gets ~200K/s operations at 300% CPU utilization.

Cc: Michael S. Tsirkin
Cc: Jason Wang
Cc: Gonglei
Signed-off-by: zhenwei pi
---
 drivers/crypto/virtio/virtio_crypto_common.c | 42 +++-
 drivers/crypto/virtio/virtio_crypto_common.h |  8
 drivers/crypto/virtio/virtio_crypto_core.c   |  2 +-
 3 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/virtio/virtio_crypto_common.c b/drivers/crypto/virtio/virtio_crypto_common.c
index e65125a74db2..93df73c40dd3 100644
--- a/drivers/crypto/virtio/virtio_crypto_common.c
+++ b/drivers/crypto/virtio/virtio_crypto_common.c
@@ -8,14 +8,21 @@
 #include "virtio_crypto_common.h"
 
+static void virtio_crypto_ctrlq_callback(struct virtio_crypto_ctrl_request *vc_ctrl_req)
+{
+	complete(&vc_ctrl_req->compl);
+}
+
 int virtio_crypto_ctrl_vq_request(struct virtio_crypto *vcrypto, struct scatterlist *sgs[],
 				  unsigned int out_sgs, unsigned int in_sgs,
 				  struct virtio_crypto_ctrl_request *vc_ctrl_req)
 {
 	int err;
-	unsigned int inlen;
 	unsigned long flags;
 
+	init_completion(&vc_ctrl_req->compl);
+	vc_ctrl_req->ctrl_cb = virtio_crypto_ctrlq_callback;
+
 	spin_lock_irqsave(&vcrypto->ctrl_lock, flags);
 	err = virtqueue_add_sgs(vcrypto->ctrl_vq, sgs, out_sgs, in_sgs, vc_ctrl_req, GFP_ATOMIC);
 	if (err < 0) {
@@ -24,16 +31,31 @@ int virtio_crypto_ctrl_vq_request(struct virtio_crypto *vcrypto, struct scatterl
 	}
 
 	virtqueue_kick(vcrypto->ctrl_vq);
-
-	/*
-	 * Trapping into the hypervisor, so the request should be
-	 * handled immediately.
-	 */
-	while (!virtqueue_get_buf(vcrypto->ctrl_vq, &inlen) &&
-	       !virtqueue_is_broken(vcrypto->ctrl_vq))
-		cpu_relax();
-
 	spin_unlock_irqrestore(&vcrypto->ctrl_lock, flags);
+
+	wait_for_completion(&vc_ctrl_req->comp
[PATCH v3 1/5] virtio-crypto: use private buffer for control request
Originally, all of the control requests share a single buffer (the ctrl, input and ctrl_status fields in struct virtio_crypto); this allows a queue depth of 1 only, so the performance of the control queue is limited by this design.

In this patch, each request allocates its request buffer dynamically and frees the buffer after the request, which makes it possible to optimize the control queue depth in the next step.

A necessary comment is already in the code; to describe it again:

/*
 * Note: there are padding fields in request, clear them to zero before
 * sending to host,
 * Ex, virtio_crypto_ctrl_request::ctrl::u::destroy_session::padding[48]
 */

So use kzalloc to allocate the buffer of struct virtio_crypto_ctrl_request.

Cc: Michael S. Tsirkin
Cc: Jason Wang
Cc: Gonglei
Signed-off-by: zhenwei pi
---
 drivers/crypto/virtio/Makefile                |   1 +
 .../virtio/virtio_crypto_akcipher_algs.c      |  90 ++--
 drivers/crypto/virtio/virtio_crypto_common.c  |  39 +
 drivers/crypto/virtio/virtio_crypto_common.h  |  19 ++-
 .../virtio/virtio_crypto_skcipher_algs.c      | 133 --
 5 files changed, 156 insertions(+), 126 deletions(-)
 create mode 100644 drivers/crypto/virtio/virtio_crypto_common.c

diff --git a/drivers/crypto/virtio/Makefile b/drivers/crypto/virtio/Makefile
index bfa6cbae342e..49c1fa80e465 100644
--- a/drivers/crypto/virtio/Makefile
+++ b/drivers/crypto/virtio/Makefile
@@ -3,5 +3,6 @@ obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio_crypto.o
 virtio_crypto-objs := \
 	virtio_crypto_skcipher_algs.o \
 	virtio_crypto_akcipher_algs.o \
+	virtio_crypto_common.o \
 	virtio_crypto_mgr.o \
 	virtio_crypto_core.o
diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
index f3ec9420215e..9561bc2df62b 100644
--- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
@@ -102,8 +102,8 @@ static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
 {
 	struct scatterlist outhdr_sg, key_sg, inhdr_sg, *sgs[3];
 	struct virtio_crypto *vcrypto = ctx->vcrypto;
+	struct virtio_crypto_ctrl_request *vc_ctrl_req;
 	uint8_t *pkey;
-	unsigned int inlen;
 	int err;
 	unsigned int num_out = 0, num_in = 0;
 
@@ -111,98 +111,91 @@ static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
 	if (!pkey)
 		return -ENOMEM;
 
-	spin_lock(&vcrypto->ctrl_lock);
-	memcpy(&vcrypto->ctrl.header, header, sizeof(vcrypto->ctrl.header));
-	memcpy(&vcrypto->ctrl.u, para, sizeof(vcrypto->ctrl.u));
-	vcrypto->input.status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
+	vc_ctrl_req = kzalloc(sizeof(*vc_ctrl_req), GFP_KERNEL);
+	if (!vc_ctrl_req) {
+		err = -ENOMEM;
+		goto out;
+	}
 
-	sg_init_one(&outhdr_sg, &vcrypto->ctrl, sizeof(vcrypto->ctrl));
+	memcpy(&vc_ctrl_req->ctrl.header, header, sizeof(vc_ctrl_req->ctrl.header));
+	memcpy(&vc_ctrl_req->ctrl.u, para, sizeof(vc_ctrl_req->ctrl.u));
+	sg_init_one(&outhdr_sg, &vc_ctrl_req->ctrl, sizeof(vc_ctrl_req->ctrl));
 	sgs[num_out++] = &outhdr_sg;
 
 	sg_init_one(&key_sg, pkey, keylen);
 	sgs[num_out++] = &key_sg;
 
-	sg_init_one(&inhdr_sg, &vcrypto->input, sizeof(vcrypto->input));
+	vc_ctrl_req->input.status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
+	sg_init_one(&inhdr_sg, &vc_ctrl_req->input, sizeof(vc_ctrl_req->input));
 	sgs[num_out + num_in++] = &inhdr_sg;
 
-	err = virtqueue_add_sgs(vcrypto->ctrl_vq, sgs, num_out, num_in, vcrypto, GFP_ATOMIC);
+	err = virtio_crypto_ctrl_vq_request(vcrypto, sgs, num_out, num_in, vc_ctrl_req);
 	if (err < 0)
 		goto out;
 
-	virtqueue_kick(vcrypto->ctrl_vq);
-	while (!virtqueue_get_buf(vcrypto->ctrl_vq, &inlen) &&
-	       !virtqueue_is_broken(vcrypto->ctrl_vq))
-		cpu_relax();
-
-	if (le32_to_cpu(vcrypto->input.status) != VIRTIO_CRYPTO_OK) {
+	if (le32_to_cpu(vc_ctrl_req->input.status) != VIRTIO_CRYPTO_OK) {
+		pr_err("virtio_crypto: Create session failed status: %u\n",
+		       le32_to_cpu(vc_ctrl_req->input.status));
 		err = -EINVAL;
 		goto out;
 	}
 
-	ctx->session_id = le64_to_cpu(vcrypto->input.session_id);
+	ctx->session_id = le64_to_cpu(vc_ctrl_req->input.session_id);
 	ctx->session_valid = true;
 	err = 0;
 
 out:
-	spin_unlock(&vcrypto->ctrl_lock);
+	kfree(vc_ctrl_req);
 	kfree_sensitive(pkey);
 
-	if (err < 0)
-		pr_err("virtio_crypto: Create session failed status: %u\n",
-		       le32_to_cpu(vcrypto->input.status));
-
 	return err;
 }
 
 static int virtio_crypto_alg_akcipher_close_session(struct virtio_crypto_akcipher_ctx *ctx)
 {
 	struct scatterlist outhdr_sg, inhdr_s
[PATCH v3 0/5] virtio-crypto: Improve performance
v2 -> v3:
- Jason suggested splitting the first patch into two parts:
  1. use a private buffer
  2. remove the busy polling
  Reworked as Jason suggested; this makes each change smaller and clearer.

v1 -> v2:
- Use kfree instead of kfree_sensitive for insensitive buffers.
- Several coding style fixes.
- Use memory from the current node, instead of memory close to the device.
- Add more detail to the commit messages, and explain why the per-device
  request buffer is removed.
- Add a comment in the code to explain why kzalloc is used to allocate
  struct virtio_crypto_ctrl_request.

v1:
The main point of this series is to improve the performance of virtio crypto:
- Use a wait mechanism instead of busy polling for the ctrl queue; this
  reduces CPU usage and lock contention, and makes it possible to
  create/destroy sessions in parallel. QPS increases from ~40K/s to ~200K/s.
- Enable retry on the crypto engine to improve performance for the data
  queue; this allows a depth larger than 1.
- Fix dst data length in the akcipher service.
- Other style fixes.

lei he (2):
  virtio-crypto: adjust dst_len at ops callback
  virtio-crypto: enable retry for virtio-crypto-dev

zhenwei pi (3):
  virtio-crypto: use private buffer for control request
  virtio-crypto: wait ctrl queue instead of busy polling
  virtio-crypto: move helpers into virtio_crypto_common.c

 drivers/crypto/virtio/Makefile                |   1 +
 .../virtio/virtio_crypto_akcipher_algs.c      |  95 ++---
 drivers/crypto/virtio/virtio_crypto_common.c  |  92 
 drivers/crypto/virtio/virtio_crypto_common.h  |  29 +++-
 drivers/crypto/virtio/virtio_crypto_core.c    |  37 +
 .../virtio/virtio_crypto_skcipher_algs.c      | 133 --
 6 files changed, 226 insertions(+), 161 deletions(-)
 create mode 100644 drivers/crypto/virtio/virtio_crypto_common.c
-- 
2.20.1
Re: [PATCH] arm64: paravirt: Disable IRQs during stolen_time_cpu_down_prepare
On Thu, Apr 21, 2022 at 09:44:28AM +0200, Juergen Gross wrote:
> On 20.04.22 22:44, Elliot Berman wrote:
> > From: Prakruthi Deepak Heragu
> >
> > During hotplug, the stolen time data structure is unmapped and memset.
> > There is a possibility of the timer IRQ being triggered before memset
> > and stolen time is getting updated as part of this timer IRQ handler. This
> > causes the below crash in timer handler -
> >
> >   [ 3457.473139][C5] Unable to handle kernel paging request at virtual address ffc03df05148
> >   ...
> >   [ 3458.154398][C5] Call trace:
> >   [ 3458.157648][C5]  para_steal_clock+0x30/0x50
> >   [ 3458.162319][C5]  irqtime_account_process_tick+0x30/0x194
> >   [ 3458.168148][C5]  account_process_tick+0x3c/0x280
> >   [ 3458.173274][C5]  update_process_times+0x5c/0xf4
> >   [ 3458.178311][C5]  tick_sched_timer+0x180/0x384
> >   [ 3458.183164][C5]  __run_hrtimer+0x160/0x57c
> >   [ 3458.187744][C5]  hrtimer_interrupt+0x258/0x684
> >   [ 3458.192698][C5]  arch_timer_handler_virt+0x5c/0xa0
> >   [ 3458.198002][C5]  handle_percpu_devid_irq+0xdc/0x414
> >   [ 3458.203385][C5]  handle_domain_irq+0xa8/0x168
> >   [ 3458.208241][C5]  gic_handle_irq.34493+0x54/0x244
> >   [ 3458.213359][C5]  call_on_irq_stack+0x40/0x70
> >   [ 3458.218125][C5]  do_interrupt_handler+0x60/0x9c
> >   [ 3458.223156][C5]  el1_interrupt+0x34/0x64
> >   [ 3458.227560][C5]  el1h_64_irq_handler+0x1c/0x2c
> >   [ 3458.232503][C5]  el1h_64_irq+0x7c/0x80
> >   [ 3458.236736][C5]  free_vmap_area_noflush+0x108/0x39c
> >   [ 3458.242126][C5]  remove_vm_area+0xbc/0x118
> >   [ 3458.246714][C5]  vm_remove_mappings+0x48/0x2a4
> >   [ 3458.251656][C5]  __vunmap+0x154/0x278
> >   [ 3458.255796][C5]  stolen_time_cpu_down_prepare+0xc0/0xd8
> >   [ 3458.261542][C5]  cpuhp_invoke_callback+0x248/0xc34
> >   [ 3458.266842][C5]  cpuhp_thread_fun+0x1c4/0x248
> >   [ 3458.271696][C5]  smpboot_thread_fn+0x1b0/0x400
> >   [ 3458.276638][C5]  kthread+0x17c/0x1e0
> >   [ 3458.280691][C5]  ret_from_fork+0x10/0x20
> >
> > As a fix, disable the IRQs during hotplug until we unmap and memset the
> > stolen time structure.
>
> This will work for the call chain of your observed crash, but are
> you sure that para_steal_clock() can't be called from another cpu
> concurrently?

Agreed, this needs checking as update_rq_clock() is called from all over
the place.

> In case you verified this can't happen, you can add my:
>
> Reviewed-by: Juergen Gross
>
> Otherwise you either need to use RCU for doing the memunmap(), or a
> lock to protect stolen_time_region.

Yes, I think RCU would make a lot of sense here, deferring the memunmap
until there are no longer any readers. The reader is currently doing:

	if (!reg->kaddr)
		return 0;

	return le64_to_cpu(READ_ONCE(reg->kaddr->stolen_time));

so we'd also want an rcu_dereference() on reg->kaddr to avoid reading it
twice.

Will
Re: PING: [PATCH v4 0/8] Introduce akcipher service for virtio-crypto
On Thu, Apr 21, 2022 at 09:41:40AM +0800, zhenwei pi wrote:
> Hi Daniel,
> Could you please review this series?

Yes, it's on my to-do list. I've been on holiday for 2 weeks, so I'm still
catching up on the backlog of reviews.

> On 4/11/22 18:43, zhenwei pi wrote:
> > v3 -> v4:
> > - Coding style fixes: Akcipher -> AkCipher, struct XXX -> XXX, Rsa -> RSA,
> >   XXX-alg -> XXX-algo.
> > - Change version info in qapi/crypto.json, from 7.0 -> 7.1.
> > - Remove ecdsa from qapi/crypto.json; it would be introduced with the
> >   implementation later.
> > - Use QCryptoHashAlgorithm instead of QCryptoRSAHashAlgorithm (removed) in
> >   qapi/crypto.json.
> > - Rename arguments of qcrypto_akcipher_XXX to keep aligned with
> >   qcrypto_cipher_XXX (dec/enc/sign/verify -> in/out/in2), and add
> >   qcrypto_akcipher_max_XXX APIs.
> > - Add new API: qcrypto_akcipher_supports.
> > - Change the return value of qcrypto_akcipher_enc/dec/sign; these functions
> >   return the actual length of the result.
> > - Separate ASN.1 source code and test case clean-up.
> > - Disable RSA raw encoding for akcipher-nettle.
> > - Separate the RSA key parser into rsakey.{h,c}, and implement it with the
> >   builtin ASN.1 decoder and nettle respectively.
> > - Implement the RSA (pkcs1 and raw encoding) algorithm with gcrypt. This has
> >   higher priority than nettle.
> > - For some akcipher operations (e.g. decryption of pkcs1pad(rsa)), the
> >   length of the returned result may be less than the dst buffer size;
> >   return the actual length of the result instead of the buffer length to
> >   the guest side (in function virtio_crypto_akcipher_input_data_helper).
> > - Other minor changes.
> >
> > Thanks to Daniel!
> >
> > Eric pointed out this missing part of the use case, so sending it here
> > again.
> >
> > In our plan, the feature is designed for the HTTPS offloading case and
> > other applications which use kernel RSA/ecdsa via the keyctl syscall. The
> > full picture is shown below:
> >
> >              Nginx/openssl[1] ... Apps
> > Guest  ------------------------------------
> >              virtio-crypto driver[2]
> >        ------------------------------------
> >              virtio-crypto backend[3]
> > Host   ------------------------------------
> >            /            |           \
> >        builtin[4]     vhost     keyctl[5] ...
> >
> > [1] User applications can offload RSA calculation to the kernel via the
> > keyctl syscall. There is no keyctl engine in openssl currently; we
> > developed an engine and tried to contribute it to openssl upstream, but
> > openssl 1.x does not accept new features. Link:
> > https://github.com/openssl/openssl/pull/16689
> >
> > This branch is available and maintained by Lei:
> > https://github.com/TousakaRin/openssl/tree/OpenSSL_1_1_1-kctl_engine
> >
> > We tested nginx (config file change only) with the openssl keyctl engine;
> > it works fine.
> >
> > [2] The virtio-crypto driver is used to communicate with the host side
> > and send requests to the host side to do asymmetric calculation.
> > https://lkml.org/lkml/2022/3/1/1425
> >
> > [3] The virtio-crypto backend handles requests from the guest side, and
> > forwards requests to the crypto backend driver of QEMU.
> >
> > [4] Currently RSA is supported only in the builtin driver. This driver is
> > supposed to test the full feature without other software (e.g. a vhost
> > process) or hardware dependencies. ecdsa is introduced into the qapi type
> > without an implementation; this may be implemented in Q3-2022 or later. If
> > the ecdsa type definition should be added together with the
> > implementation, I'll remove this in the next version.
> >
> > [5] The keyctl backend is in development; we will post this feature in
> > Q2-2022. The keyctl backend can use hardware acceleration (e.g. Intel
> > QAT).
> >
> > Setting up the full environment, tested with Intel QAT on the host side:
> > the QPS of HTTPS increases to ~200% in a guest.
> >
> > vs PCI passthrough: the most important benefit of this solution is that
> > it keeps the VM migratable.
> >
> > v2 -> v3:
> > - Introduce akcipher types to qapi
> > - Add test/benchmark suite for the akcipher class
> > - Separate 'virtio_crypto: Support virtio crypto asym operation' into:
> >   - crypto: Introduce akcipher crypto class
> >   - virtio-crypto: Introduce RSA algorithm
> >
> > v1 -> v2:
> > - Update virtio_crypto.h from the v2 version of the related kernel patch.
> >
> > v1:
> > - Support akcipher for virtio-crypto.
> > - Introduce akcipher class.
> > - Introduce ASN.1 decoder into QEMU.
> > - Implement RSA backend with nettle/hogweed.
> >
> > Lei He (4):
> >   crypto-akcipher: Introduce akcipher types to qapi
> >   crypto: add ASN.1 decoder
> >   crypto: Implement RSA algorithm by hogweed
> >   crypto: Implement RSA algorithm by gcrypt
> >
> > Zhenwei Pi (3):
> >   virtio-crypto: header update
> >   crypto: Introduce akcipher crypto class
> >   crypto: Introduce RSA algorithm
> >
> > lei he (1):
> >   tests/crypto: Add test suite for crypto akcipher
> >
> > backends/
Re: [PATCH] arm64: paravirt: Disable IRQs during stolen_time_cpu_down_prepare
On 20.04.22 22:44, Elliot Berman wrote:
> From: Prakruthi Deepak Heragu
>
> During hotplug, the stolen time data structure is unmapped and memset.
> There is a possibility of the timer IRQ being triggered before memset
> and stolen time is getting updated as part of this timer IRQ handler. This
> causes the below crash in timer handler -
>
>   [ 3457.473139][C5] Unable to handle kernel paging request at virtual address ffc03df05148
>   ...
>   [ 3458.154398][C5] Call trace:
>   [ 3458.157648][C5]  para_steal_clock+0x30/0x50
>   [ 3458.162319][C5]  irqtime_account_process_tick+0x30/0x194
>   [ 3458.168148][C5]  account_process_tick+0x3c/0x280
>   [ 3458.173274][C5]  update_process_times+0x5c/0xf4
>   [ 3458.178311][C5]  tick_sched_timer+0x180/0x384
>   [ 3458.183164][C5]  __run_hrtimer+0x160/0x57c
>   [ 3458.187744][C5]  hrtimer_interrupt+0x258/0x684
>   [ 3458.192698][C5]  arch_timer_handler_virt+0x5c/0xa0
>   [ 3458.198002][C5]  handle_percpu_devid_irq+0xdc/0x414
>   [ 3458.203385][C5]  handle_domain_irq+0xa8/0x168
>   [ 3458.208241][C5]  gic_handle_irq.34493+0x54/0x244
>   [ 3458.213359][C5]  call_on_irq_stack+0x40/0x70
>   [ 3458.218125][C5]  do_interrupt_handler+0x60/0x9c
>   [ 3458.223156][C5]  el1_interrupt+0x34/0x64
>   [ 3458.227560][C5]  el1h_64_irq_handler+0x1c/0x2c
>   [ 3458.232503][C5]  el1h_64_irq+0x7c/0x80
>   [ 3458.236736][C5]  free_vmap_area_noflush+0x108/0x39c
>   [ 3458.242126][C5]  remove_vm_area+0xbc/0x118
>   [ 3458.246714][C5]  vm_remove_mappings+0x48/0x2a4
>   [ 3458.251656][C5]  __vunmap+0x154/0x278
>   [ 3458.255796][C5]  stolen_time_cpu_down_prepare+0xc0/0xd8
>   [ 3458.261542][C5]  cpuhp_invoke_callback+0x248/0xc34
>   [ 3458.266842][C5]  cpuhp_thread_fun+0x1c4/0x248
>   [ 3458.271696][C5]  smpboot_thread_fn+0x1b0/0x400
>   [ 3458.276638][C5]  kthread+0x17c/0x1e0
>   [ 3458.280691][C5]  ret_from_fork+0x10/0x20
>
> As a fix, disable the IRQs during hotplug until we unmap and memset the
> stolen time structure.

This will work for the call chain of your observed crash, but are
you sure that para_steal_clock() can't be called from another cpu
concurrently?

In case you verified this can't happen, you can add my:

Reviewed-by: Juergen Gross

Otherwise you either need to use RCU for doing the memunmap(), or a
lock to protect stolen_time_region.


Juergen
Re: [PATCH 1/3] sched/headers: Fix compilation error with GCC 12
On Wed, Apr 20, 2022 at 11:45:05AM -0700, Kees Cook wrote:
> > 	-Wno-array-bounds
>
> Please no; we just spent two years fixing all the old non-flexible array
> definitions and so many other things for this to be enabled, because
> it finds actual flaws (but we turned it off when it was introduced
> because of how much sloppy old code we had).
>
> > Is the obvious fix-all cure. The thing is, I want to hear if this new
> > warning has any actual use or is just crack induced madness like many of
> > the warnings we turn off.
>
> Yes, it finds real flaws. And also yes, it is rather opinionated about
> some "tricks" that have worked in C, but frankly, most of those tricks
> end up being weird/accidentally-correct and aren't great for long-term
> readability or robustness. Though I'm not speaking specifically to this
> proposed patch; I haven't looked closely at it yet.

So the whole access-outside-object-is-UB thing in C is complete rubbish
from an OS perspective. The memory is there and there are genuine uses
for it.

And so far, the patches I've seen for it make the code actively worse. So
we need a sane annotation to tell the compiler to shut up already without
making the code an unreadable mess.