Re: [PATCH vhost v2 4/8] vdpa/mlx5: Mark vq addrs for modification in hw vq
On 12/12/2023 11:21 AM, Eugenio Perez Martin wrote: On Tue, Dec 5, 2023 at 11:46 AM Dragos Tatulea wrote: Addresses get set by .set_vq_address. hw vq addresses will be updated on next modify_virtqueue. Signed-off-by: Dragos Tatulea Reviewed-by: Gal Pressman Acked-by: Eugenio Pérez I'm kind of ok with this patch and the next one about state, but I didn't ack them in the previous series. My main concern is that it is not valid to change the vq address after DRIVER_OK in VirtIO, which vDPA follows. Only memory maps are ok to change at this moment. I'm not sure about vq state in vDPA, but vhost forbids changing it with an active backend. Suspend is not defined in VirtIO at this moment though, so maybe it is ok to decide that all of these parameters may change during suspend. Maybe the best thing is to protect this with a vDPA feature flag. I think protect with vDPA feature flag could work, while on the other hand vDPA means vendor specific optimization is possible around suspend and resume (in case it helps performance), which doesn't have to be backed by virtio spec. Same applies to vhost user backend features, variations there were not backed by spec either. Of course, we should try best to make the default behavior backward compatible with virtio-based backend, but that circles back to no suspend definition in the current virtio spec, for which I hope we don't cease development on vDPA indefinitely. After all, the virtio based vdap backend can well define its own feature flag to describe (minor difference in) the suspend behavior based on the later spec once it is formed in future. Regards, -Siwei Jason, what do you think? Thanks! --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 9 + include/linux/mlx5/mlx5_ifc_vdpa.h | 1 + 2 files changed, 10 insertions(+) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index f8f088cced50..80e066de0866 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1209,6 +1209,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, bool state_change = false; void *obj_context; void *cmd_hdr; + void *vq_ctx; void *in; int err; @@ -1230,6 +1231,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid); obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context); + vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context); if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) { if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) { @@ -1241,6 +1243,12 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, state_change = true; } + if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS) { + MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr); + MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr); + MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr); + } + MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields); err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out)); if (err) @@ -2202,6 +2210,7 @@ static int mlx5_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx, u64 desc_ mvq->desc_addr = desc_area; mvq->device_addr = device_area; mvq->driver_addr = driver_area; + mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS; return 0; } diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h index b86d51a855f6..9594ac405740 100644 --- a/include/linux/mlx5/mlx5_ifc_vdpa.h +++ b/include/linux/mlx5/mlx5_ifc_vdpa.h @@ -145,6 +145,7 @@ enum { MLX5_VIRTQ_MODIFY_MASK_STATE= (u64)1 << 0, MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS = (u64)1 << 3, MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4, + MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS = (u64)1 << 6, MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY = (u64)1 << 14, }; -- 2.42.0
Re: [PATCH 1/1] video: hyperv_fb: Add ratelimit on error message
On Tue, Apr 20, 2021 at 08:44:19AM -0700, Michael Kelley wrote: > Due to a full ring buffer, the driver may be unable to send updates to > the Hyper-V host. But outputing the error message can make the problem > worse because console output is also typically written to the frame > buffer. As a result, in some circumstances the error message is output > continuously. > > Break the cycle by rate limiting the error message. Also output > the error code for additional diagnosability. > > Signed-off-by: Michael Kelley Applied to hyperv-next. Thanks. > --- > drivers/video/fbdev/hyperv_fb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c > index 4dc9077..a7e6eea 100644 > --- a/drivers/video/fbdev/hyperv_fb.c > +++ b/drivers/video/fbdev/hyperv_fb.c > @@ -308,7 +308,7 @@ static inline int synthvid_send(struct hv_device *hdev, > VM_PKT_DATA_INBAND, 0); > > if (ret) > - pr_err("Unable to send packet via vmbus\n"); > + pr_err_ratelimited("Unable to send packet via vmbus; error > %d\n", ret); > > return ret; > } > -- > 1.8.3.1 >
Re: ** POTENTIAL FRAUD ALERT - RED HAT ** [PATCH v2 1/1] Drivers: hv: vmbus: Increase wait time for VMbus unload
On Tue, Apr 20, 2021 at 11:31:54AM +0200, Vitaly Kuznetsov wrote: > Michael Kelley writes: > > > When running in Azure, disks may be connected to a Linux VM with > > read/write caching enabled. If a VM panics and issues a VMbus > > UNLOAD request to Hyper-V, the response is delayed until all dirty > > data in the disk cache is flushed. In extreme cases, this flushing > > can take 10's of seconds, depending on the disk speed and the amount > > of dirty data. If kdump is configured for the VM, the current 10 second > > timeout in vmbus_wait_for_unload() may be exceeded, and the UNLOAD > > complete message may arrive well after the kdump kernel is already > > running, causing problems. Note that no problem occurs if kdump is > > not enabled because Hyper-V waits for the cache flush before doing > > a reboot through the BIOS/UEFI code. > > > > Fix this problem by increasing the timeout in vmbus_wait_for_unload() > > to 100 seconds. Also output periodic messages so that if anyone is > > watching the serial console, they won't think the VM is completely > > hung. > > > > Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to > > vmbus_wait_for_unload") > > Signed-off-by: Michael Kelley Applied to hyperv-next. Thanks. > > --- [...] > > > > +#define UNLOAD_DELAY_UNIT_MS 10 /* 10 milliseconds */ > > +#define UNLOAD_WAIT_MS (100*1000) /* 100 seconds */ > > +#define UNLOAD_WAIT_LOOPS (UNLOAD_WAIT_MS/UNLOAD_DELAY_UNIT_MS) > > +#define UNLOAD_MSG_MS (5*1000)/* Every 5 seconds */ > > +#define UNLOAD_MSG_LOOPS (UNLOAD_MSG_MS/UNLOAD_DELAY_UNIT_MS) > > + > > static void vmbus_wait_for_unload(void) > > { > > int cpu; > > @@ -772,12 +778,17 @@ static void vmbus_wait_for_unload(void) > > * vmbus_connection.unload_event. If not, the last thing we can do is > > * read message pages for all CPUs directly. > > * > > -* Wait no more than 10 seconds so that the panic path can't get > > -* hung forever in case the response message isn't seen. > > +* Wait up to 100 seconds since an Azure host must writeback any dirty > > +* data in its disk cache before the VMbus UNLOAD request will > > +* complete. This flushing has been empirically observed to take up > > +* to 50 seconds in cases with a lot of dirty data, so allow additional > > +* leeway and for inaccuracies in mdelay(). But eventually time out so > > +* that the panic path can't get hung forever in case the response > > +* message isn't seen. > > I vaguely remember debugging cases when CHANNELMSG_UNLOAD_RESPONSE never > arrives, it was kind of pointless to proceed to kexec as attempts to > reconnect Vmbus devices were failing (no devices were offered after > CHANNELMSG_REQUESTOFFERS AFAIR). Would it maybe make sense to just do > emergency reboot instead of proceeding to kexec when this happens? Just > wondering. > Please submit a follow-up patch if necessary. Wei.
Re: [PATCH v2] Drivers: hv: vmbus: Initialize unload_event statically
On Tue, Apr 20, 2021 at 04:50:56AM +, Michael Kelley wrote: > From: Andrea Parri (Microsoft) Sent: Monday, April > 19, 2021 6:44 PM > > > > If a malicious or compromised Hyper-V sends a spurious message of type > > CHANNELMSG_UNLOAD_RESPONSE, the function vmbus_unload_response() will > > call complete() on an uninitialized event, and cause an oops. > > > > Reported-by: Michael Kelley > > Signed-off-by: Andrea Parri (Microsoft) > > --- > > Changes since v1[1]: > > - add inline comment in vmbus_unload_response() > > > > [1] > > https://lore.kernel.org/linux-hyperv/20210416143932.16512-1-parri.and...@gmail.com/ > > > > drivers/hv/channel_mgmt.c | 7 ++- > > drivers/hv/connection.c | 2 ++ > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > Reviewed-by: Michael Kelley Applied to hyperv-next. Thanks.
Re: [PATCH 1/2] x86/hyperv: Move hv_do_rep_hypercall to asm-generic
On Sun, Apr 18, 2021 at 02:42:55PM +, Michael Kelley wrote: > From: Wei Liu Sent: Sunday, April 18, 2021 6:09 AM > > On Fri, Apr 16, 2021 at 05:43:02PM -0700, Joseph Salisbury wrote: > > > From: Joseph Salisbury > > > > > > This patch makes no functional changes. It simply moves > > > hv_do_rep_hypercall() > > > out of arch/x86/include/asm/mshyperv.h and into asm-generic/mshyperv.h > > > > > > hv_do_rep_hypercall() is architecture independent, so it makes sense that > > > it > > > should be in the architecture independent mshyperv.h, not in the > > > x86-specific > > > mshyperv.h. > > > > > > This is done in preperation for a follow up patch which creates a > > > consistent > > > pattern for checking Hyper-V hypercall status. > > > > > > Signed-off-by: Joseph Salisbury > > > --- > > [...] > > > +static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 > > > varhead_size, > > > + void *input, void *output) > > > +{ > > > + u64 control = code; > > > + u64 status; > > > + u16 rep_comp; > > > + > > > + control |= (u64)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET; > > > + control |= (u64)rep_count << HV_HYPERCALL_REP_COMP_OFFSET; > > > + > > > + do { > > > + status = hv_do_hypercall(control, input, output); > > > + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) > > > + return status; > > > + > > > + /* Bits 32-43 of status have 'Reps completed' data. */ > > > + rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >> > > > + HV_HYPERCALL_REP_COMP_OFFSET; > > > + > > > + control &= ~HV_HYPERCALL_REP_START_MASK; > > > + control |= (u64)rep_comp << HV_HYPERCALL_REP_START_OFFSET; > > > + > > > + touch_nmi_watchdog(); > > > > This seems to be missing in Arm. Does it compile? > > > > Wei. > > touch_nmi_watchdog() is defined as "static inline" in include/linux/nmi.h. So > it should be present in all architectures. It calls arch_touch_nmi_watchdog, > which is an empty function if CONFIG_HAVE_NMI_WATCHDOG is not defined, > as is the case on ARM64. I see. I couldn't find arch_touch_nmi_watchdog on Arm but missed the stub function. Thanks for the information. Wei.
Re: [PATCH 1/2] x86/hyperv: Move hv_do_rep_hypercall to asm-generic
On Fri, Apr 16, 2021 at 05:43:02PM -0700, Joseph Salisbury wrote: > From: Joseph Salisbury > > This patch makes no functional changes. It simply moves hv_do_rep_hypercall() > out of arch/x86/include/asm/mshyperv.h and into asm-generic/mshyperv.h > > hv_do_rep_hypercall() is architecture independent, so it makes sense that it > should be in the architecture independent mshyperv.h, not in the x86-specific > mshyperv.h. > > This is done in preperation for a follow up patch which creates a consistent > pattern for checking Hyper-V hypercall status. > > Signed-off-by: Joseph Salisbury > --- [...] > +static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 > varhead_size, > + void *input, void *output) > +{ > + u64 control = code; > + u64 status; > + u16 rep_comp; > + > + control |= (u64)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET; > + control |= (u64)rep_count << HV_HYPERCALL_REP_COMP_OFFSET; > + > + do { > + status = hv_do_hypercall(control, input, output); > + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) > + return status; > + > + /* Bits 32-43 of status have 'Reps completed' data. */ > + rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >> > + HV_HYPERCALL_REP_COMP_OFFSET; > + > + control &= ~HV_HYPERCALL_REP_START_MASK; > + control |= (u64)rep_comp << HV_HYPERCALL_REP_START_OFFSET; > + > + touch_nmi_watchdog(); This seems to be missing in Arm. Does it compile? Wei. > + } while (rep_comp < rep_count); > + > + return status; > +} > > /* Generate the guest OS identifier as described in the Hyper-V TLFS */ > static inline __u64 generate_guest_id(__u64 d_info1, __u64 kernel_version, > -- > 2.17.1 >
Re: [PATCH v3 0/3] Drivers: hv: vmbus: Introduce CHANNELMSG_MODIFYCHANNEL_RESPONSE
On Fri, Apr 16, 2021 at 04:34:46PM +0200, Andrea Parri (Microsoft) wrote: > Changes since v2[1]: > - fix VMbus protocol version name > - add Reviewed-by: tag > - refactor/simplyfy changes in hv_synic_cleanup() > > Changes since v1[2]: > - rebase on hyperv-next > - split changes into three patches > - fix error handling in send_modifychannel_with_ack() > - remove rescind checks in send_modifychannel_with_ack() > - remove unneeded test in hv_synic_event_pending() > - add/amend inline comments > - style changes > > [1] https://lkml.kernel.org/r/20210414150118.2843-1-parri.and...@gmail.com > [2] https://lkml.kernel.org/r/20201126191210.13115-1-parri.and...@gmail.com > > Andrea Parri (Microsoft) (3): > Drivers: hv: vmbus: Introduce and negotiate VMBus protocol version 5.3 > Drivers: hv: vmbus: Drivers: hv: vmbus: Introduce > CHANNELMSG_MODIFYCHANNEL_RESPONSE > Drivers: hv: vmbus: Check for pending channel interrupts before taking > a CPU offline Applied to hyperv-next. Thanks. Wei.
Re: [PATCH] Drivers: hv: vmbus: Use after free in __vmbus_open()
On Tue, Apr 13, 2021 at 05:42:21PM +0200, Andrea Parri wrote: > On Tue, Apr 13, 2021 at 01:50:04PM +0300, Dan Carpenter wrote: > > The "open_info" variable is added to the _connection.chn_msg_list, > > but the error handling frees "open_info" without removing it from the > > list. This will result in a use after free. First remove it from the > > list, and then free it. > > > > Fixes: 6f3d791f3006 ("Drivers: hv: vmbus: Fix rescind handling issues") > > Signed-off-by: Dan Carpenter > > I had this 'queued' in my list, > > Reviewed-by: Andrea Parri Applied. Thanks.
Re: [PATCH RFC 01/22] asm-generic/hyperv: add HV_STATUS_ACCESS_DENIED definition
On Thu, Apr 15, 2021 at 05:33:17PM +0200, Vitaly Kuznetsov wrote: > Wei Liu writes: > > > On Tue, Apr 13, 2021 at 02:26:09PM +0200, Vitaly Kuznetsov wrote: > >> From TLFSv6.0b, this status means: "The caller did not possess sufficient > >> access rights to perform the requested operation." > >> > >> Signed-off-by: Vitaly Kuznetsov > > > > This can be applied to hyperv-next right away. Let me know what you > > think. > > > > In case there's no immediate need for this constant outside of KVM, I'd > suggest you just give Paolo your 'Acked-by' so I can carry the patch in > the series for the time being. This will eliminate the need to track > dependencies between hyperv-next and kvm-next. Acked-by: Wei Liu
Re: [PATCH RFC 01/22] asm-generic/hyperv: add HV_STATUS_ACCESS_DENIED definition
On Tue, Apr 13, 2021 at 02:26:09PM +0200, Vitaly Kuznetsov wrote: > From TLFSv6.0b, this status means: "The caller did not possess sufficient > access rights to perform the requested operation." > > Signed-off-by: Vitaly Kuznetsov This can be applied to hyperv-next right away. Let me know what you think. Wei. > --- > include/asm-generic/hyperv-tlfs.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/asm-generic/hyperv-tlfs.h > b/include/asm-generic/hyperv-tlfs.h > index 83448e837ded..e01a3bade13a 100644 > --- a/include/asm-generic/hyperv-tlfs.h > +++ b/include/asm-generic/hyperv-tlfs.h > @@ -187,6 +187,7 @@ enum HV_GENERIC_SET_FORMAT { > #define HV_STATUS_INVALID_HYPERCALL_INPUT3 > #define HV_STATUS_INVALID_ALIGNMENT 4 > #define HV_STATUS_INVALID_PARAMETER 5 > +#define HV_STATUS_ACCESS_DENIED 6 > #define HV_STATUS_OPERATION_DENIED 8 > #define HV_STATUS_INSUFFICIENT_MEMORY11 > #define HV_STATUS_INVALID_PORT_ID17 > -- > 2.30.2 >
Re: [PATCH] Drivers: hv: vmbus: remove unused function
On Wed, Apr 14, 2021 at 02:48:17PM +, Michael Kelley wrote: > From: Jiapeng Chong Sent: Tuesday, April > 13, 2021 11:21 PM [...] > This function became unused as of commit 4226ff69a3df > ("vmbus: simplify hv_ringbuffer_read") on 7/17/2017. > > Reviewed-by: Michael Kelley Applied to hyperv-next. Thanks. Wei.
Re: [PATCH] Drivers: hv: vmbus: Use after free in __vmbus_open()
On Tue, Apr 13, 2021 at 01:50:04PM +0300, Dan Carpenter wrote: > The "open_info" variable is added to the _connection.chn_msg_list, > but the error handling frees "open_info" without removing it from the > list. This will result in a use after free. First remove it from the > list, and then free it. > > Fixes: 6f3d791f3006 ("Drivers: hv: vmbus: Fix rescind handling issues") > Signed-off-by: Dan Carpenter > --- > From static analysis. Untested etc. There is almost certainly a good > reason to add it to the list before checking "newchannel->rescind" but I > don't know the code well enough to know what the reason is. > AIUI the channel management code requires the message be queued before posting the message to backend, because processing response is done in another thread, and might happen before this message is added to the list if the order is reversed. > drivers/hv/channel.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c > index db30be8f9cce..1c5a418c1962 100644 > --- a/drivers/hv/channel.c > +++ b/drivers/hv/channel.c > @@ -653,7 +653,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel, > > if (newchannel->rescind) { > err = -ENODEV; > - goto error_free_info; > + goto error_clean_msglist; Looking at similar functions in the same file I think there is indeed an UAF problem in the original code. I have not studied this piece of code extensively so I will wait for others to chime in. Wei. > } > > err = vmbus_post_msg(open_msg, > -- > 2.30.2 >
Re: [PATCH] Drivers: hv: vmbus: remove unused including
On Tue, Apr 13, 2021 at 05:49:18PM +0800, Yang Li wrote: > Fix the following versioncheck warning: > ./drivers/hv/hv.c: 16 linux/version.h not needed. > > Reported-by: Abaci Robot > Signed-off-by: Yang Li Thanks for the patch. This has also been reported by Huawei's kernel bot and fixed in hyperv-next. Wei.
Re: [PATCH v2 4/4] KVM: hyper-v: Advertise support for fast XMM hypercalls
On Mon, Apr 12, 2021 at 07:00:17PM +0200, Siddharth Chandrasekaran wrote: > Now that all extant hypercalls that can use XMM registers (based on > spec) for input/outputs are patched to support them, we can start > advertising this feature to guests. > > Cc: Alexander Graf > Cc: Evgeny Iakovlev > Signed-off-by: Siddharth Chandrasekaran > --- > arch/x86/include/asm/hyperv-tlfs.h | 7 ++- > arch/x86/kvm/hyperv.c | 2 ++ > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/include/asm/hyperv-tlfs.h > b/arch/x86/include/asm/hyperv-tlfs.h > index e6cd3fee562b..716f12be411e 100644 > --- a/arch/x86/include/asm/hyperv-tlfs.h > +++ b/arch/x86/include/asm/hyperv-tlfs.h > @@ -52,7 +52,7 @@ > * Support for passing hypercall input parameter block via XMM > * registers is available > */ > -#define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLEBIT(4) > +#define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE BIT(4) > /* Support for a virtual guest idle state is available */ > #define HV_X64_GUEST_IDLE_STATE_AVAILABLEBIT(5) > /* Frequency MSRs available */ > @@ -61,6 +61,11 @@ > #define HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE BIT(10) > /* Support for debug MSRs available */ > #define HV_FEATURE_DEBUG_MSRS_AVAILABLE BIT(11) > +/* > + * Support for returning hypercall ouput block via XMM "output" Wei.
Re: [PATCH v2 3/4] KVM: x86: kvm_hv_flush_tlb use inputs from XMM registers
On Mon, Apr 12, 2021 at 07:00:16PM +0200, Siddharth Chandrasekaran wrote: > + > +static inline void kvm_hv_hypercall_read_xmm(struct kvm_hv_hcall *hc) Do you really need inline here? The compiler should be smart enough to inline this function if necessary. > +{ > + int reg; > + > + kvm_fpu_get(); > + for (reg = 0; reg < KVM_HV_HYPERCALL_MAX_XMM_REGISTERS; reg++) > + _kvm_read_sse_reg(reg, >xmm[reg]); > + kvm_fpu_put(); > + hc->xmm_dirty = false; There is no code that sets xmm_dirty to true? What am I missing? I guess that's because you haven't implemented the hypercalls that need writing back to guest? Wei.
Re: [PATCH 0/4] Add support for XMM fast hypercalls
On Thu, Apr 08, 2021 at 05:54:43PM +0200, Siddharth Chandrasekaran wrote: > On Thu, Apr 08, 2021 at 05:48:19PM +0200, Paolo Bonzini wrote: > > On 08/04/21 17:40, Siddharth Chandrasekaran wrote: > > > > > > Although the Hyper-v TLFS mentions that a guest cannot use this > > > > > > feature > > > > > > unless the hypervisor advertises support for it, some hypercalls > > > > > > which > > > > > > we plan on upstreaming in future uses them anyway. > > > > > No, please don't do this. Check the feature bit(s) before you issue > > > > > hypercalls which rely on the extended interface. > > > > Perhaps Siddharth should clarify this, but I read it as Hyper-V being > > > > buggy and using XMM arguments unconditionally. > > > The guest is at fault here as it expects Hyper-V to consume arguments > > > from XMM registers for certain hypercalls (that we are working) even if > > > we didn't expose the feature via CPUID bits. > > > > What guest is that? > > It is a Windows Server 2016. Can you be more specific? Are you implementing some hypercalls from TLFS? If so, which ones? Wei.
Re: [PATCH 4/4] KVM: hyper-v: Advertise support for fast XMM hypercalls
On Thu, Apr 08, 2021 at 04:20:54PM +0200, Siddharth Chandrasekaran wrote: > On Thu, Apr 08, 2021 at 02:05:53PM +0200, Vitaly Kuznetsov wrote: > > Siddharth Chandrasekaran writes: > > > > > Now that all extant hypercalls that can use XMM registers (based on > > > spec) for input/outputs are patched to support them, we can start > > > advertising this feature to guests. > > > > > > Cc: Alexander Graf > > > Cc: Evgeny Iakovlev > > > Signed-off-by: Siddharth Chandrasekaran > > > --- > > > arch/x86/include/asm/hyperv-tlfs.h | 4 ++-- > > > arch/x86/kvm/hyperv.c | 1 + > > > 2 files changed, 3 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/x86/include/asm/hyperv-tlfs.h > > > b/arch/x86/include/asm/hyperv-tlfs.h > > > index e6cd3fee562b..1f160ef60509 100644 > > > --- a/arch/x86/include/asm/hyperv-tlfs.h > > > +++ b/arch/x86/include/asm/hyperv-tlfs.h > > > @@ -49,10 +49,10 @@ > > > /* Support for physical CPU dynamic partitioning events is available*/ > > > #define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLEBIT(3) > > > /* > > > - * Support for passing hypercall input parameter block via XMM > > > + * Support for passing hypercall input and output parameter block via XMM > > > * registers is available > > > */ > > > -#define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLEBIT(4) > > > +#define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLEBIT(4) | > > > BIT(15) > > > > TLFS 6.0b states that there are two distinct bits for input and output: > > > > CPUID Leaf 0x4003.EDX: > > Bit 4: support for passing hypercall input via XMM registers is available. > > Bit 15: support for returning hypercall output via XMM registers is > > available. > > > > and HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLE is not currently used > > anywhere, I'd suggest we just rename > > > > HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLE to > > HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE > > and add HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE (bit 15). > > That is how I had it initially; but then noticed that we would never > need to use either of them separately. So it seemed like a reasonable > abstraction to put them together. > They are two separate things in TLFS. Please use two macros here. Wei.
Re: [PATCH 0/4] Add support for XMM fast hypercalls
On Thu, Apr 08, 2021 at 05:30:26PM +0200, Paolo Bonzini wrote: > On 08/04/21 17:28, Wei Liu wrote: > > > Although the Hyper-v TLFS mentions that a guest cannot use this feature > > > unless the hypervisor advertises support for it, some hypercalls which > > > we plan on upstreaming in future uses them anyway. > > > > No, please don't do this. Check the feature bit(s) before you issue > > hypercalls which rely on the extended interface. > > Perhaps Siddharth should clarify this, but I read it as Hyper-V being buggy > and using XMM arguments unconditionally. > There is no code in upstream Linux that uses the XMM fast hypercall interface at the moment. If there is such code, it has bugs in it and should be fixed. :-) Wei. > Paolo >
Re: [PATCH 0/4] Add support for XMM fast hypercalls
On Wed, Apr 07, 2021 at 11:29:26PM +0200, Siddharth Chandrasekaran wrote: > Hyper-V supports the use of XMM registers to perform fast hypercalls. > This allows guests to take advantage of the improved performance of the > fast hypercall interface even though a hypercall may require more than > (the current maximum of) two general purpose registers. > > The XMM fast hypercall interface uses an additional six XMM registers > (XMM0 to XMM5) to allow the caller to pass an input parameter block of > up to 112 bytes. Hyper-V can also return data back to the guest in the > remaining XMM registers that are not used by the current hypercall. > > Although the Hyper-v TLFS mentions that a guest cannot use this feature > unless the hypervisor advertises support for it, some hypercalls which > we plan on upstreaming in future uses them anyway. No, please don't do this. Check the feature bit(s) before you issue hypercalls which rely on the extended interface. Wei.
Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)
On Wed, Apr 07, 2021 at 02:34:01PM +, Haiyang Zhang wrote: > > > > -Original Message- > > From: Wei Liu > > Sent: Wednesday, April 7, 2021 9:17 AM > > To: Dexuan Cui > > Cc: da...@davemloft.net; k...@kernel.org; KY Srinivasan > > ; Haiyang Zhang ; Stephen > > Hemminger ; wei@kernel.org; Wei Liu > > ; net...@vger.kernel.org; linux- > > ker...@vger.kernel.org; linux-hyp...@vger.kernel.org > > Subject: Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure > > Network Adapter (MANA) > > > > On Tue, Apr 06, 2021 at 04:23:21PM -0700, Dexuan Cui wrote: > > [...] > > > +config MICROSOFT_MANA > > > + tristate "Microsoft Azure Network Adapter (MANA) support" > > > + default m > > > + depends on PCI_MSI > > > + select PCI_HYPERV > > > > OOI which part of the code requires PCI_HYPERV? > > > > Asking because I can't immediately find code that looks to be Hyper-V > > specific (searching for vmbus etc). This device looks like any other PCI > > devices > > to me. > > It depends on the VF nic's PCI config space which is presented by the > pci_hyperv driver. I think all it matters is the PCI bus is able to handle the configuration space access, right? Assuming there is an emulated PCI root complex which exposes the config space to the driver, will this driver still work? I'm trying to understand how tightly coupled with Hyper-V PCI this driver is. In an alternative universe, Microsft may suddenly decide to sell this hardware and someone wants to passthrough an VF via VFIO. I don't see how this driver wouldn't work, hence the original question. There is no need to change the code. I'm just curious about a tiny detail in the implementation. Wei. > > Thanks, > - Haiyang
Re: [RFC PATCH 04/18] virt/mshv: request version ioctl
On Wed, Apr 07, 2021 at 04:02:56PM +0200, Vitaly Kuznetsov wrote: > Wei Liu writes: > > > On Wed, Apr 07, 2021 at 09:38:21AM +0200, Vitaly Kuznetsov wrote: > > > >> One more though: it is probably a good idea to introduce selftests for > >> /dev/mshv (similar to KVM's selftests in > >> /tools/testing/selftests/kvm). Selftests don't really need a stable ABI > >> as they live in the same linux.git and can be updated in the same patch > >> series which changes /dev/mshv behavior. Selftests are very useful for > >> checking there are no regressions, especially in the situation when > >> there's no publicly available userspace for /dev/mshv. > > > > I think this can wait until we merge the first implementation in tree. > > There are still a lot of moving parts. Our (currently limited) internal > > test cases need more cleaning up before they are ready. I certainly > > don't want to distract Nuno from getting the foundation right. > > > > I'm absolutely fine with this approach, selftests are a nice add-on, not > a requirement for the initial implementation. Also, to make them more > useful to mere mortals, a doc on how to run Linux as root Hyper-V > partition would come handy) Making this system easier for others to use and consume is on our radar. Currently you need Windows bootloader and a not-yet-released loader to load the hypervisor. We're making progress in bringing in GRUB. Needless to say there are technical and non-technical challenges for this work, so don't expect it to happen very soon. :-) Wei. > > -- > Vitaly >
Re: [RFC PATCH 04/18] virt/mshv: request version ioctl
On Wed, Apr 07, 2021 at 09:38:21AM +0200, Vitaly Kuznetsov wrote: > Nuno Das Neves writes: > > > On 3/5/2021 1:18 AM, Vitaly Kuznetsov wrote: > >> Nuno Das Neves writes: > >> > >>> On 2/9/2021 5:11 AM, Vitaly Kuznetsov wrote: > Nuno Das Neves writes: > > >> ... > > + > > +3.1 MSHV_REQUEST_VERSION > > + > > +:Type: /dev/mshv ioctl > > +:Parameters: pointer to a u32 > > +:Returns: 0 on success > > + > > +Before issuing any other ioctls, a MSHV_REQUEST_VERSION ioctl must be > > called to > > +establish the interface version with the kernel module. > > + > > +The caller should pass the MSHV_VERSION as an argument. > > + > > +The kernel module will check which interface versions it supports and > > return 0 > > +if one of them matches. > > + > > +This /dev/mshv file descriptor will remain 'locked' to that version as > > long as > > +it is open - this ioctl can only be called once per open. > > + > > KVM used to have KVM_GET_API_VERSION too but this turned out to be not > very convenient so we use capabilities > (KVM_CHECK_EXTENSION/KVM_ENABLE_CAP) > instead. > > >>> > >>> The goal of MSHV_REQUEST_VERSION is to support changes to APIs in the > >>> core set. > >>> When we add new features/ioctls beyond the core we can use an > >>> extension/capability > >>> approach like KVM. > >>> > >> > >> Driver versions is a very bad idea from distribution/stable kernel point > >> of view as it presumes that the history is linear. It is not. > >> > >> Imagine you have the following history upstream: > >> > >> MSHV_REQUEST_VERSION = 1 > >> <100 commits with features/fixes> > >> MSHV_REQUEST_VERSION = 2 > >> > >> MSHV_REQUEST_VERSION = 2 > >> > >> Now I'm a linux distribution / stable kernel maintainer. My kernel is at > >> MSHV_REQUEST_VERSION = 1. Now I want to backport 1 feature from between > >> VER=1 and VER=2 and another feature from between VER=2 and VER=3. My > >> history now looks like > >> > >> MSHV_REQUEST_VERSION = 1 > >> <5 commits from between VER=1 and VER=2> > >>Which version should I declare here > >> <5 commits from between VER=2 and VER=3> > >>Which version should I declare here > >> > >> If I keep VER=1 then userspace will think that I don't have any extra > >> features added and just won't use them. If I change VER to 2/3, it'll > >> think I have *all* features from between these versions. > >> > >> The only reasonable way to manage this is to attach a "capability" to > >> every ABI change and expose this capability *in the same commit which > >> introduces the change to the ABI*. This way userspace will now exactly > >> which ioctls are available and what are their interfaces. > >> > >> Also, trying to define "core set" is hard but you don't really need > >> to. > >> > > > > We've had some internal discussion on this. > > > > There is bound to be some iteration before this ABI is stable, since even > > the > > underlying Microsoft hypervisor interfaces aren't stable just yet. > > > > It might make more sense to just have an IOCTL to check if the API is > > stable yet. > > This would be analogous to checking if kVM_GET_API_VERSION returns 12. > > > > How does this sound as a proposal? > > An MSHV_CHECK_EXTENSION ioctl to query extensions to the core /dev/mshv API. > > > > It takes a single argument, an integer named MSHV_CAP_* corresponding to > > the extension to check the existence of. > > > > The ioctl will return 0 if the extension is unsupported, or a positive > > integer > > if supported. > > > > We can initially include a capability called MSHV_CAP_CORE_API_STABLE. > > If supported, the core APIs are stable. > > This sounds reasonable, I'd suggest you reserve MSHV_CAP_CORE_API_STABLE > right away but don't expose it yet so it's clear the API is not yet > stable. Test userspace you have may always assume it's running with the > latest kernel. > > Also, please be clear about the fact that /dev/mshv doesn't > provide a stable API yet so nobody builds an application on top of > it. > Very good discussion and suggestions. Thank you Vitaly. > One more though: it is probably a good idea to introduce selftests for > /dev/mshv (similar to KVM's selftests in > /tools/testing/selftests/kvm). Selftests don't really need a stable ABI > as they live in the same linux.git and can be updated in the same patch > series which changes /dev/mshv behavior. Selftests are very useful for > checking there are no regressions, especially in the situation when > there's no publicly available userspace for /dev/mshv. I think this can wait until we merge the first implementation in tree. There are still a lot of moving parts. Our (currently limited) internal test cases need more cleaning up before they are ready. I certainly don't want to distract Nuno from getting the foundation right. Wei.
Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)
On Tue, Apr 06, 2021 at 04:23:21PM -0700, Dexuan Cui wrote: [...] > +config MICROSOFT_MANA > + tristate "Microsoft Azure Network Adapter (MANA) support" > + default m > + depends on PCI_MSI > + select PCI_HYPERV OOI which part of the code requires PCI_HYPERV? Asking because I can't immediately find code that looks to be Hyper-V specific (searching for vmbus etc). This device looks like any other PCI devices to me. Wei.
Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)
On Wed, Apr 07, 2021 at 08:08:59AM +, Dexuan Cui wrote: > > From: kernel test robot > > Sent: Tuesday, April 6, 2021 6:31 PM > > ... > > Hi Dexuan, > > I love your patch! Perhaps something to improve: > > > > All warnings (new ones prefixed by >>): > > > >drivers/pci/controller/pci-hyperv.c: In function 'hv_irq_unmask': > >drivers/pci/controller/pci-hyperv.c:1220:2: error: implicit declaration > > of > > function 'hv_set_msi_entry_from_desc' > > [-Werror=implicit-function-declaration] > > 1220 | hv_set_msi_entry_from_desc(>int_entry.msi_entry, > > msi_desc); > > This build error looks strange, because the patch doesn't even touch the > driver > drivers/pci/controller/pci-hyperv.c. > I think this is normal. The bot doesn't restrict itself to the changed code from my experience. Wei.
Re: [PATCH] xen/evtchn: Change irq_info lock to raw_spinlock_t
On Tue, Apr 06, 2021 at 11:51:04AM +0100, Luca Fancellu wrote: > Unmask operation must be called with interrupt disabled, > on preempt_rt spin_lock_irqsave/spin_unlock_irqrestore > don't disable/enable interrupts, so use raw_* implementation > and change lock variable in struct irq_info from spinlock_t > to raw_spinlock_t > > Cc: sta...@vger.kernel.org > Fixes: 25da4618af24 ("xen/events: don't unmask an event channel > when an eoi is pending") > > Signed-off-by: Luca Fancellu Reviewed-by: Wei Liu
Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers
On Tue, Apr 06, 2021 at 04:31:58PM +0300, Andy Shevchenko wrote: > kernel.h is being used as a dump for all kinds of stuff for a long time. > Here is the attempt to start cleaning it up by splitting out panic and > oops helpers. > > At the same time convert users in header and lib folder to use new header. > Though for time being include new header back to kernel.h to avoid twisted > indirected includes for existing users. > > Signed-off-by: Andy Shevchenko Acked-by: Wei Liu
Re: [PATCH] x86/hyperv: remove unused including
On Tue, Apr 06, 2021 at 09:56:35AM +0800, Yang Li wrote: > Fix the following versioncheck warning: > ./arch/x86/hyperv/hv_proc.c: 3 linux/version.h not needed. > > Reported-by: Abaci Robot > Signed-off-by: Yang Li Thanks for the patch. This is already reported and fixed in hyperv-next by Huawei's Hulk bot. Wei. > --- > arch/x86/hyperv/hv_proc.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c > index 60461e5..27e17ad 100644 > --- a/arch/x86/hyperv/hv_proc.c > +++ b/arch/x86/hyperv/hv_proc.c > @@ -1,6 +1,5 @@ > // SPDX-License-Identifier: GPL-2.0 > #include > -#include > #include > #include > #include > -- > 1.8.3.1 >
[GIT PULL] Hyper-V fixes for 5.12-rc6
Hi Linus, The following changes since commit fe07bfda2fb9cdef8a4d4008a409bb02f35f1bd8: Linux 5.12-rc1 (2021-02-28 16:05:19 -0800) are available in the Git repository at: ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-fixes-signed-20210402 for you to fetch changes up to 37df9f3fedb6aeaff5564145e8162aab912c9284: video: hyperv_fb: Fix a double free in hvfb_probe (2021-03-25 13:31:20 +) hyperv-fixes for 5.12-rc6 - One patch from Lu Yunlong to fix a double free in hvfb_probe Lv Yunlong (1): video: hyperv_fb: Fix a double free in hvfb_probe drivers/video/fbdev/hyperv_fb.c | 3 --- 1 file changed, 3 deletions(-)
Re: [PATCH v2] video: hyperv_fb: Fix a double free in hvfb_probe
On Wed, Mar 24, 2021 at 01:46:39PM +, Michael Kelley wrote: > From: Lv Yunlong Sent: Wednesday, March 24, 2021 > 3:37 AM > > > > In function hvfb_probe in hyperv_fb.c, it calls hvfb_getmem(hdev, info) > > and return err when info->apertures is freed. > > > > In the error1 label of hvfb_probe, info->apertures will be freed for the > > second time in framebuffer_release(info). > > > > My patch removes all kfree(info->apertures) instead of set info->apertures > > to NULL. It is because that let framebuffer_release() handle freeing the > > memory flows the fbdev pattern, and less code overall. > > Let me suggest some clarifications in the commit message. It's probably > better not to reference the initial approach of setting info->apertures to > NULL, since there won't be any record of that approach in the commit > history. Here's what I would suggest: > > Function hvfb_probe() calls hvfb_getmem(), expecting upon return that > info->apertures is either NULL or points to memory that should be freed > by framebuffer_release(). But hvfb_getmem() is freeing the memory and > leaving the pointer non-NULL, resulting in a double free if an error > occurs or later if hvfb_remove() is called. > > Fix this by removing all kfree(info->apertures) calls in hvfb_getmem(). > This will allow framebuffer_release() to free the memory, which follows > the pattern of other fbdev drivers. > > Modulo this revision to the commit message, which Wei Liu can > probably incorporate, > Yes. I surely can incorporate the changes. I will also add the Fixes tag. > Reviewed-by: Michael Kelley
Re: [PATCH v5] x86/Hyper-V: Support for free page reporting
On Tue, Mar 23, 2021 at 06:47:16PM +, Sunil Muthuswamy wrote: > Linux has support for free page reporting now (36e66c554b5c) for > virtualized environment. On Hyper-V when virtually backed VMs are > configured, Hyper-V will advertise cold memory discard capability, > when supported. This patch adds the support to hook into the free > page reporting infrastructure and leverage the Hyper-V cold memory > discard hint hypercall to report/free these pages back to the host. > > Signed-off-by: Sunil Muthuswamy > Tested-by: Matheus Castello Applied to hyperv-next. Thanks.
Re: [PATCH -next] x86: Fix unused variable 'hi'
On Tue, Mar 23, 2021 at 11:32:50AM +, Wei Liu wrote: > On Tue, Mar 23, 2021 at 10:50:13AM +0800, Xu Yihang wrote: > > Fixes the following W=1 kernel build warning(s): > > arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used > > [-Wunused-but-set-variable] > > > > Compiled with CONFIG_HYPERV enabled: > > make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- > > make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 > > CROSS_COMPILE=x86_64-linux-gnu- > > > > HV_X64_MSR_EOI stores on bit 31:0 and HV_X64_MSR_TPR stores in bit 7:0, > > which means higher > > 32 bits are not really used, therefore potentially cast to void in order > > to silent this warning. > > > > Reported-by: Hulk Robot > > Signed-off-by: Xu Yihang > > --- > > arch/x86/hyperv/hv_apic.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c > > index 284e73661a18..a8b639498033 100644 > > --- a/arch/x86/hyperv/hv_apic.c > > +++ b/arch/x86/hyperv/hv_apic.c > > @@ -60,9 +60,11 @@ static u32 hv_apic_read(u32 reg) > > switch (reg) { > > case APIC_EOI: > > rdmsr(HV_X64_MSR_EOI, reg_val, hi); > > + (void) hi; > > return reg_val; > > case APIC_TASKPRI: > > rdmsr(HV_X64_MSR_TPR, reg_val, hi); > > + (void) hi; > > I would like to remove the space while committing this patch. There is > no need for you to do anything. Applied to hyperv-next.
Re: [PATCH -next] x86: Fix unused variable 'msr_val' warning
On Tue, Mar 23, 2021 at 10:43:02AM +0800, Xu Yihang wrote: > Fixes the following W=1 kernel build warning(s): > arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable ‘msr_val’ set but not > used [-Wunused-but-set-variable] > unsigned long msr_val; > > As Hypervisor Top-Level Functional Specification states in chapter 7.5 > Virtual Processor Idle Sleep State, "A partition which possesses the > AccessGuestIdleMsr privilege (refer to section 4.2.2) may trigger entry into > the virtual processor idle sleep state through a read to the > hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE". That means only a read is > necessary, msr_val is not uesed, so potentially cast to void in order to > silent this warning. > > Reference: > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs > > Reported-by: Hulk Robot > Signed-off-by: Xu Yihang Applied to hyperv-next. Thanks.
Re: [PATCH -next] x86: Fix unused variable 'msr_val' warning
On Tue, Mar 23, 2021 at 01:13:03AM +0100, Ingo Molnar wrote: > > * Michael Kelley wrote: > > > From: Ingo Molnar Sent: Monday, March 22, 2021 > > 2:08 PM > > > > > > * Xu Yihang wrote: > > > > > > > Fixes the following W=1 kernel build warning(s): > > > > arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable 'msr_val' set > > > > but not used [- > > > Wunused-but-set-variable] > > > > unsigned long msr_val; > > > > > > > > As Hypervisor Top-Level Functional Specification states in chapter 7.5 > > > > Virtual Processor > > > Idle Sleep State, "A partition which possesses the AccessGuestIdleMsr > > > privilege (refer to > > > section 4.2.2) may trigger entry into the virtual processor idle sleep > > > state through a read to > > > the hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE". That means only a read > > > is > > > necessary, msr_val is not uesed, so __maybe_unused should be added. > > > > > > > > Reference: > > > > > > > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs > > > > > > > > Reported-by: Hulk Robot > > > > Signed-off-by: Xu Yihang > > > > --- > > > > arch/x86/hyperv/hv_spinlock.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/arch/x86/hyperv/hv_spinlock.c > > > > b/arch/x86/hyperv/hv_spinlock.c > > > > index f3270c1fc48c..67bc15c7752a 100644 > > > > --- a/arch/x86/hyperv/hv_spinlock.c > > > > +++ b/arch/x86/hyperv/hv_spinlock.c > > > > @@ -25,7 +25,7 @@ static void hv_qlock_kick(int cpu) > > > > > > > > static void hv_qlock_wait(u8 *byte, u8 val) > > > > { > > > > - unsigned long msr_val; > > > > + unsigned long msr_val __maybe_unused; > > > > unsigned long flags; > > > > > > Please don't add new __maybe_unused annotations to the x86 tree - > > > improve the flow instead to help GCC recognize the initialization > > > sequence better. > > > > > > Thanks, > > > > > > Ingo > > > > Could you elaborate on the thinking here, or point to some written > > discussion? I'm just curious. In this particular case, it's not a > > problem > > with the flow or gcc detection. This code really does read an MSR and > > ignore that value that is read, so it's not clear how gcc would ever > > figure out that's OK. > > Yeah, so the canonical way to signal that the msr_val isn't used would > be to rewrite this as: > > > if (READ_ONCE(*byte) == val) { > unsigned long msr_val; > > rdmsrl(HV_X64_MSR_GUEST_IDLE, msr_val); > > (void)msr_val; > } > > (Also see the patch below - untested.) > > This makes it abundantly clear that the rdmsr() msr_val return value > is not 'maybe' unused, but totally intentionally skipped. > > Thanks, > > Ingo > Thank you for the advice, Ingo. Wei.
Re: [PATCH -next] x86: Fix unused variable 'hi'
On Tue, Mar 23, 2021 at 10:50:13AM +0800, Xu Yihang wrote: > Fixes the following W=1 kernel build warning(s): > arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used > [-Wunused-but-set-variable] > > Compiled with CONFIG_HYPERV enabled: > make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- > make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- > > HV_X64_MSR_EOI stores on bit 31:0 and HV_X64_MSR_TPR stores in bit 7:0, which > means higher > 32 bits are not really used, therefore potentially cast to void in order to > silent this warning. > > Reported-by: Hulk Robot > Signed-off-by: Xu Yihang > --- > arch/x86/hyperv/hv_apic.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c > index 284e73661a18..a8b639498033 100644 > --- a/arch/x86/hyperv/hv_apic.c > +++ b/arch/x86/hyperv/hv_apic.c > @@ -60,9 +60,11 @@ static u32 hv_apic_read(u32 reg) > switch (reg) { > case APIC_EOI: > rdmsr(HV_X64_MSR_EOI, reg_val, hi); > + (void) hi; > return reg_val; > case APIC_TASKPRI: > rdmsr(HV_X64_MSR_TPR, reg_val, hi); > + (void) hi; I would like to remove the space while committing this patch. There is no need for you to do anything. Wei. > return reg_val; > > default: > -- > 2.17.1 >
Re: [PATCH] video/fbdev: Fix a double free in hvfb_probe
Thanks for your patch. I would like to change the prefix to "video: hyperv_fb:" to be more specific. On Tue, Mar 23, 2021 at 12:33:50AM -0700, Lv Yunlong wrote: > In function hvfb_probe in hyperv_fb.c, it calls hvfb_getmem(hdev, info) > and return err when info->apertures is freed. > > In the error1 label of hvfb_probe, info->apertures will be freed twice > by framebuffer_release(info). > I would say "freed for the second time" here. What you wrote reads to me fraembuffer_release frees the buffer twice all by itself. > My patch sets info->apertures to NULL after it was freed to avoid > double free. > I think this approach works. I would like to give other people a chance to comment though. Fixes: 3a6fb6c4255c ("video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.") > Signed-off-by: Lv Yunlong > --- > drivers/video/fbdev/hyperv_fb.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c > index c8b0ae676809..2fc9b507e73a 100644 > --- a/drivers/video/fbdev/hyperv_fb.c > +++ b/drivers/video/fbdev/hyperv_fb.c > @@ -1032,6 +1032,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct > fb_info *info) > if (!pdev) { > pr_err("Unable to find PCI Hyper-V video\n"); > kfree(info->apertures); > + info->apertures = NULL; > return -ENODEV; > } > > @@ -1130,6 +1131,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct > fb_info *info) > pci_dev_put(pdev); > } > kfree(info->apertures); > + info->apertures = NULL; > > return 0; > > @@ -1142,6 +1144,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct > fb_info *info) > if (!gen2vm) > pci_dev_put(pdev); > kfree(info->apertures); > + info->apertures = NULL; > > return -ENOMEM; > } > -- > 2.25.1 > >
Re: [PATCH v4] x86/Hyper-V: Support for free page reporting
On Fri, Mar 19, 2021 at 09:30:50PM +, Michael Kelley wrote: > From: Sunil Muthuswamy Sent: Friday, March 19, 2021 > 2:21 PM > > > > > What's the strategy for this flag in the unlikely event that the > > > hypercall fails? > > > It doesn't seem right to have hv_query_ext_cap() fail, but leave the > > > static flag set to true. Just move that line down to after the status > > > check > > > has succeeded? > > > > That call should not fail in any normal circumstances. The current idea was > > to > > avoid repeating the same call on persistent failure. > > OK, I can see that as a valid strategy. And the assumption is that a failed > hypercall would leave hv_extended_cap unmodified and hence all zeros. > > I'm OK with this approach if you want to keep it. But perhaps add a short > comment about the intent so it doesn't look like a bug. :-) > Sunil, if you can send an updated version of your patch by either providing a comment or moving the code around, I can queue it up for hyperv-next. I think adding a comment is perhaps the easier thing to do. Wei.
Re: [PATCH] hyperv: Few mundane typo fixes
On Sun, Mar 21, 2021 at 04:41:53PM -0700, Randy Dunlap wrote: > On March 21, 2021 4:31:08 PM PDT, Bhaskar Chowdhury > wrote: > > > >s/sructure/structure/ > >s/extention/extension/ > >s/offerred/offered/ > >s/adversley/adversely/ > > > >Signed-off-by: Bhaskar Chowdhury > > Acked-by: Randy Dunlap > Queued for hyperv-next. Thanks. Wei.
Re: [PATCH -next] x86: Fix unused variable 'msr_val' warning
On Mon, Mar 22, 2021 at 11:17:13AM +0800, Xu Yihang wrote: > Fixes the following W=1 kernel build warning(s): > arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable ‘msr_val’ set but not > used [-Wunused-but-set-variable] > unsigned long msr_val; > > As Hypervisor Top-Level Functional Specification states in chapter 7.5 > Virtual Processor Idle Sleep State, "A partition which possesses the > AccessGuestIdleMsr privilege (refer to section 4.2.2) may trigger entry into > the virtual processor idle sleep state through a read to the > hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE". That means only a read is > necessary, msr_val is not uesed, so __maybe_unused should be added. > > Reference: > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs > > Reported-by: Hulk Robot > Signed-off-by: Xu Yihang I modified the commit message a bit and queued this up for hyperv-next. Thanks. Wei.
Re: [PATCH -next] x86: Fix unused variable 'hi'
On Mon, Mar 22, 2021 at 11:54:26AM +0800, Xu Yihang wrote: > Fixes the following W=1 kernel build warning(s): > arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used > [-Wunused-but-set-variable] > > Compiled with CONFIG_HYPERV enabled: > make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- > make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- > > HV_X64_MSR_EOI stores on bit 31:0 and HV_X64_MSR_TPR stores in bit 7:0, which > means higher 32 bits are not really used, therefore __maybe_unused added. > > Reported-by: Hulk Robot > Signed-off-by: Xu Yihang I slightly modified the commit message and queued it up for hyperv-next. Thanks. Wei.
Re: [PATCH v2] drivers: hv: Fix EXPORT_SYMBOL and tab spaces issue
On Wed, Mar 10, 2021 at 10:51:55AM +0530, Vasanth wrote: > 1.Fixed EXPORT_SYMBOL should be follow immediately function/variable. > 2.Fixed code tab spaces issue. > > Signed-off-by: Vasanth M Applied to hyperv-next.
Re: linux-next: manual merge of the hyperv tree with the tip tree
On Tue, Mar 16, 2021 at 04:02:54PM +0100, Borislav Petkov wrote: > On Mon, Mar 15, 2021 at 02:35:05PM +1100, Stephen Rothwell wrote: > > Hi all, > > > > Today's linux-next merge of the hyperv tree got a conflict in: > > > > arch/x86/include/asm/mshyperv.h > > > > between commit: > > > > a0e2bf7cb700 ("x86/paravirt: Switch time pvops functions to use > > static_call()") > > > > from the tip tree and commit: > > > > eb3e1d370b4c ("clocksource/drivers/hyper-v: Handle sched_clock > > differences inline") > > > > from the hyperv tree. > > > > I fixed it up (I used the latter version of this file and then applied the > > following patch) and can carry the fix as necessary. This is now fixed > > as far as linux-next is concerned, but any non trivial conflicts should > > be mentioned to your upstream maintainer when your tree is submitted for > > merging. You may also want to consider cooperating with the maintainer > > of the conflicting tree to minimise any particularly complex conflicts. > > Right, > > so tglx and I took a quick look and came to the conclusion that it would > be best if you - provided it is not too much trouble - keep applying > this patch so that linux-next can get tested properly and we - Wei or I > - explain this merge conflict in our pull requests during the next merge > window and ask Linus to merge your patch ontop. This way we'll save us > the cross-tree merging dance. Totally agreed. :-) I've made a note to inform Linus about this in the next merge window. Thanks, Wei. > > Thx! > > -- > Regards/Gruss, > Boris. > > SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG > Nürnberg
Re: [PATCH 1/1] asm-generic/hyperv: Add missing function prototypes per -W1 warnings
On Wed, Mar 10, 2021 at 10:47:49AM -0800, Michael Kelley wrote: > Add two function prototypes for -W1 warnings generated by the > kernel test robot. > > Reported-by: kernel test robot > Signed-off-by: Michael Kelley Applied to hyperv-next
Re: [patch 12/14] PCI: hv: Use tasklet_disable_in_atomic()
On Tue, Mar 09, 2021 at 09:42:15AM +0100, Thomas Gleixner wrote: > From: Sebastian Andrzej Siewior > > The hv_compose_msi_msg() callback in irq_chip::irq_compose_msi_msg is > invoked via irq_chip_compose_msi_msg(), which itself is always invoked from > atomic contexts from the guts of the interrupt core code. > > There is no way to change this w/o rewriting the whole driver, so use > tasklet_disable_in_atomic() which allows to make tasklet_disable() > sleepable once the remaining atomic users are addressed. > > Signed-off-by: Sebastian Andrzej Siewior > Signed-off-by: Thomas Gleixner Acked-by: Wei Liu
Re: [PATCH v3 00/10] Refactor arch specific Hyper-V code
On Tue, Mar 02, 2021 at 01:38:12PM -0800, Michael Kelley wrote: [...] > Michael Kelley (10): > Drivers: hv: vmbus: Move Hyper-V page allocator to arch neutral code > x86/hyper-v: Move hv_message_type to architecture neutral module > Drivers: hv: Redo Hyper-V synthetic MSR get/set functions > Drivers: hv: vmbus: Move hyperv_report_panic_msg to arch neutral code > Drivers: hv: vmbus: Handle auto EOI quirk inline > Drivers: hv: vmbus: Move handling of VMbus interrupts > clocksource/drivers/hyper-v: Handle vDSO differences inline > clocksource/drivers/hyper-v: Handle sched_clock differences inline > clocksource/drivers/hyper-v: Set clocksource rating based on Hyper-V > feature > clocksource/drivers/hyper-v: Move handling of STIMER0 interrupts Applied to hyperv-next. Wei.
Re: [PATCH] drivers: hv: Fix no spaces issue
On Fri, Mar 05, 2021 at 09:34:49PM +0530, Vasanth wrote: > Fixed code spaces issue. > > Signed-off-by: Vasanth M What do you mean by "Fixed no spaces issue" in the title? I can see you deleted an empty line and changed some tabs to spaces. Please be specific in the commit message. Wei.
Re: [PATCH] Drivers: hv: vmbus: Drop error message when 'No request id available'
On Mon, Mar 01, 2021 at 08:13:48PM +0100, Andrea Parri (Microsoft) wrote: > Running out of request IDs on a channel essentially produces the same > effect as running out of space in the ring buffer, in that -EAGAIN is > returned. The error message in hv_ringbuffer_write() should either be > dropped (since we don't output a message when the ring buffer is full) > or be made conditional/debug-only. > > Suggested-by: Michael Kelley > Signed-off-by: Andrea Parri (Microsoft) > Fixes: e8b7db38449ac ("Drivers: hv: vmbus: Add vmbus_requestor data structure > for VMBus hardening") Applied to hyperv-fixes. Wei.
Re: [PATCH v3] drivers: hv: Fix whitespace errors
On Fri, Feb 19, 2021 at 05:30:36PM +, Michael Kelley wrote: > From: Vasanth Sent: Friday, February 19, 2021 9:13 AM > > To: KY Srinivasan > > Cc: Haiyang Zhang ; Stephen Hemminger > > ; wei@kernel.org; linux-hyp...@vger.kernel.org; > > linux- > > ker...@vger.kernel.org; Vasanth > > Subject: [PATCH v3] drivers: hv: Fix whitespace errors > > > > Fixed checkpatch warning and errors on hv driver. > > > > Signed-off-by: Vasanth Vasanth, normally people put their full name in the SoB. Do you want to do that too? There is no need to resend. Just let me know what you think. > > --- > > Reviewed-by: Michael Kelley Thanks Michael. I will pick up this patch within this week. Wei.
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2/28/2021 1:27 PM, Michael S. Tsirkin wrote: On Thu, Feb 25, 2021 at 04:56:42PM -0800, Si-Wei Liu wrote: Hi Michael, Are you okay to live without this ioctl for now? I think QEMU is the one that needs to be fixed and will have to be made legacy guest aware. I think the kernel can just honor the feature negotiation result done by QEMU and do as what's told to.Will you agree? If it's fine, I would proceed to reverting commit fe36cbe067 and related code in question from the kernel. Thanks, -Siwei Not really, I don't see why that's a good idea. fe36cbe067 is the code checking MTU before FEATURES_OK. Spec explicitly allows that. Alright, but what I meant was this commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy"). But I got why you need it in another email (for BE host/guest). -Siwei
Re: [PATCH v2 00/10] Refactor arch specific Hyper-V code
On Sun, Feb 28, 2021 at 05:15:22PM -0800, Michael Kelley wrote: > To support Linux guests on Hyper-V on multiple architectures, the original > approach factored out all differences between Hyper-V on x86/x64 and > Hyper-V on ARM64 into functions or #defines under arch/x86 and > arch/arm64. Some of these differences are truly related to the > architecture, but others are more properly treated as Linux OS > differences or just quirks in Hyper-V. Feedback from Arnd Bergmann[1] > recommended that differences other than architecture should be > incorporated into the architecture independent Hyper-V code. Each > difference can be handled with conditions specific to the difference > instead of tying it to the broader x86/x64 vs. ARM64. This approach > reduces the amount of code under arch/x86 and arch/arm64 and keeps > the non-architectural differences localized and more easily understood. > > This patch set implements the new approach by changing the interface > between the architecture independent code and the architecture dependent > code for x86/x64. The patches move code from arch/x86 to the > architecture independent Hyper-V code whenever possible, and add > architecture independent support needed by other architectures like > ARM64. No functionality is changed for x86/x64. A subsequent patch > set will provide the Hyper-V support code under arch/arm64. > > This patch set results in an increase in lines of code (though some > of the increase is additional comments). But the lines needed under > arch/arm64 in the upcoming patch set is significantly reduced, resulting > in a net decrease of about 125 lines. > > [1] > https://lore.kernel.org/lkml/CAK8P3a1hDBVembCd+6=ENUWYFz=72jbtfmrkyz2afd+_q04...@mail.gmail.com/ This series looks good to me. Given this series touches mostly Hyper-V code. I will be taking this it via hyperv-next once the last two patches are reviewed. Wei.
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
Hi Michael, Are you okay to live without this ioctl for now? I think QEMU is the one that needs to be fixed and will have to be made legacy guest aware. I think the kernel can just honor the feature negotiation result done by QEMU and do as what's told to.Will you agree? If it's fine, I would proceed to reverting commit fe36cbe067 and related code in question from the kernel. Thanks, -Siwei On 2/24/2021 10:24 AM, Si-Wei Liu wrote: Detecting it isn't enough though, we will need a new ioctl to notify the kernel that it's a legacy guest. Ugh :( Well, although I think adding an ioctl is doable, may I know what the use case there will be for kernel to leverage such info directly? Is there a case QEMU can't do with dedicate ioctls later if there's indeed differentiation (legacy v.s. modern) needed? One of the reason I asked is if this ioctl becomes a mandate for vhost-vdpa kernel. QEMU would reject initialize vhost-vdpa if doesn't see this ioctl coming? If it's optional, suppose the kernel may need it only when it becomes necessary? Thanks, -Siwei
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2/23/2021 9:04 PM, Michael S. Tsirkin wrote: On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote: On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote: On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote: On 2021/2/23 9:12 上午, Si-Wei Liu wrote: On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote: On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote: On 2021/2/19 7:54 下午, Si-Wei Liu wrote: Commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy") made an exception for legacy guests to reset features to 0, when config space is accessed before features are set. We should relieve the verify_min_features() check and allow features reset to 0 for this case. It's worth noting that not just legacy guests could access config space before features are set. For instance, when feature VIRTIO_NET_F_MTU is advertised some modern driver will try to access and validate the MTU present in the config space before virtio features are set. This looks like a spec violation: " The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to use. " Do we really want to workaround this? Thanks And also: The driver MUST follow this sequence to initialize a device: 1. Reset the device. 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device. 3. Set the DRIVER status bit: the guest OS knows how to drive the device. 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not support our subset of features and the device is unusable. 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues. 8. Set the DRIVER_OK status bit. At this point the device is “live”. so accessing config space before FEATURES_OK is a spec violation, right? It is, but it's not relevant to what this commit tries to address. I thought the legacy guest still needs to be supported. Having said, a separate patch has to be posted to fix the guest driver issue where this discrepancy is introduced to virtnet_validate() (since commit fe36cbe067). But it's not technically related to this patch. -Siwei I think it's a bug to read config space in validate, we should move it to virtnet_probe(). Thanks I take it back, reading but not writing seems to be explicitly allowed by spec. So our way to detect a legacy guest is bogus, need to think what is the best way to handle this. Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy guest? Supposedly only config space write access needs to be guarded before setting FEATURES_OK. -Siwie Detecting it isn't enough though, we will need a new ioctl to notify the kernel that it's a legacy guest. Ugh :( Well, although I think adding an ioctl is doable, may I know what the use case there will be for kernel to leverage such info directly? Is there a case QEMU can't do with dedicate ioctls later if there's indeed differentiation (legacy v.s. modern) needed? One of the reason I asked is if this ioctl becomes a mandate for vhost-vdpa kernel. QEMU would reject initialize vhost-vdpa if doesn't see this ioctl coming? If it's optional, suppose the kernel may need it only when it becomes necessary? Thanks, -Siwei Rejecting reset to 0 prematurely causes correct MTU and link status unable to load for the very first config space access, rendering issues like guest showing inaccurate MTU value, or failure to reject out-of-range MTU. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 7c1f789..540dd67 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev) return mvdev->mlx_features; } -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features) -{ - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) - return -EOPNOTSUPP; - - return 0; -} - static int setup_virtqueues(struct mlx5_vdpa_net *ndev) { int err; @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features) { struct mlx5_vdpa_dev *mv
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote: On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote: On 2021/2/23 9:12 上午, Si-Wei Liu wrote: On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote: On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote: On 2021/2/19 7:54 下午, Si-Wei Liu wrote: Commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy") made an exception for legacy guests to reset features to 0, when config space is accessed before features are set. We should relieve the verify_min_features() check and allow features reset to 0 for this case. It's worth noting that not just legacy guests could access config space before features are set. For instance, when feature VIRTIO_NET_F_MTU is advertised some modern driver will try to access and validate the MTU present in the config space before virtio features are set. This looks like a spec violation: " The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to use. " Do we really want to workaround this? Thanks And also: The driver MUST follow this sequence to initialize a device: 1. Reset the device. 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device. 3. Set the DRIVER status bit: the guest OS knows how to drive the device. 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not support our subset of features and the device is unusable. 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues. 8. Set the DRIVER_OK status bit. At this point the device is “live”. so accessing config space before FEATURES_OK is a spec violation, right? It is, but it's not relevant to what this commit tries to address. I thought the legacy guest still needs to be supported. Having said, a separate patch has to be posted to fix the guest driver issue where this discrepancy is introduced to virtnet_validate() (since commit fe36cbe067). But it's not technically related to this patch. -Siwei I think it's a bug to read config space in validate, we should move it to virtnet_probe(). Thanks I take it back, reading but not writing seems to be explicitly allowed by spec. So our way to detect a legacy guest is bogus, need to think what is the best way to handle this. Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy guest? Supposedly only config space write access needs to be guarded before setting FEATURES_OK. -Siwie Rejecting reset to 0 prematurely causes correct MTU and link status unable to load for the very first config space access, rendering issues like guest showing inaccurate MTU value, or failure to reject out-of-range MTU. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 7c1f789..540dd67 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev) return mvdev->mlx_features; } -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features) -{ - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) - return -EOPNOTSUPP; - - return 0; -} - static int setup_virtqueues(struct mlx5_vdpa_net *ndev) { int err; @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features) { struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); - int err; print_features(mvdev, features, true); - err = verify_min_features(mvdev, features); - if (err) - return err; - ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features; ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu); ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); - return err; + return 0; } static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote: On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote: On 2021/2/19 7:54 下午, Si-Wei Liu wrote: Commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy") made an exception for legacy guests to reset features to 0, when config space is accessed before features are set. We should relieve the verify_min_features() check and allow features reset to 0 for this case. It's worth noting that not just legacy guests could access config space before features are set. For instance, when feature VIRTIO_NET_F_MTU is advertised some modern driver will try to access and validate the MTU present in the config space before virtio features are set. This looks like a spec violation: " The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to use. " Do we really want to workaround this? Thanks And also: The driver MUST follow this sequence to initialize a device: 1. Reset the device. 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device. 3. Set the DRIVER status bit: the guest OS knows how to drive the device. 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it. 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step. 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not support our subset of features and the device is unusable. 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues. 8. Set the DRIVER_OK status bit. At this point the device is “live”. so accessing config space before FEATURES_OK is a spec violation, right? It is, but it's not relevant to what this commit tries to address. I thought the legacy guest still needs to be supported. Having said, a separate patch has to be posted to fix the guest driver issue where this discrepancy is introduced to virtnet_validate() (since commit fe36cbe067). But it's not technically related to this patch. -Siwei Rejecting reset to 0 prematurely causes correct MTU and link status unable to load for the very first config space access, rendering issues like guest showing inaccurate MTU value, or failure to reject out-of-range MTU. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 7c1f789..540dd67 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev) return mvdev->mlx_features; } -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features) -{ - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) - return -EOPNOTSUPP; - - return 0; -} - static int setup_virtqueues(struct mlx5_vdpa_net *ndev) { int err; @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features) { struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); - int err; print_features(mvdev, features, true); - err = verify_min_features(mvdev, features); - if (err) - return err; - ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features; ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu); ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); - return err; + return 0; } static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2/21/2021 8:14 PM, Jason Wang wrote: On 2021/2/19 7:54 下午, Si-Wei Liu wrote: Commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy") made an exception for legacy guests to reset features to 0, when config space is accessed before features are set. We should relieve the verify_min_features() check and allow features reset to 0 for this case. It's worth noting that not just legacy guests could access config space before features are set. For instance, when feature VIRTIO_NET_F_MTU is advertised some modern driver will try to access and validate the MTU present in the config space before virtio features are set. This looks like a spec violation: " The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to use. " Do we really want to workaround this? Isn't the commit 452639a64ad8 itself is a workaround for legacy guest? I think the point is, since there's legacy guest we'd have to support, this host side workaround is unavoidable. Although I agree the violating driver should be fixed (yes, it's in today's upstream kernel which exists for a while now). -Siwei Thanks Rejecting reset to 0 prematurely causes correct MTU and link status unable to load for the very first config space access, rendering issues like guest showing inaccurate MTU value, or failure to reject out-of-range MTU. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 7c1f789..540dd67 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev) return mvdev->mlx_features; } -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features) -{ - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) - return -EOPNOTSUPP; - - return 0; -} - static int setup_virtqueues(struct mlx5_vdpa_net *ndev) { int err; @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features) { struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); - int err; print_features(mvdev, features, true); - err = verify_min_features(mvdev, features); - if (err) - return err; - ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features; ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu); ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); - return err; + return 0; } static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
Re: [PATCH v8 1/6] arm64: hyperv: Add Hyper-V hypercall and register access utilities
On Thu, Feb 18, 2021 at 03:16:29PM -0800, Michael Kelley wrote: > hyperv-tlfs.h defines Hyper-V interfaces from the Hyper-V Top Level > Functional Spec (TLFS), and #includes the architecture-independent > part of hyperv-tlfs.h in include/asm-generic. The published TLFS > is distinctly oriented to x86/x64, so the ARM64-specific > hyperv-tlfs.h includes information for ARM64 that is not yet formally > published. The TLFS is available here: > > docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs > > mshyperv.h defines Linux-specific structures and routines for > interacting with Hyper-V on ARM64, and #includes the architecture- > independent part of mshyperv.h in include/asm-generic. > > Use these definitions to provide utility functions to make > Hyper-V hypercalls and to get and set Hyper-V provided > registers associated with a virtual processor. > > Signed-off-by: Michael Kelley Reviewed-by: Wei Liu
Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map
On 2/19/2021 6:38 PM, Jason Wang wrote: Right now the value is exposed to userspace via GET_VRING_BASE, so only last_avail_idx is synced. If we need sync last_used_idx, we should also sync pending indices which requires more thoughts. Technically it doesn't sound right - crossing the boundary a bit even with simplified form of assumption. But depending on how userspace could make use of this API, it doesn't seem it breaks existing functionality for the moment. I don't get here, maybe you can explain a little bit more? Please refer to the email I just sent. https://lore.kernel.org/lkml/033b0806-4037-5755-a1fa-91dbb58ba...@oracle.com/ -Siwei
Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration
On 2/17/2021 11:42 AM, Si-Wei Liu wrote: On 2/16/2021 8:20 AM, Eli Cohen wrote: When we suspend the VM, the VDPA interface will be reset. When the VM is resumed again, clear_virtqueues() will clear the available and used indices resulting in hardware virqtqueue objects becoming out of sync. We can avoid this function alltogether since qemu will clear them if required, e.g. when the VM went through a reboot. Moreover, since the hw available and used indices should always be identical on query and should be restored to the same value same value for virtqueues that complete in order, we set the single value provided by set_vq_state(). In get_vq_state() we return the value of hardware used index. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen Acked-by: Si-Wei Liu I'd take it back for the moment, according to Jason's latest comment. Discussion still going. --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 - 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index b8e9d525d66c..a51b0f86afe2 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m return; } mvq->avail_idx = attr.available_index; + mvq->used_idx = attr.used_index; } static void suspend_vqs(struct mlx5_vdpa_net *ndev) @@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx, return -EINVAL; } + mvq->used_idx = state->avail_index; mvq->avail_idx = state->avail_index; This is where things starts getting interesting. According to Jason, the original expectation of this API would be to restore the internal last_avail_index in the hardware. With Mellanox network vDPA hardware implementation, there's no such last_avail_index thing in the hardware but only a single last_used_index representing both indices, which should always be the same and in sync with each other. So from migration point of view, it appears logical to restore the saved last_avail_index to the last_used_index in the hardware, right? But what is the point to restore mvq->avail_idx? Actually, this code path is being repurposed to address the index reset problem in the device reset scenario. Where the mvq->avail_idx and mvq->used_idx are both reset to 0 once device is reset. This is a bit crossing the boundary to me. return 0; } @@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa * that cares about emulating the index after vq is stopped. */ if (!mvq->initialized) { - state->avail_index = mvq->avail_idx; + state->avail_index = mvq->used_idx; This is where the last_used_index gets loaded from the hardware (when device is stopped). return 0; } @@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n"); return err; } - state->avail_index = attr.available_index; + state->avail_index = attr.used_index; This code path never gets called from userspace (when device is running). -Siwei return 0; } @@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev) } } -static void clear_virtqueues(struct mlx5_vdpa_net *ndev) -{ - int i; - - for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) { - ndev->vqs[i].avail_idx = 0; - ndev->vqs[i].used_idx = 0; - } -} - /* TODO: cross-endian support */ static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) { @@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); - clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; ++mvdev->generation;
Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map
On 2/18/2021 7:10 PM, Jason Wang wrote: On 2021/2/18 8:43 下午, Si-Wei Liu wrote: On 2/17/2021 8:44 PM, Jason Wang wrote: On 2021/2/10 下午4:59, Si-Wei Liu wrote: On 2/9/2021 7:53 PM, Jason Wang wrote: On 2021/2/10 上午10:30, Si-Wei Liu wrote: On 2/8/2021 10:37 PM, Jason Wang wrote: On 2021/2/9 下午2:12, Eli Cohen wrote: On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote: On 2021/2/8 下午6:04, Eli Cohen wrote: On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote: On 2021/2/8 下午2:37, Eli Cohen wrote: On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote: On 2021/2/6 上午7:07, Si-Wei Liu wrote: On 2/3/2021 11:36 PM, Eli Cohen wrote: When a change of memory map occurs, the hardware resources are destroyed and then re-created again with the new memory map. In such case, we need to restore the hardware available and used indices. The driver failed to restore the used index which is added here. Also, since the driver also fails to reset the available and used indices upon device reset, fix this here to avoid regression caused by the fact that used index may not be zero upon device reset. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen --- v0 -> v1: Clear indices upon device reset drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 88dde3455bfd..b5fe6d2ad22f 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info { u64 device_addr; u64 driver_addr; u16 avail_index; + u16 used_index; bool ready; struct vdpa_callback cb; bool restore; @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue { u32 virtq_id; struct mlx5_vdpa_net *ndev; u16 avail_idx; + u16 used_idx; int fw_state; /* keep last in the struct */ @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context); MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx); + MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx); MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3, get_features_12_3(ndev->mvdev.actual_features)); vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context); @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m struct mlx5_virtq_attr { u8 state; u16 available_index; + u16 used_index; }; static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu memset(attr, 0, sizeof(*attr)); attr->state = MLX5_GET(virtio_net_q_object, obj_context, state); attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index); + attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index); kfree(out); return 0; @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev) } } +static void clear_virtqueues(struct mlx5_vdpa_net *ndev) +{ + int i; + + for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) { + ndev->vqs[i].avail_idx = 0; + ndev->vqs[i].used_idx = 0; + } +} + /* TODO: cross-endian support */ static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) { @@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu return err; ri->avail_index = attr.available_index; + ri->used_index = attr.used_index; ri->ready = mvq->ready; ri->num_ent = mvq->num_ent; ri->desc_addr = mvq->desc_addr; @@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev) continue; mvq->avail_idx = ri->avail_index; + mvq->used_idx = ri->used_index; mvq->ready = ri->ready; mvq->num_ent = ri->num_ent; mvq->desc_addr = ri->desc_addr; @@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); + clear_virtqueues(ndev); The clearing looks fine at the first glance, as it aligns with the other state cleanups floating around at the same place. However, the thing is get_vq_st
[PATCH] vdpa/mlx5: set_features should allow reset to zero
Commit 452639a64ad8 ("vdpa: make sure set_features is invoked for legacy") made an exception for legacy guests to reset features to 0, when config space is accessed before features are set. We should relieve the verify_min_features() check and allow features reset to 0 for this case. It's worth noting that not just legacy guests could access config space before features are set. For instance, when feature VIRTIO_NET_F_MTU is advertised some modern driver will try to access and validate the MTU present in the config space before virtio features are set. Rejecting reset to 0 prematurely causes correct MTU and link status unable to load for the very first config space access, rendering issues like guest showing inaccurate MTU value, or failure to reject out-of-range MTU. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 7c1f789..540dd67 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev) return mvdev->mlx_features; } -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features) -{ - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) - return -EOPNOTSUPP; - - return 0; -} - static int setup_virtqueues(struct mlx5_vdpa_net *ndev) { int err; @@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features) { struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); - int err; print_features(mvdev, features, true); - err = verify_min_features(mvdev, features); - if (err) - return err; - ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features; ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu); ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); - return err; + return 0; } static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb) -- 1.8.3.1
Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map
On 2/17/2021 8:44 PM, Jason Wang wrote: On 2021/2/10 下午4:59, Si-Wei Liu wrote: On 2/9/2021 7:53 PM, Jason Wang wrote: On 2021/2/10 上午10:30, Si-Wei Liu wrote: On 2/8/2021 10:37 PM, Jason Wang wrote: On 2021/2/9 下午2:12, Eli Cohen wrote: On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote: On 2021/2/8 下午6:04, Eli Cohen wrote: On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote: On 2021/2/8 下午2:37, Eli Cohen wrote: On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote: On 2021/2/6 上午7:07, Si-Wei Liu wrote: On 2/3/2021 11:36 PM, Eli Cohen wrote: When a change of memory map occurs, the hardware resources are destroyed and then re-created again with the new memory map. In such case, we need to restore the hardware available and used indices. The driver failed to restore the used index which is added here. Also, since the driver also fails to reset the available and used indices upon device reset, fix this here to avoid regression caused by the fact that used index may not be zero upon device reset. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen --- v0 -> v1: Clear indices upon device reset drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 88dde3455bfd..b5fe6d2ad22f 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info { u64 device_addr; u64 driver_addr; u16 avail_index; + u16 used_index; bool ready; struct vdpa_callback cb; bool restore; @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue { u32 virtq_id; struct mlx5_vdpa_net *ndev; u16 avail_idx; + u16 used_idx; int fw_state; /* keep last in the struct */ @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context); MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx); + MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx); MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3, get_features_12_3(ndev->mvdev.actual_features)); vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context); @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m struct mlx5_virtq_attr { u8 state; u16 available_index; + u16 used_index; }; static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu memset(attr, 0, sizeof(*attr)); attr->state = MLX5_GET(virtio_net_q_object, obj_context, state); attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index); + attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index); kfree(out); return 0; @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev) } } +static void clear_virtqueues(struct mlx5_vdpa_net *ndev) +{ + int i; + + for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) { + ndev->vqs[i].avail_idx = 0; + ndev->vqs[i].used_idx = 0; + } +} + /* TODO: cross-endian support */ static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) { @@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu return err; ri->avail_index = attr.available_index; + ri->used_index = attr.used_index; ri->ready = mvq->ready; ri->num_ent = mvq->num_ent; ri->desc_addr = mvq->desc_addr; @@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev) continue; mvq->avail_idx = ri->avail_index; + mvq->used_idx = ri->used_index; mvq->ready = ri->ready; mvq->num_ent = ri->num_ent; mvq->desc_addr = ri->desc_addr; @@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); + clear_virtqueues(ndev); The clearing looks fine at the first glance, as it aligns with the other state cleanups floating around at the same place. However, the thing is get_vq_state() is supposed to be called right after to get sync'ed with the latest interna
Re: [PATCH v2 3/3] vdpa/mlx5: defer clear_virtqueues to until DRIVER_OK
On 2/16/2021 7:21 AM, Eli Cohen wrote: On Thu, Feb 11, 2021 at 09:33:14AM +0200, Eli Cohen wrote: On Wed, Feb 10, 2021 at 01:48:00PM -0800, Si-Wei Liu wrote: While virtq is stopped, get_vq_state() is supposed to be called to get sync'ed with the latest internal avail_index from device. The saved avail_index is used to restate the virtq once device is started. Commit b35ccebe3ef7 introduced the clear_virtqueues() routine to reset the saved avail_index, however, the index gets cleared a bit earlier before get_vq_state() tries to read it. This would cause consistency problems when virtq is restarted, e.g. through a series of link down and link up events. We could defer the clearing of avail_index to until the device is to be started, i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in set_status(). Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") Signed-off-by: Si-Wei Liu Acked-by: Jason Wang Acked-by: Eli Cohen I take it back. I think we don't need to clear the indexes at all. In case we need to restore indexes we'll get the right values through set_vq_state(). If we suspend the virtqueue due to VM being suspended, qemu will query first and will provide the the queried value. In case of VM reboot, it will provide 0 in set_vq_state(). I am sending a patch that addresses both reboot and suspend. With set_vq_state() repurposed to restoring used_index I'm fine with this approach. Do I have to repost a v3 of this series while dropping the 3rd patch? -Siwei --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 7c1f789..ce6aae8 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1777,7 +1777,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); - clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; ++mvdev->generation; @@ -1786,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { if (status & VIRTIO_CONFIG_S_DRIVER_OK) { + clear_virtqueues(ndev); err = setup_driver(ndev); if (err) { mlx5_vdpa_warn(mvdev, "failed to setup driver\n"); -- 1.8.3.1
Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration
On 2/17/2021 1:20 PM, Michael S. Tsirkin wrote: On Wed, Feb 17, 2021 at 11:42:48AM -0800, Si-Wei Liu wrote: On 2/16/2021 8:20 AM, Eli Cohen wrote: When we suspend the VM, the VDPA interface will be reset. When the VM is resumed again, clear_virtqueues() will clear the available and used indices resulting in hardware virqtqueue objects becoming out of sync. We can avoid this function alltogether since qemu will clear them if required, e.g. when the VM went through a reboot. Moreover, since the hw available and used indices should always be identical on query and should be restored to the same value same value for virtqueues that complete in order, we set the single value provided by set_vq_state(). In get_vq_state() we return the value of hardware used index. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen Acked-by: Si-Wei Liu Seems to also fix b35ccebe3ef76168aa2edaa35809c0232cb3578e, right? I think so. It should have both "Fixes" tags. -Siwei --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 - 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index b8e9d525d66c..a51b0f86afe2 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m return; } mvq->avail_idx = attr.available_index; + mvq->used_idx = attr.used_index; } static void suspend_vqs(struct mlx5_vdpa_net *ndev) @@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx, return -EINVAL; } + mvq->used_idx = state->avail_index; mvq->avail_idx = state->avail_index; return 0; } @@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa * that cares about emulating the index after vq is stopped. */ if (!mvq->initialized) { - state->avail_index = mvq->avail_idx; + state->avail_index = mvq->used_idx; return 0; } @@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n"); return err; } - state->avail_index = attr.available_index; + state->avail_index = attr.used_index; return 0; } @@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev) } } -static void clear_virtqueues(struct mlx5_vdpa_net *ndev) -{ - int i; - - for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) { - ndev->vqs[i].avail_idx = 0; - ndev->vqs[i].used_idx = 0; - } -} - /* TODO: cross-endian support */ static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) { @@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); - clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; ++mvdev->generation;
Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration
On 2/16/2021 8:20 AM, Eli Cohen wrote: When we suspend the VM, the VDPA interface will be reset. When the VM is resumed again, clear_virtqueues() will clear the available and used indices resulting in hardware virqtqueue objects becoming out of sync. We can avoid this function alltogether since qemu will clear them if required, e.g. when the VM went through a reboot. Moreover, since the hw available and used indices should always be identical on query and should be restored to the same value same value for virtqueues that complete in order, we set the single value provided by set_vq_state(). In get_vq_state() we return the value of hardware used index. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen Acked-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 - 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index b8e9d525d66c..a51b0f86afe2 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m return; } mvq->avail_idx = attr.available_index; + mvq->used_idx = attr.used_index; } static void suspend_vqs(struct mlx5_vdpa_net *ndev) @@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx, return -EINVAL; } + mvq->used_idx = state->avail_index; mvq->avail_idx = state->avail_index; return 0; } @@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa * that cares about emulating the index after vq is stopped. */ if (!mvq->initialized) { - state->avail_index = mvq->avail_idx; + state->avail_index = mvq->used_idx; return 0; } @@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n"); return err; } - state->avail_index = attr.available_index; + state->avail_index = attr.used_index; return 0; } @@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev) } } -static void clear_virtqueues(struct mlx5_vdpa_net *ndev) -{ - int i; - - for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) { - ndev->vqs[i].avail_idx = 0; - ndev->vqs[i].used_idx = 0; - } -} - /* TODO: cross-endian support */ static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) { @@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); - clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; ++mvdev->generation;
Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map
On 2/10/2021 7:45 AM, Eli Cohen wrote: On Wed, Feb 10, 2021 at 12:59:03AM -0800, Si-Wei Liu wrote: On 2/9/2021 7:53 PM, Jason Wang wrote: On 2021/2/10 上午10:30, Si-Wei Liu wrote: On 2/8/2021 10:37 PM, Jason Wang wrote: On 2021/2/9 下午2:12, Eli Cohen wrote: On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote: On 2021/2/8 下午6:04, Eli Cohen wrote: On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote: On 2021/2/8 下午2:37, Eli Cohen wrote: On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote: On 2021/2/6 上午7:07, Si-Wei Liu wrote: On 2/3/2021 11:36 PM, Eli Cohen wrote: When a change of memory map occurs, the hardware resources are destroyed and then re-created again with the new memory map. In such case, we need to restore the hardware available and used indices. The driver failed to restore the used index which is added here. Also, since the driver also fails to reset the available and used indices upon device reset, fix this here to avoid regression caused by the fact that used index may not be zero upon device reset. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen --- v0 -> v1: Clear indices upon device reset drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 88dde3455bfd..b5fe6d2ad22f 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info { u64 device_addr; u64 driver_addr; u16 avail_index; + u16 used_index; bool ready; struct vdpa_callback cb; bool restore; @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue { u32 virtq_id; struct mlx5_vdpa_net *ndev; u16 avail_idx; + u16 used_idx; int fw_state; /* keep last in the struct */ @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context); MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx); + MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx); MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3, get_features_12_3(ndev->mvdev.actual_features)); vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context); @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m struct mlx5_virtq_attr { u8 state; u16 available_index; + u16 used_index; }; static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu memset(attr, 0, sizeof(*attr)); attr->state = MLX5_GET(virtio_net_q_object, obj_context, state); attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index); + attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index); kfree(out); return 0; @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev) } } +static void clear_virtqueues(struct mlx5_vdpa_net *ndev) +{ + int i; + + for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) { + ndev->vqs[i].avail_idx = 0; + ndev->vqs[i].used_idx = 0; + } +} + /* TODO: cross-endian support */ static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) { @@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu return err; ri->avail_index = attr.available_index; + ri->used_index = attr.used_index; ri->ready = mvq->ready; ri->num_ent = mvq->num_ent; ri->desc_addr = mvq->desc_addr; @@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev) continue; mvq->avail_idx = ri->avail_index; + mvq->used_idx = ri->used_index; mvq->ready = ri->ready; mvq->num_ent = ri->num_ent; mvq->desc_addr = ri->desc_addr; @@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); + clear_virtqueues(ndev); The clearing looks fine at the first glance, as it aligns with the other state cleanups floating around at the same place. However, the thing is get_vq_state() is supposed to be called right after to get sync'ed with
[GIT PULL] Hyper-V commits for 5.12
Hi Linus, The following changes since commit 6ee1d745b7c9fd573fba142a2efdad76a9f1cb04: Linux 5.11-rc5 (2021-01-24 16:47:14 -0800) are available in the Git repository at: ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-next-signed-20210216 for you to fetch changes up to 3019270282a175defc02c8331786c73e082cd2a8: Revert "Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer" (2021-02-15 10:49:11 +) hyperv-next for 5.12 - VMBus hardening patches from Andrea Parri and Andres Beltran. - Patches to make Linux boot as the root partition on Microsoft Hypervisor from Wei Liu. - One patch to add a new sysfs interface to support hibernation on Hyper-V from Dexuan Cui. - Two miscellaneous clean-up patches from Colin and Gustavo. Andrea Parri (Microsoft) (9): Drivers: hv: vmbus: Initialize memory to be sent to the host Drivers: hv: vmbus: Reduce number of references to message in vmbus_on_msg_dpc() Drivers: hv: vmbus: Copy the hv_message in vmbus_on_msg_dpc() Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind() Drivers: hv: vmbus: Resolve race condition in vmbus_onoffer_rescind() x86/hyperv: Load/save the Isolation Configuration leaf Drivers: hv: vmbus: Restrict vmbus_devices on isolated guests Drivers: hv: vmbus: Enforce 'VMBus version >= 5.2' on isolated guests hv_netvsc: Restrict configurations on isolated guests Andres Beltran (2): Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer hv_utils: Add validation for untrusted Hyper-V values Colin Ian King (1): hv_utils: Fix spelling mistake "Hearbeat" -> "Heartbeat" Dexuan Cui (1): Drivers: hv: vmbus: Add /sys/bus/vmbus/hibernation Gustavo A. R. Silva (1): hv: hyperv.h: Replace one-element array with flexible-array in struct icmsg_negotiate Wei Liu (17): asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT x86/hyperv: detect if Linux is the root partition Drivers: hv: vmbus: skip VMBus initialization if Linux is root clocksource/hyperv: use MSR-based access if running as root x86/hyperv: allocate output arg pages if required x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary x86/hyperv: handling hypercall page setup for root ACPI / NUMA: add a stub function for node_to_pxm() x86/hyperv: provide a bunch of helper functions x86/hyperv: implement and use hv_smp_prepare_cpus asm-generic/hyperv: update hv_msi_entry asm-generic/hyperv: update hv_interrupt_entry asm-generic/hyperv: introduce hv_device_id and auxiliary structures asm-generic/hyperv: import data structures for mapping device interrupts x86/hyperv: implement an MSI domain for root partition iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition Revert "Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer" Documentation/ABI/stable/sysfs-bus-vmbus | 7 + arch/x86/hyperv/Makefile | 4 +- arch/x86/hyperv/hv_init.c| 122 +- arch/x86/hyperv/hv_proc.c| 219 ++ arch/x86/hyperv/irqdomain.c | 385 +++ arch/x86/include/asm/hyperv-tlfs.h | 38 +++ arch/x86/include/asm/mshyperv.h | 19 +- arch/x86/kernel/cpu/mshyperv.c | 58 + drivers/clocksource/hyperv_timer.c | 3 + drivers/hv/channel.c | 4 +- drivers/hv/channel_mgmt.c| 77 ++- drivers/hv/connection.c | 7 + drivers/hv/hv_fcopy.c| 36 ++- drivers/hv/hv_kvp.c | 122 +- drivers/hv/hv_snapshot.c | 89 --- drivers/hv/hv_util.c | 222 +++--- drivers/hv/vmbus_drv.c | 64 +++-- drivers/iommu/hyperv-iommu.c | 177 +- drivers/net/hyperv/netvsc.c | 18 +- drivers/pci/controller/pci-hyperv.c | 2 +- include/acpi/acpi_numa.h | 4 + include/asm-generic/hyperv-tlfs.h| 255 +++- include/asm-generic/mshyperv.h | 5 + include/linux/hyperv.h | 13 +- 24 files changed, 1717 insertions(+), 233 deletions(-) create mode 100644 arch/x86/hyperv/hv_proc.c create mode 100644 arch/x86/hyperv/irqdomain.c
Re: Regressions with VMBus/VSCs hardening changes
On Fri, Feb 12, 2021 at 05:50:50PM +0100, Andrea Parri wrote: > Hi all, [...] > 2) IIUC a8c3209998afb5 could be dropped (after rebase) without further modi- >fications to hyperv-next. I've reverted the said patch from hyperv-next. Wei.
Re: [PATCH v2 4/8] xen/netback: fix spurious event detection for common event case
On Thu, Feb 11, 2021 at 11:16:12AM +0100, Juergen Gross wrote: > In case of a common event for rx and tx queue the event should be > regarded to be spurious if no rx and no tx requests are pending. > > Unfortunately the condition for testing that is wrong causing to > decide a event being spurious if no rx OR no tx requests are > pending. > > Fix that plus using local variables for rx/tx pending indicators in > order to split function calls and if condition. > > Fixes: 23025393dbeb3b ("xen/netback: use lateeoi irq binding") > Signed-off-by: Juergen Gross Reviewed-by: Wei Liu
[PATCH v2 1/3] vdpa/mlx5: should exclude header length and fcs from mtu
When feature VIRTIO_NET_F_MTU is negotiated on mlx5_vdpa, 22 extra bytes worth of MTU length is shown in guest. This is because the mlx5_query_port_max_mtu API returns the "hardware" MTU value, which does not just contain the Ethernet payload, but includes extra lengths starting from the Ethernet header up to the FCS altogether. Fix the MTU so packets won't get dropped silently. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Si-Wei Liu Acked-by: Jason Wang Acked-by: Eli Cohen --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 4 drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 ++- 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index 08f742f..b6cc53b 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -4,9 +4,13 @@ #ifndef __MLX5_VDPA_H__ #define __MLX5_VDPA_H__ +#include +#include #include #include +#define MLX5V_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN) + struct mlx5_vdpa_direct_mr { u64 start; u64 end; diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index dc88559..b8416c4 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1907,6 +1907,19 @@ static int mlx5_get_vq_irq(struct vdpa_device *vdv, u16 idx) .free = mlx5_vdpa_free, }; +static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu) +{ + u16 hw_mtu; + int err; + + err = mlx5_query_nic_vport_mtu(mdev, _mtu); + if (err) + return err; + + *mtu = hw_mtu - MLX5V_ETH_HARD_MTU; + return 0; +} + static int alloc_resources(struct mlx5_vdpa_net *ndev) { struct mlx5_vdpa_net_resources *res = >res; @@ -1992,7 +2005,7 @@ static int mlx5v_probe(struct auxiliary_device *adev, init_mvqs(ndev); mutex_init(>reslock); config = >config; - err = mlx5_query_nic_vport_mtu(mdev, >mtu); + err = query_mtu(mdev, >mtu); if (err) goto err_mtu; -- 1.8.3.1
[PATCH v2 0/3] mlx5_vdpa bug fixes
This set attempts to fix a few independent issues recently found in mlx5_vdpa driver. Please find details for each in the commit message. Patch 1 and patch 3 are already Ack'ed by Jason Wang. Patch 2 is reworked to move virtio feature capability query to mlx5v_probe() as suggested by Eli. -- v1->v2: move feature capability query to probing (Eli) Si-Wei Liu (3): vdpa/mlx5: should exclude header length and fcs from mtu vdpa/mlx5: fix feature negotiation across device reset vdpa/mlx5: defer clear_virtqueues to until DRIVER_OK drivers/vdpa/mlx5/core/mlx5_vdpa.h | 4 drivers/vdpa/mlx5/net/mlx5_vnet.c | 42 +++--- 2 files changed, 34 insertions(+), 12 deletions(-) -- 1.8.3.1
[PATCH v2 3/3] vdpa/mlx5: defer clear_virtqueues to until DRIVER_OK
While virtq is stopped, get_vq_state() is supposed to be called to get sync'ed with the latest internal avail_index from device. The saved avail_index is used to restate the virtq once device is started. Commit b35ccebe3ef7 introduced the clear_virtqueues() routine to reset the saved avail_index, however, the index gets cleared a bit earlier before get_vq_state() tries to read it. This would cause consistency problems when virtq is restarted, e.g. through a series of link down and link up events. We could defer the clearing of avail_index to until the device is to be started, i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in set_status(). Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") Signed-off-by: Si-Wei Liu Acked-by: Jason Wang --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 7c1f789..ce6aae8 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1777,7 +1777,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); - clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; ++mvdev->generation; @@ -1786,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { if (status & VIRTIO_CONFIG_S_DRIVER_OK) { + clear_virtqueues(ndev); err = setup_driver(ndev); if (err) { mlx5_vdpa_warn(mvdev, "failed to setup driver\n"); -- 1.8.3.1
[PATCH v2 2/3] vdpa/mlx5: fix feature negotiation across device reset
The mlx_features denotes the capability for which set of virtio features is supported by device. In principle, this field needs not be cleared during virtio device reset, as this capability is static and does not change across reset. In fact, the current code may have the assumption that mlx_features can be reloaded from firmware via the .get_features ops after device is reset (via the .set_status ops), which is unfortunately not true. The userspace VMM might save a copy of backend capable features and won't call into kernel again to get it on reset. This causes all virtio features getting disabled on newly created virtqs after device reset, while guest would hold mismatched view of available features. For e.g., the guest may still assume tx checksum offload is available after reset and feature negotiation, causing frames with bogus (incomplete) checksum transmitted on the wire. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index b8416c4..7c1f789 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1486,16 +1486,8 @@ static u64 mlx_to_vritio_features(u16 dev_features) static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev) { struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); - struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); - u16 dev_features; - dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask); - ndev->mvdev.mlx_features = mlx_to_vritio_features(dev_features); - if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0)) - ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_F_VERSION_1); - ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_F_ACCESS_PLATFORM); - print_features(mvdev, ndev->mvdev.mlx_features, false); - return ndev->mvdev.mlx_features; + return mvdev->mlx_features; } static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features) @@ -1788,7 +1780,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; - ndev->mvdev.mlx_features = 0; ++mvdev->generation; return; } @@ -1907,6 +1898,19 @@ static int mlx5_get_vq_irq(struct vdpa_device *vdv, u16 idx) .free = mlx5_vdpa_free, }; +static void query_virtio_features(struct mlx5_vdpa_net *ndev) +{ + struct mlx5_vdpa_dev *mvdev = >mvdev; + u16 dev_features; + + dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask); + mvdev->mlx_features = mlx_to_vritio_features(dev_features); + if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0)) + mvdev->mlx_features |= BIT_ULL(VIRTIO_F_VERSION_1); + mvdev->mlx_features |= BIT_ULL(VIRTIO_F_ACCESS_PLATFORM); + print_features(mvdev, mvdev->mlx_features, false); +} + static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu) { u16 hw_mtu; @@ -2005,6 +2009,7 @@ static int mlx5v_probe(struct auxiliary_device *adev, init_mvqs(ndev); mutex_init(>reslock); config = >config; + query_virtio_features(ndev); err = query_mtu(mdev, >mtu); if (err) goto err_mtu; -- 1.8.3.1
Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map
On 2/9/2021 7:53 PM, Jason Wang wrote: On 2021/2/10 上午10:30, Si-Wei Liu wrote: On 2/8/2021 10:37 PM, Jason Wang wrote: On 2021/2/9 下午2:12, Eli Cohen wrote: On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote: On 2021/2/8 下午6:04, Eli Cohen wrote: On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote: On 2021/2/8 下午2:37, Eli Cohen wrote: On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote: On 2021/2/6 上午7:07, Si-Wei Liu wrote: On 2/3/2021 11:36 PM, Eli Cohen wrote: When a change of memory map occurs, the hardware resources are destroyed and then re-created again with the new memory map. In such case, we need to restore the hardware available and used indices. The driver failed to restore the used index which is added here. Also, since the driver also fails to reset the available and used indices upon device reset, fix this here to avoid regression caused by the fact that used index may not be zero upon device reset. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen --- v0 -> v1: Clear indices upon device reset drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 88dde3455bfd..b5fe6d2ad22f 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info { u64 device_addr; u64 driver_addr; u16 avail_index; + u16 used_index; bool ready; struct vdpa_callback cb; bool restore; @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue { u32 virtq_id; struct mlx5_vdpa_net *ndev; u16 avail_idx; + u16 used_idx; int fw_state; /* keep last in the struct */ @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context); MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx); + MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx); MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3, get_features_12_3(ndev->mvdev.actual_features)); vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context); @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m struct mlx5_virtq_attr { u8 state; u16 available_index; + u16 used_index; }; static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu memset(attr, 0, sizeof(*attr)); attr->state = MLX5_GET(virtio_net_q_object, obj_context, state); attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index); + attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index); kfree(out); return 0; @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev) } } +static void clear_virtqueues(struct mlx5_vdpa_net *ndev) +{ + int i; + + for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) { + ndev->vqs[i].avail_idx = 0; + ndev->vqs[i].used_idx = 0; + } +} + /* TODO: cross-endian support */ static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) { @@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu return err; ri->avail_index = attr.available_index; + ri->used_index = attr.used_index; ri->ready = mvq->ready; ri->num_ent = mvq->num_ent; ri->desc_addr = mvq->desc_addr; @@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev) continue; mvq->avail_idx = ri->avail_index; + mvq->used_idx = ri->used_index; mvq->ready = ri->ready; mvq->num_ent = ri->num_ent; mvq->desc_addr = ri->desc_addr; @@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); + clear_virtqueues(ndev); The clearing looks fine at the first glance, as it aligns with the other state cleanups floating around at the same place. However, the thing is get_vq_state() is supposed to be called right after to get sync'ed with the latest internal avail_index from device while vq is stopped. The index was saved in the driver software at vq
Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map
On 2/8/2021 10:37 PM, Jason Wang wrote: On 2021/2/9 下午2:12, Eli Cohen wrote: On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote: On 2021/2/8 下午6:04, Eli Cohen wrote: On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote: On 2021/2/8 下午2:37, Eli Cohen wrote: On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote: On 2021/2/6 上午7:07, Si-Wei Liu wrote: On 2/3/2021 11:36 PM, Eli Cohen wrote: When a change of memory map occurs, the hardware resources are destroyed and then re-created again with the new memory map. In such case, we need to restore the hardware available and used indices. The driver failed to restore the used index which is added here. Also, since the driver also fails to reset the available and used indices upon device reset, fix this here to avoid regression caused by the fact that used index may not be zero upon device reset. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen --- v0 -> v1: Clear indices upon device reset drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 88dde3455bfd..b5fe6d2ad22f 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info { u64 device_addr; u64 driver_addr; u16 avail_index; + u16 used_index; bool ready; struct vdpa_callback cb; bool restore; @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue { u32 virtq_id; struct mlx5_vdpa_net *ndev; u16 avail_idx; + u16 used_idx; int fw_state; /* keep last in the struct */ @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context); MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx); + MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx); MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3, get_features_12_3(ndev->mvdev.actual_features)); vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context); @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m struct mlx5_virtq_attr { u8 state; u16 available_index; + u16 used_index; }; static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu memset(attr, 0, sizeof(*attr)); attr->state = MLX5_GET(virtio_net_q_object, obj_context, state); attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index); + attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index); kfree(out); return 0; @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev) } } +static void clear_virtqueues(struct mlx5_vdpa_net *ndev) +{ + int i; + + for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) { + ndev->vqs[i].avail_idx = 0; + ndev->vqs[i].used_idx = 0; + } +} + /* TODO: cross-endian support */ static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) { @@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu return err; ri->avail_index = attr.available_index; + ri->used_index = attr.used_index; ri->ready = mvq->ready; ri->num_ent = mvq->num_ent; ri->desc_addr = mvq->desc_addr; @@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev) continue; mvq->avail_idx = ri->avail_index; + mvq->used_idx = ri->used_index; mvq->ready = ri->ready; mvq->num_ent = ri->num_ent; mvq->desc_addr = ri->desc_addr; @@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); + clear_virtqueues(ndev); The clearing looks fine at the first glance, as it aligns with the other state cleanups floating around at the same place. However, the thing is get_vq_state() is supposed to be called right after to get sync'ed with the latest internal avail_index from device while vq is stopped. The index was saved in the driver software at vq suspension, but before the virtq object is destroyed. We shouldn't clear the avail_index t
Re: [PATCH 3/3] mlx5_vdpa: defer clear_virtqueues to until DRIVER_OK
On 2/8/2021 7:37 PM, Jason Wang wrote: On 2021/2/6 下午8:29, Si-Wei Liu wrote: While virtq is stopped, get_vq_state() is supposed to be called to get sync'ed with the latest internal avail_index from device. The saved avail_index is used to restate the virtq once device is started. Commit b35ccebe3ef7 introduced the clear_virtqueues() routine to reset the saved avail_index, however, the index gets cleared a bit earlier before get_vq_state() tries to read it. This would cause consistency problems when virtq is restarted, e.g. through a series of link down and link up events. We could defer the clearing of avail_index to until the device is to be started, i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in set_status(). Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index aa6f8cd..444ab58 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); - clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; ++mvdev->generation; @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { if (status & VIRTIO_CONFIG_S_DRIVER_OK) { + clear_virtqueues(ndev); Rethink about this. As mentioned in another thread, this in fact breaks set_vq_state(). (See vhost_virtqueue_start() -> vhost_vdpa_set_vring_base() in qemu codes). I assume that the clearing for vhost-vdpa would be done via (qemu code), vhost_dev_start()->vhost_vdpa_dev_start()->vhost_vdpa_call(status | VIRTIO_CONFIG_S_DRIVER_OK) which is _after_ vhost_virtqueue_start() gets called to restore the avail_idx to h/w in vhost_dev_start(). What am I missing here? -Siwei The issue is that the avail idx is forgot, we need keep it. Thanks err = setup_driver(ndev); if (err) { mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
Re: [PATCH 4/7] xen/events: link interdomain events to associated xenbus device
On Sat, Feb 06, 2021 at 11:49:29AM +0100, Juergen Gross wrote: > In order to support the possibility of per-device event channel > settings (e.g. lateeoi spurious event thresholds) add a xenbus device > pointer to struct irq_info() and modify the related event channel > binding interfaces to take the pointer to the xenbus device as a > parameter instead of the domain id of the other side. > > While at it remove the stale prototype of bind_evtchn_to_irq_lateeoi(). > > Signed-off-by: Juergen Gross Reviewed-by: Wei Liu
Re: [PATCH 3/3] mlx5_vdpa: defer clear_virtqueues to until DRIVER_OK
On 2/7/2021 9:48 PM, Eli Cohen wrote: On Sat, Feb 06, 2021 at 04:29:24AM -0800, Si-Wei Liu wrote: While virtq is stopped, get_vq_state() is supposed to be called to get sync'ed with the latest internal avail_index from device. The saved avail_index is used to restate the virtq once device is started. Commit b35ccebe3ef7 introduced the clear_virtqueues() routine to reset the saved avail_index, however, the index gets cleared a bit earlier before get_vq_state() tries to read it. This would cause consistency problems when virtq is restarted, e.g. through a series of link down and link up events. We could defer the clearing of avail_index to until the device is to be started, i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in set_status(). Not sure I understand the scenario. You are talking about reset of the device followed by up/down events on the interface. How can you trigger this? Currently it's not possible to trigger link up/down events with upstream QEMU due lack of config/control interrupt implementation. And live migration could be another scenario that cannot be satisfied because of inconsistent queue state. They share the same root of cause as captured here. -Siwei Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index aa6f8cd..444ab58 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); - clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; ++mvdev->generation; @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { if (status & VIRTIO_CONFIG_S_DRIVER_OK) { + clear_virtqueues(ndev); err = setup_driver(ndev); if (err) { mlx5_vdpa_warn(mvdev, "failed to setup driver\n"); -- 1.8.3.1
Re: [PATCH 2/3] mlx5_vdpa: fix feature negotiation across device reset
On 2/7/2021 9:35 PM, Eli Cohen wrote: On Sat, Feb 06, 2021 at 04:29:23AM -0800, Si-Wei Liu wrote: The mlx_features denotes the capability for which set of virtio features is supported by device. In principle, this field needs not be cleared during virtio device reset, as this capability is static and does not change across reset. In fact, the current code may have the assumption that mlx_features can be reloaded from firmware via the .get_features ops after device is reset (via the .set_status ops), which is unfortunately not true. The userspace VMM might save a copy of backend capable features and won't call into kernel again to get it on reset. This causes all virtio features getting disabled on newly created virtqs after device reset, while guest would hold mismatched view of available features. For e.g., the guest may still assume tx checksum offload is available after reset and feature negotiation, causing frames with bogus (incomplete) checksum transmitted on the wire. Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index b8416c4..aa6f8cd 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1788,7 +1788,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; - ndev->mvdev.mlx_features = 0; ++mvdev->generation; return; } Since we assume that device capabilities don't change, I think I would get the features through a call done in mlx5v_probe after the netdev object is created and change mlx5_vdpa_get_features() to just return ndev->mvdev.mlx_features. Yep, it makes sense. Will post a revised patch. If vdpa tool allows reconfiguration post probing, the code has to be reconciled then. Did you actually see this issue in action? If you did, can you share with us how you trigerred this? Issue is indeed seen in action. The mismatched tx-checksum offload as described in the commit message was one of such examples. You would need a guest reboot though (triggering device reset via the .set_status ops and zero'ed mlx_features was loaded to deduce new actual_features for creating the h/w virtq object) for repro. -Siwei -- 1.8.3.1
[PATCH 1/3] mlx5_vdpa: should exclude header length and fcs from mtu
When feature VIRTIO_NET_F_MTU is negotiated on mlx5_vdpa, 22 extra bytes worth of MTU length is shown in guest. This is because the mlx5_query_port_max_mtu API returns the "hardware" MTU value, which does not just contain the Ethernet payload, but includes extra lengths starting from the Ethernet header up to the FCS altogether. Fix the MTU so packets won't get dropped silently. Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 4 drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 ++- 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index 08f742f..b6cc53b 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -4,9 +4,13 @@ #ifndef __MLX5_VDPA_H__ #define __MLX5_VDPA_H__ +#include +#include #include #include +#define MLX5V_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN) + struct mlx5_vdpa_direct_mr { u64 start; u64 end; diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index dc88559..b8416c4 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1907,6 +1907,19 @@ static int mlx5_get_vq_irq(struct vdpa_device *vdv, u16 idx) .free = mlx5_vdpa_free, }; +static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu) +{ + u16 hw_mtu; + int err; + + err = mlx5_query_nic_vport_mtu(mdev, _mtu); + if (err) + return err; + + *mtu = hw_mtu - MLX5V_ETH_HARD_MTU; + return 0; +} + static int alloc_resources(struct mlx5_vdpa_net *ndev) { struct mlx5_vdpa_net_resources *res = >res; @@ -1992,7 +2005,7 @@ static int mlx5v_probe(struct auxiliary_device *adev, init_mvqs(ndev); mutex_init(>reslock); config = >config; - err = mlx5_query_nic_vport_mtu(mdev, >mtu); + err = query_mtu(mdev, >mtu); if (err) goto err_mtu; -- 1.8.3.1
[PATCH 3/3] mlx5_vdpa: defer clear_virtqueues to until DRIVER_OK
While virtq is stopped, get_vq_state() is supposed to be called to get sync'ed with the latest internal avail_index from device. The saved avail_index is used to restate the virtq once device is started. Commit b35ccebe3ef7 introduced the clear_virtqueues() routine to reset the saved avail_index, however, the index gets cleared a bit earlier before get_vq_state() tries to read it. This would cause consistency problems when virtq is restarted, e.g. through a series of link down and link up events. We could defer the clearing of avail_index to until the device is to be started, i.e. until VIRTIO_CONFIG_S_DRIVER_OK is set again in set_status(). Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change map") Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index aa6f8cd..444ab58 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); - clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; ++mvdev->generation; @@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { if (status & VIRTIO_CONFIG_S_DRIVER_OK) { + clear_virtqueues(ndev); err = setup_driver(ndev); if (err) { mlx5_vdpa_warn(mvdev, "failed to setup driver\n"); -- 1.8.3.1
[PATCH 2/3] mlx5_vdpa: fix feature negotiation across device reset
The mlx_features denotes the capability for which set of virtio features is supported by device. In principle, this field needs not be cleared during virtio device reset, as this capability is static and does not change across reset. In fact, the current code may have the assumption that mlx_features can be reloaded from firmware via the .get_features ops after device is reset (via the .set_status ops), which is unfortunately not true. The userspace VMM might save a copy of backend capable features and won't call into kernel again to get it on reset. This causes all virtio features getting disabled on newly created virtqs after device reset, while guest would hold mismatched view of available features. For e.g., the guest may still assume tx checksum offload is available after reset and feature negotiation, causing frames with bogus (incomplete) checksum transmitted on the wire. Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index b8416c4..aa6f8cd 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1788,7 +1788,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) clear_virtqueues(ndev); mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; - ndev->mvdev.mlx_features = 0; ++mvdev->generation; return; } -- 1.8.3.1
Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map
On 2/3/2021 11:36 PM, Eli Cohen wrote: When a change of memory map occurs, the hardware resources are destroyed and then re-created again with the new memory map. In such case, we need to restore the hardware available and used indices. The driver failed to restore the used index which is added here. Also, since the driver also fails to reset the available and used indices upon device reset, fix this here to avoid regression caused by the fact that used index may not be zero upon device reset. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen --- v0 -> v1: Clear indices upon device reset drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 88dde3455bfd..b5fe6d2ad22f 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info { u64 device_addr; u64 driver_addr; u16 avail_index; + u16 used_index; bool ready; struct vdpa_callback cb; bool restore; @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue { u32 virtq_id; struct mlx5_vdpa_net *ndev; u16 avail_idx; + u16 used_idx; int fw_state; /* keep last in the struct */ @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context); MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx); + MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx); MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3, get_features_12_3(ndev->mvdev.actual_features)); vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context); @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m struct mlx5_virtq_attr { u8 state; u16 available_index; + u16 used_index; }; static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu memset(attr, 0, sizeof(*attr)); attr->state = MLX5_GET(virtio_net_q_object, obj_context, state); attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index); + attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index); kfree(out); return 0; @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev) } } +static void clear_virtqueues(struct mlx5_vdpa_net *ndev) +{ + int i; + + for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) { + ndev->vqs[i].avail_idx = 0; + ndev->vqs[i].used_idx = 0; + } +} + /* TODO: cross-endian support */ static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) { @@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu return err; ri->avail_index = attr.available_index; + ri->used_index = attr.used_index; ri->ready = mvq->ready; ri->num_ent = mvq->num_ent; ri->desc_addr = mvq->desc_addr; @@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev) continue; mvq->avail_idx = ri->avail_index; + mvq->used_idx = ri->used_index; mvq->ready = ri->ready; mvq->num_ent = ri->num_ent; mvq->desc_addr = ri->desc_addr; @@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) if (!status) { mlx5_vdpa_info(mvdev, "performing device reset\n"); teardown_driver(ndev); + clear_virtqueues(ndev); The clearing looks fine at the first glance, as it aligns with the other state cleanups floating around at the same place. However, the thing is get_vq_state() is supposed to be called right after to get sync'ed with the latest internal avail_index from device while vq is stopped. The index was saved in the driver software at vq suspension, but before the virtq object is destroyed. We shouldn't clear the avail_index too early. Possibly it can be postponed to where VIRTIO_CONFIG_S_DRIVER_OK gets set again, i.e. right before the setup_driver() in mlx5_vdpa_set_status()? -Siwei mlx5_vdpa_destroy_mr(>mvdev); ndev->mvdev.status = 0; ndev->mvdev.mlx_features = 0;
Re: linux-next: manual merge of the hyperv tree with Linus' tree
On Fri, Feb 05, 2021 at 07:02:02PM +1100, Stephen Rothwell wrote: > Hi all, > > Today's linux-next merge of the hyperv tree got a conflict in: > > arch/x86/hyperv/hv_init.c > > between commit: > > fff7b5e6ee63 ("x86/hyperv: Initialize clockevents after LAPIC is > initialized") > > from Linus' tree and commits: > > a06c2e7df586 ("x86/hyperv: extract partition ID from Microsoft Hypervisor > if necessary") > fa2c411b58fe ("x86/hyperv: implement an MSI domain for root partition") > > from the hyperv tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > Thanks Stephen. I've fixed up the conflicts locally in hyperv-next. Wei.
Re: [PATCH v6 00/16] Introducing Linux root partition support for Microsoft Hypervisor
On Wed, Feb 03, 2021 at 03:04:19PM +, Wei Liu wrote: > Wei Liu (16): > asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to > HV_CPU_MANAGEMENT > x86/hyperv: detect if Linux is the root partition > Drivers: hv: vmbus: skip VMBus initialization if Linux is root > clocksource/hyperv: use MSR-based access if running as root > x86/hyperv: allocate output arg pages if required > x86/hyperv: extract partition ID from Microsoft Hypervisor if > necessary > x86/hyperv: handling hypercall page setup for root > ACPI / NUMA: add a stub function for node_to_pxm() > x86/hyperv: provide a bunch of helper functions > x86/hyperv: implement and use hv_smp_prepare_cpus > asm-generic/hyperv: update hv_msi_entry > asm-generic/hyperv: update hv_interrupt_entry > asm-generic/hyperv: introduce hv_device_id and auxiliary structures > asm-generic/hyperv: import data structures for mapping device > interrupts > x86/hyperv: implement an MSI domain for root partition > iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition This series is now rebased and pushed to hyperv-next. Many thanks to all the people that provided reviews and comments. Wei.
Re: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()
On Thu, Feb 04, 2021 at 07:45:25PM +0100, Rafael J. Wysocki wrote: > On Thu, Feb 4, 2021 at 7:41 PM Wei Liu wrote: > > > > On Wed, Feb 03, 2021 at 03:04:27PM +, Wei Liu wrote: > > > There is already a stub function for pxm_to_node but conversion to the > > > other direction is missing. > > > > > > It will be used by Microsoft Hypervisor code later. > > > > > > Signed-off-by: Wei Liu > > > > Hi ACPI maintainers, if you're happy with this patch I can take it via > > the hyperv-next tree, given the issue is discovered when pxm_to_node is > > called in our code. > > Yes, you can. Thanks Rafael. I will add your ack to the patch as well. Wei.
Re: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition
On Thu, Feb 04, 2021 at 06:40:55PM +, Michael Kelley wrote: > From: Wei Liu Sent: Thursday, February 4, 2021 9:57 AM [...] > > I've got the following diff to fix both issues. If you're happy with the > > changes, can you give your Reviewed-by? That saves a round of posting. > > > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c > > index 0cabc9aece38..fa71db798465 100644 > > --- a/arch/x86/hyperv/irqdomain.c > > +++ b/arch/x86/hyperv/irqdomain.c > > @@ -1,7 +1,7 @@ > > // SPDX-License-Identifier: GPL-2.0 > > > > /* > > - * for Linux to run as the root partition on Microsoft Hypervisor. > > + * Irqdomain for Linux to run as the root partition on Microsoft > > Hypervisor. > > * > > * Authors: > > * Sunil Muthuswamy > > @@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, > > bool level, > > struct hv_device_interrupt_descriptor *intr_desc; > > unsigned long flags; > > u64 status; > > - cpumask_t mask = CPU_MASK_NONE; > > + const cpumask_t *mask; > > int nr_bank, var_size; > > > > local_irq_save(flags); > > @@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id > > device_id, bool > > level, > > else > > intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; > > > > - cpumask_set_cpu(cpu, ); > > + mask = cpumask_of(cpu); > > intr_desc->target.vp_set.valid_bank_mask = 0; > > intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K; > > - nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), ); > > + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask); > > Can you just do the following and get rid of the 'mask' local entirely? > > nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu)); Sure. That can be done. > > Either way, > > Reviewed-by: Michael Kelley Thank you. Wei.
Re: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()
On Wed, Feb 03, 2021 at 03:04:27PM +, Wei Liu wrote: > There is already a stub function for pxm_to_node but conversion to the > other direction is missing. > > It will be used by Microsoft Hypervisor code later. > > Signed-off-by: Wei Liu Hi ACPI maintainers, if you're happy with this patch I can take it via the hyperv-next tree, given the issue is discovered when pxm_to_node is called in our code. > --- > v6: new > --- > include/acpi/acpi_numa.h | 4 > 1 file changed, 4 insertions(+) > > diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h > index a4c6ef809e27..40a91ce87e04 100644 > --- a/include/acpi/acpi_numa.h > +++ b/include/acpi/acpi_numa.h > @@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm) > { > return 0; > } > +static inline int node_to_pxm(int node) > +{ > + return 0; > +} > #endif /* CONFIG_ACPI_NUMA */ > > #ifdef CONFIG_ACPI_HMAT > -- > 2.20.1 >
Re: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition
On Thu, Feb 04, 2021 at 05:43:16PM +, Michael Kelley wrote: [...] > > remove_cpuhp_state: > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c > > new file mode 100644 > > index ..117f17e8c88a > > --- /dev/null > > +++ b/arch/x86/hyperv/irqdomain.c > > @@ -0,0 +1,362 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +/* > > + * for Linux to run as the root partition on Microsoft Hypervisor. > > Nit: Looks like the initial word "Irqdomain" got dropped from the above > comment line. But don't respin just for this. > I've added it back. Thanks. > > +static int hv_map_interrupt(union hv_device_id device_id, bool level, > > + int cpu, int vector, struct hv_interrupt_entry *entry) > > +{ > > + struct hv_input_map_device_interrupt *input; > > + struct hv_output_map_device_interrupt *output; > > + struct hv_device_interrupt_descriptor *intr_desc; > > + unsigned long flags; > > + u64 status; > > + cpumask_t mask = CPU_MASK_NONE; > > + int nr_bank, var_size; > > + > > + local_irq_save(flags); > > + > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg); > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg); > > + > > + intr_desc = >interrupt_descriptor; > > + memset(input, 0, sizeof(*input)); > > + input->partition_id = hv_current_partition_id; > > + input->device_id = device_id.as_uint64; > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED; > > + intr_desc->vector_count = 1; > > + intr_desc->target.vector = vector; > > + > > + if (level) > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL; > > + else > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; > > + > > + cpumask_set_cpu(cpu, ); > > + intr_desc->target.vp_set.valid_bank_mask = 0; > > + intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K; > > + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), ); > > There's a function get_cpu_mask() that returns a pointer to a cpumask with > only > the specified cpu set in the mask. It returns a const pointer to the correct > entry > in a pre-allocated array of all such cpumasks, so it's a lot more efficient > than > allocating and initializing a local cpumask instance on the stack. > That's nice. I've got the following diff to fix both issues. If you're happy with the changes, can you give your Reviewed-by? That saves a round of posting. diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c index 0cabc9aece38..fa71db798465 100644 --- a/arch/x86/hyperv/irqdomain.c +++ b/arch/x86/hyperv/irqdomain.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* - * for Linux to run as the root partition on Microsoft Hypervisor. + * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor. * * Authors: * Sunil Muthuswamy @@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level, struct hv_device_interrupt_descriptor *intr_desc; unsigned long flags; u64 status; - cpumask_t mask = CPU_MASK_NONE; + const cpumask_t *mask; int nr_bank, var_size; local_irq_save(flags); @@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level, else intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; - cpumask_set_cpu(cpu, ); + mask = cpumask_of(cpu); intr_desc->target.vp_set.valid_bank_mask = 0; intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K; - nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), ); + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask); if (nr_bank < 0) { local_irq_restore(flags); pr_err("%s: unable to generate VP set\n", __func__);
Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
On Thu, Feb 04, 2021 at 04:41:47PM +, Michael Kelley wrote: > From: Wei Liu Sent: Wednesday, February 3, 2021 4:47 AM [...] > > > > + > > > > + if (level) > > > > + intr_desc->trigger_mode = > > > > HV_INTERRUPT_TRIGGER_MODE_LEVEL; > > > > + else > > > > + intr_desc->trigger_mode = > > > > HV_INTERRUPT_TRIGGER_MODE_EDGE; > > > > + > > > > + __set_bit(vcpu, (unsigned long *)_desc->target.vp_mask); > > > > + > > > > + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, > > > > input, > > output) & > > > > +HV_HYPERCALL_RESULT_MASK; > > > > + local_irq_restore(flags); > > > > + > > > > + *entry = output->interrupt_entry; > > > > + > > > > + return status; > > > > > > As a cross-check, I was comparing this code against > > > hv_map_msi_interrupt(). They are > > > mostly parallel, though some of the assignments are done in a different > > > order. It's a nit, > > > but making them as parallel as possible would be nice. :-) > > > > > > > Indeed. I will see about factoring out a function. > > If factoring out a separate helper function is clumsy, just having the > parallel code > in the two functions be as similar as possible makes it easier to see what's > the > same and what's different. > No. It is not clumsy at all. I've done it in the newly posted v6. I was baffled why I wrote hv_unmap_interrupt helper to be used by both hv_unmap_ioapic_interrupt and hv_unmap_msi_interrupt in the previous patch, but didn't write a hv_map_interrupt. Maybe I didn't have enough coffee that day. :-/ Thanks for pointing out that issue. It definitely helped improve the quality of this series. Wei.
Re: [PATCH 08/10] clocksource/drivers/hyper-v: Handle sched_clock differences inline
On Thu, Feb 04, 2021 at 04:28:38PM +, Michael Kelley wrote: > From: Wei Liu Sent: Monday, February 1, 2021 10:55 AM > > > > On Wed, Jan 27, 2021 at 12:23:43PM -0800, Michael Kelley wrote: > > [...] > > > +/* > > > + * Reference to pv_ops must be inline so objtool > > > + * detection of noinstr violations can work correctly. > > > + */ > > > +static __always_inline void hv_setup_sched_clock(void *sched_clock) > > > > sched_clock_register is not trivial. Having __always_inline here is > > going to make the compiled object bloated. > > > > Given this is a static function, I don't think we need to specify any > > inline keyword. The compiler should be able to determine whether this > > function should be inlined all by itself. > > > > Wei. > > There was an explicit request from Peter Zijlstra and Thomas Gleixner > to force it inline. See https://lore.kernel.org/patchwork/patch/1283635/ and > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/include/asm/mshyperv.h?id=b9d8cf2eb3ceecdee3434b87763492aee9e28845 Oh. noinstr. I failed to notice the comment. Sorry for the noise. Wei.
Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
On Tue, Feb 2, 2021 at 9:16 PM Jason Wang wrote: > > > On 2021/2/3 上午1:54, Si-Wei Liu wrote: > > On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen wrote: > >> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote: > >>> Thanks Eli and Jason for clarifications. See inline. > >>> > >>> On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen wrote: > >>>> On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote: > >>>>> On 2021/2/2 下午12:15, Si-Wei Liu wrote: > >>>>>> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang wrote: > >>>>>>> On 2021/2/2 上午3:17, Si-Wei Liu wrote: > >>>>>>>> On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu > >>>>>>>> wrote: > >>>>>>>>> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen wrote: > >>>>>>>>>> suspend_vq should only suspend the VQ on not save the current > >>>>>>>>>> available > >>>>>>>>>> index. This is done when a change of map occurs when the driver > >>>>>>>>>> calls > >>>>>>>>>> save_channel_info(). > >>>>>>>>> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of > >>>>>>>>> which doesn't save the available index as save_channel_info() > >>>>>>>>> doesn't > >>>>>>>>> get called in that path at all. How does it handle the case that > >>>>>>>>> aget_vq_state() is called from userspace (e.g. QEMU) while the > >>>>>>>>> hardware VQ object was torn down, but userspace still wants to > >>>>>>>>> access > >>>>>>>>> the queue index? > >>>>>>>>> > >>>>>>>>> Refer to > >>>>>>>>> https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei@oracle.com/ > >>>>>>>>> > >>>>>>>>> vhost VQ 0 ring restore failed: -1: Resource temporarily > >>>>>>>>> unavailable (11) > >>>>>>>>> vhost VQ 1 ring restore failed: -1: Resource temporarily > >>>>>>>>> unavailable (11) > >>>>>>>>> > >>>>>>>>> QEMU will complain with the above warning while VM is being rebooted > >>>>>>>>> or shut down. > >>>>>>>>> > >>>>>>>>> Looks to me either the kernel driver should cover this requirement, > >>>>>>>>> or > >>>>>>>>> the userspace has to bear the burden in saving the index and not > >>>>>>>>> call > >>>>>>>>> into kernel if VQ is destroyed. > >>>>>>>> Actually, the userspace doesn't have the insights whether virt queue > >>>>>>>> will be destroyed if just changing the device status via > >>>>>>>> set_status(). > >>>>>>>> Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave > >>>>>>>> like > >>>>>>>> so. Hence this still looks to me to be Mellanox specifics and > >>>>>>>> mlx5_vdpa implementation detail that shouldn't expose to userspace. > >>>>>>> So I think we can simply drop this patch? > >>>>>> Yep, I think so. To be honest I don't know why it has anything to do > >>>>>> with the memory hotplug issue. > >>>>> > >>>>> Eli may know more, my understanding is that, during memory hotplut, qemu > >>>>> need to propagate new memory mappings via set_map(). For mellanox, it > >>>>> means > >>>>> it needs to rebuild memory keys, so the virtqueue needs to be suspended. > >>>>> > >>>> I think Siwei was asking why the first patch was related to the hotplug > >>>> issue. > >>> I was thinking how consistency is assured when saving/restoring this > >>> h/w avail_index against the one in the virtq memory, particularly in > >>> the region_add/.region_del() context (e.g. the hotplug case). Problem > >>> is I don't see explicit memory barrier when guest thread updates the > >>> avail_index, how does the device make sure the h/w
Re: [PATCH] vdpa/mlx5: Restore the hardware used index after change map
On Tue, Feb 2, 2021 at 10:48 PM Eli Cohen wrote: > > On Tue, Feb 02, 2021 at 09:14:02AM -0800, Si-Wei Liu wrote: > > On Tue, Feb 2, 2021 at 6:34 AM Eli Cohen wrote: > > > > > > When a change of memory map occurs, the hardware resources are destroyed > > > and then re-created again with the new memory map. In such case, we need > > > to restore the hardware available and used indices. The driver failed to > > > restore the used index which is added here. > > > > > > Fixes 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 > > > devices") > > > Signed-off-by: Eli Cohen > > > --- > > > This patch is being sent again a single patch the fixes hot memory > > > addtion to a qemy process. > > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++ > > > 1 file changed, 7 insertions(+) > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > index 88dde3455bfd..839f57c64a6f 100644 > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info { > > > u64 device_addr; > > > u64 driver_addr; > > > u16 avail_index; > > > + u16 used_index; > > > bool ready; > > > struct vdpa_callback cb; > > > bool restore; > > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue { > > > u32 virtq_id; > > > struct mlx5_vdpa_net *ndev; > > > u16 avail_idx; > > > + u16 used_idx; > > > int fw_state; > > > > > > /* keep last in the struct */ > > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net > > > *ndev, struct mlx5_vdpa_virtque > > > > > > obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, > > > obj_context); > > > MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, > > > mvq->avail_idx); > > > + MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, > > > mvq->used_idx); > > > > The saved indexes will apply to the new virtqueue object whenever it > > is created. In virtio spec, these indexes will reset back to zero when > > the virtio device is reset. But I don't see how it's done today. IOW, > > I don't see where avail_idx and used_idx get cleared from the mvq for > > device reset via set_status(). > > > > Right, but this is not strictly related to this patch. I will post > another patch to fix this. Better to post these two patches in a series.Or else it may cause VM reboot problem as that is where the device gets reset. The avail_index did not as the correct value will be written to by driver right after, but used_idx introduced by this patch is supplied by device hence this patch alone would introduce regression. > > BTW, can you describe a secnario that would cause a reset (through > calling set_status()) that happens after the VQ has been used? You can try reboot the guest, that'll be the easy way to test. -Siwei > > > -Siwei > > > > > > > MLX5_SET(virtio_net_q_object, obj_context, > > > queue_feature_bit_mask_12_3, > > > get_features_12_3(ndev->mvdev.actual_features)); > > > vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, > > > virtio_q_context); > > > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, > > > struct mlx5_vdpa_virtqueue *m > > > struct mlx5_virtq_attr { > > > u8 state; > > > u16 available_index; > > > + u16 used_index; > > > }; > > > > > > static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct > > > mlx5_vdpa_virtqueue *mvq, > > > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net > > > *ndev, struct mlx5_vdpa_virtqueu > > > memset(attr, 0, sizeof(*attr)); > > > attr->state = MLX5_GET(virtio_net_q_object, obj_context, state); > > > attr->available_index = MLX5_GET(virtio_net_q_object, > > > obj_context, hw_available_index); > > > + attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, > > > hw_used_index); > > > kfree(out); > > > return 0; > > > > > > @@ -1610,6 +1615,7 @@ static int save_channel_info(struct mlx5_vdpa_net > > > *ndev, struct mlx5_vdpa_virtqu > > > return err; > > > > > > ri->avail_index = attr.available_index; > > > + ri->used_index = attr.used_index; > > > ri->ready = mvq->ready; > > > ri->num_ent = mvq->num_ent; > > > ri->desc_addr = mvq->desc_addr; > > > @@ -1654,6 +1660,7 @@ static void restore_channels_info(struct > > > mlx5_vdpa_net *ndev) > > > continue; > > > > > > mvq->avail_idx = ri->avail_index; > > > + mvq->used_idx = ri->used_index; > > > mvq->ready = ri->ready; > > > mvq->num_ent = ri->num_ent; > > > mvq->desc_addr = ri->desc_addr; > > > -- > > > 2.29.2 > > >
[PATCH v6 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT
This makes the name match Hyper-V TLFS. Signed-off-by: Wei Liu Reviewed-by: Vitaly Kuznetsov Reviewed-by: Pavel Tatashin Reviewed-by: Michael Kelley --- include/asm-generic/hyperv-tlfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h index e73a11850055..e6903589a82a 100644 --- a/include/asm-generic/hyperv-tlfs.h +++ b/include/asm-generic/hyperv-tlfs.h @@ -88,7 +88,7 @@ #define HV_CONNECT_PORTBIT(7) #define HV_ACCESS_STATSBIT(8) #define HV_DEBUGGING BIT(11) -#define HV_CPU_POWER_MANAGEMENTBIT(12) +#define HV_CPU_MANAGEMENT BIT(12) /* -- 2.20.1
[PATCH v6 02/16] x86/hyperv: detect if Linux is the root partition
For now we can use the privilege flag to check. Stash the value to be used later. Put in a bunch of defines for future use when we want to have more fine-grained detection. Signed-off-by: Wei Liu Reviewed-by: Pavel Tatashin --- v3: move hv_root_partition to mshyperv.c --- arch/x86/include/asm/hyperv-tlfs.h | 10 ++ arch/x86/include/asm/mshyperv.h| 2 ++ arch/x86/kernel/cpu/mshyperv.c | 20 3 files changed, 32 insertions(+) diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h index 6bf42aed387e..204010350604 100644 --- a/arch/x86/include/asm/hyperv-tlfs.h +++ b/arch/x86/include/asm/hyperv-tlfs.h @@ -21,6 +21,7 @@ #define HYPERV_CPUID_FEATURES 0x4003 #define HYPERV_CPUID_ENLIGHTMENT_INFO 0x4004 #define HYPERV_CPUID_IMPLEMENT_LIMITS 0x4005 +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES 0x4007 #define HYPERV_CPUID_NESTED_FEATURES 0x400A #define HYPERV_CPUID_VIRT_STACK_INTERFACE 0x4081 @@ -110,6 +111,15 @@ /* Recommend using enlightened VMCS */ #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDEDBIT(14) +/* + * CPU management features identification. + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits. + */ +#define HV_X64_START_LOGICAL_PROCESSOR BIT(0) +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR BIT(1) +#define HV_X64_PERFORMANCE_COUNTER_SYNCBIT(2) +#define HV_X64_RESERVED_IDENTITY_BIT BIT(31) + /* * Virtual processor will never share a physical core with another virtual * processor, except for virtual processors that are reported as sibling SMT diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index ffc289992d1b..ac2b0d110f03 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list( struct hv_guest_mapping_flush_list *flush, u64 start_gfn, u64 end_gfn); +extern bool hv_root_partition; + #ifdef CONFIG_X86_64 void hv_apic_init(void); void __init hv_init_spinlocks(void); diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index f628e3dc150f..c376d191a260 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -32,6 +32,10 @@ #include #include +/* Is Linux running as the root partition? */ +bool hv_root_partition; +EXPORT_SYMBOL_GPL(hv_root_partition); + struct ms_hyperv_info ms_hyperv; EXPORT_SYMBOL_GPL(ms_hyperv); @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void) pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n", ms_hyperv.max_vp_index, ms_hyperv.max_lp_index); + /* +* Check CPU management privilege. +* +* To mirror what Windows does we should extract CPU management +* features and use the ReservedIdentityBit to detect if Linux is the +* root partition. But that requires negotiating CPU management +* interface (a process to be finalized). +* +* For now, use the privilege flag as the indicator for running as +* root. +*/ + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) { + hv_root_partition = true; + pr_info("Hyper-V: running as root partition\n"); + } + /* * Extract host information. */ -- 2.20.1
[PATCH v6 04/16] clocksource/hyperv: use MSR-based access if running as root
When Linux runs as the root partition, the setup required for TSC page is different. Luckily Linux also has access to the MSR based clocksource. We can just disable the TSC page clocksource if Linux is the root partition. Signed-off-by: Wei Liu Acked-by: Daniel Lezcano Reviewed-by: Pavel Tatashin Reviewed-by: Michael Kelley --- drivers/clocksource/hyperv_timer.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c index ba04cb381cd3..269a691bd2c4 100644 --- a/drivers/clocksource/hyperv_timer.c +++ b/drivers/clocksource/hyperv_timer.c @@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void) if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE)) return false; + if (hv_root_partition) + return false; + hv_read_reference_counter = read_hv_clock_tsc; phys_addr = virt_to_phys(hv_get_tsc_page()); -- 2.20.1
[PATCH v6 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root
There is no VMBus and the other infrastructures initialized in hv_acpi_init when Linux is running as the root partition. Signed-off-by: Wei Liu Reviewed-by: Pavel Tatashin Reviewed-by: Michael Kelley --- v3: Return 0 instead of -ENODEV. --- drivers/hv/vmbus_drv.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 502f8cd95f6d..ee27b3670a51 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -2620,6 +2620,9 @@ static int __init hv_acpi_init(void) if (!hv_is_hyperv_initialized()) return -ENODEV; + if (hv_root_partition) + return 0; + init_completion(_event); /* -- 2.20.1
[PATCH v6 05/16] x86/hyperv: allocate output arg pages if required
When Linux runs as the root partition, it will need to make hypercalls which return data from the hypervisor. Allocate pages for storing results when Linux runs as the root partition. Signed-off-by: Lillian Grassin-Drake Co-Developed-by: Lillian Grassin-Drake Signed-off-by: Wei Liu --- v3: Fix hv_cpu_die to use free_pages. v2: Address Vitaly's comments --- arch/x86/hyperv/hv_init.c | 35 - arch/x86/include/asm/mshyperv.h | 1 + 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index e04d90af4c27..6f4cb40e53fe 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page); void __percpu **hyperv_pcpu_input_arg; EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg); +void __percpu **hyperv_pcpu_output_arg; +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg); + u32 hv_max_vp_index; EXPORT_SYMBOL_GPL(hv_max_vp_index); @@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu) void **input_arg; struct page *pg; - input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg); /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */ - pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL); + pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0); if (unlikely(!pg)) return -ENOMEM; + + input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg); *input_arg = page_address(pg); + if (hv_root_partition) { + void **output_arg; + + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg); + *output_arg = page_address(pg + 1); + } hv_get_vp_index(msr_vp_index); @@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu) unsigned int new_cpu; unsigned long flags; void **input_arg; - void *input_pg = NULL; + void *pg; local_irq_save(flags); input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg); - input_pg = *input_arg; + pg = *input_arg; *input_arg = NULL; + + if (hv_root_partition) { + void **output_arg; + + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg); + *output_arg = NULL; + } + local_irq_restore(flags); - free_page((unsigned long)input_pg); + + free_pages((unsigned long)pg, hv_root_partition ? 1 : 0); if (hv_vp_assist_page && hv_vp_assist_page[cpu]) wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0); @@ -346,6 +365,12 @@ void __init hyperv_init(void) BUG_ON(hyperv_pcpu_input_arg == NULL); + /* Allocate the per-CPU state for output arg for root */ + if (hv_root_partition) { + hyperv_pcpu_output_arg = alloc_percpu(void *); + BUG_ON(hyperv_pcpu_output_arg == NULL); + } + /* Allocate percpu VP index */ hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index), GFP_KERNEL); diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index ac2b0d110f03..62d9390f1ddf 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {} #if IS_ENABLED(CONFIG_HYPERV) extern void *hv_hypercall_pg; extern void __percpu **hyperv_pcpu_input_arg; +extern void __percpu **hyperv_pcpu_output_arg; static inline u64 hv_do_hypercall(u64 control, void *input, void *output) { -- 2.20.1
[PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()
There is already a stub function for pxm_to_node but conversion to the other direction is missing. It will be used by Microsoft Hypervisor code later. Signed-off-by: Wei Liu --- v6: new --- include/acpi/acpi_numa.h | 4 1 file changed, 4 insertions(+) diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h index a4c6ef809e27..40a91ce87e04 100644 --- a/include/acpi/acpi_numa.h +++ b/include/acpi/acpi_numa.h @@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm) { return 0; } +static inline int node_to_pxm(int node) +{ + return 0; +} #endif /* CONFIG_ACPI_NUMA */ #ifdef CONFIG_ACPI_HMAT -- 2.20.1
[PATCH v6 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus
Microsoft Hypervisor requires the root partition to make a few hypercalls to setup application processors before they can be used. Signed-off-by: Lillian Grassin-Drake Signed-off-by: Sunil Muthuswamy Co-Developed-by: Lillian Grassin-Drake Co-Developed-by: Sunil Muthuswamy Signed-off-by: Wei Liu Reviewed-by: Michael Kelley --- CPU hotplug and unplug is not yet supported in this setup, so those paths remain untouched. v3: Always call native SMP preparation function. --- arch/x86/kernel/cpu/mshyperv.c | 29 + 1 file changed, 29 insertions(+) diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index c376d191a260..13d3b6dd21a3 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -31,6 +31,7 @@ #include #include #include +#include /* Is Linux running as the root partition? */ bool hv_root_partition; @@ -212,6 +213,32 @@ static void __init hv_smp_prepare_boot_cpu(void) hv_init_spinlocks(); #endif } + +static void __init hv_smp_prepare_cpus(unsigned int max_cpus) +{ +#ifdef CONFIG_X86_64 + int i; + int ret; +#endif + + native_smp_prepare_cpus(max_cpus); + +#ifdef CONFIG_X86_64 + for_each_present_cpu(i) { + if (i == 0) + continue; + ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i)); + BUG_ON(ret); + } + + for_each_present_cpu(i) { + if (i == 0) + continue; + ret = hv_call_create_vp(numa_cpu_node(i), hv_current_partition_id, i, i); + BUG_ON(ret); + } +#endif +} #endif static void __init ms_hyperv_init_platform(void) @@ -368,6 +395,8 @@ static void __init ms_hyperv_init_platform(void) # ifdef CONFIG_SMP smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu; + if (hv_root_partition) + smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus; # endif /* -- 2.20.1
[PATCH v6 09/16] x86/hyperv: provide a bunch of helper functions
They are used to deposit pages into Microsoft Hypervisor and bring up logical and virtual processors. Signed-off-by: Lillian Grassin-Drake Signed-off-by: Sunil Muthuswamy Signed-off-by: Nuno Das Neves Co-Developed-by: Lillian Grassin-Drake Co-Developed-by: Sunil Muthuswamy Co-Developed-by: Nuno Das Neves Signed-off-by: Wei Liu --- v6: 1. Address Michael's comments. v4: Fix compilation issue when CONFIG_ACPI_NUMA is not set. v3: 1. Add __packed to structures. 2. Drop unnecessary exports. v2: 1. Adapt to hypervisor side changes 2. Address Vitaly's comments use u64 status pages major comments minor comments rely on acpi code --- arch/x86/hyperv/Makefile | 2 +- arch/x86/hyperv/hv_proc.c | 219 ++ arch/x86/include/asm/mshyperv.h | 4 + include/asm-generic/hyperv-tlfs.h | 67 + 4 files changed, 291 insertions(+), 1 deletion(-) create mode 100644 arch/x86/hyperv/hv_proc.c diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile index 89b1f74d3225..565358020921 100644 --- a/arch/x86/hyperv/Makefile +++ b/arch/x86/hyperv/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only obj-y := hv_init.o mmu.o nested.o -obj-$(CONFIG_X86_64) += hv_apic.o +obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o ifdef CONFIG_X86_64 obj-$(CONFIG_PARAVIRT_SPINLOCKS) += hv_spinlock.o diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c new file mode 100644 index ..60461e598239 --- /dev/null +++ b/arch/x86/hyperv/hv_proc.c @@ -0,0 +1,219 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +/* + * See struct hv_deposit_memory. The first u64 is partition ID, the rest + * are GPAs. + */ +#define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1) + +/* Deposits exact number of pages. Must be called with interrupts enabled. */ +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages) +{ + struct page **pages, *page; + int *counts; + int num_allocations; + int i, j, page_count; + int order; + u64 status; + int ret; + u64 base_pfn; + struct hv_deposit_memory *input_page; + unsigned long flags; + + if (num_pages > HV_DEPOSIT_MAX) + return -E2BIG; + if (!num_pages) + return 0; + + /* One buffer for page pointers and counts */ + page = alloc_page(GFP_KERNEL); + if (!page) + return -ENOMEM; + pages = page_address(page); + + counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL); + if (!counts) { + free_page((unsigned long)pages); + return -ENOMEM; + } + + /* Allocate all the pages before disabling interrupts */ + i = 0; + + while (num_pages) { + /* Find highest order we can actually allocate */ + order = 31 - __builtin_clz(num_pages); + + while (1) { + pages[i] = alloc_pages_node(node, GFP_KERNEL, order); + if (pages[i]) + break; + if (!order) { + ret = -ENOMEM; + num_allocations = i; + goto err_free_allocations; + } + --order; + } + + split_page(pages[i], order); + counts[i] = 1 << order; + num_pages -= counts[i]; + i++; + } + num_allocations = i; + + local_irq_save(flags); + + input_page = *this_cpu_ptr(hyperv_pcpu_input_arg); + + input_page->partition_id = partition_id; + + /* Populate gpa_page_list - these will fit on the input page */ + for (i = 0, page_count = 0; i < num_allocations; ++i) { + base_pfn = page_to_pfn(pages[i]); + for (j = 0; j < counts[i]; ++j, ++page_count) + input_page->gpa_page_list[page_count] = base_pfn + j; + } + status = hv_do_rep_hypercall(HVCALL_DEPOSIT_MEMORY, +page_count, 0, input_page, NULL); + local_irq_restore(flags); + + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) { + pr_err("Failed to deposit pages: %lld\n", status); + ret = status; + goto err_free_allocations; + } + + ret = 0; + goto free_buf; + +err_free_allocations: + for (i = 0; i < num_allocations; ++i) { + base_pfn = page_to_pfn(pages[i]); + for (j = 0; j < counts[i]; ++j) + __free_page(pfn_to_page(base_pfn + j)); + } + +free_buf: + free_page((unsigned long)pages); + kfr
[PATCH v6 12/16] asm-generic/hyperv: update hv_interrupt_entry
We will soon use the same structure to handle IO-APIC interrupts as well. Introduce an enum to identify the source and a data structure for IO-APIC RTE. While at it, update pci-hyperv.c to use the enum. No functional change. Signed-off-by: Wei Liu Acked-by: Rob Herring Reviewed-by: Michael Kelley --- drivers/pci/controller/pci-hyperv.c | 2 +- include/asm-generic/hyperv-tlfs.h | 36 +++-- 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index 6db8d96a78eb..87aa62ee0368 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -1216,7 +1216,7 @@ static void hv_irq_unmask(struct irq_data *data) params = >retarget_msi_interrupt_params; memset(params, 0, sizeof(*params)); params->partition_id = HV_PARTITION_ID_SELF; - params->int_entry.source = 1; /* MSI(-X) */ + params->int_entry.source = HV_INTERRUPT_SOURCE_MSI; hv_set_msi_entry_from_desc(>int_entry.msi_entry, msi_desc); params->device_id = (hbus->hdev->dev_instance.b[5] << 24) | (hbus->hdev->dev_instance.b[4] << 16) | diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h index 4669f9a4e1f1..94c7d77bbf68 100644 --- a/include/asm-generic/hyperv-tlfs.h +++ b/include/asm-generic/hyperv-tlfs.h @@ -480,6 +480,11 @@ struct hv_create_vp { u64 flags; } __packed; +enum hv_interrupt_source { + HV_INTERRUPT_SOURCE_MSI = 1, /* MSI and MSI-X */ + HV_INTERRUPT_SOURCE_IOAPIC, +}; + union hv_msi_address_register { u32 as_uint32; struct { @@ -513,10 +518,37 @@ union hv_msi_entry { } __packed; }; +union hv_ioapic_rte { + u64 as_uint64; + + struct { + u32 vector:8; + u32 delivery_mode:3; + u32 destination_mode:1; + u32 delivery_status:1; + u32 interrupt_polarity:1; + u32 remote_irr:1; + u32 trigger_mode:1; + u32 interrupt_mask:1; + u32 reserved1:15; + + u32 reserved2:24; + u32 destination_id:8; + }; + + struct { + u32 low_uint32; + u32 high_uint32; + }; +} __packed; + struct hv_interrupt_entry { - u32 source; /* 1 for MSI(-X) */ + u32 source; u32 reserved1; - union hv_msi_entry msi_entry; + union { + union hv_msi_entry msi_entry; + union hv_ioapic_rte ioapic_rte; + }; } __packed; /* -- 2.20.1