Re: [PATCH vhost v2 4/8] vdpa/mlx5: Mark vq addrs for modification in hw vq

2023-12-12 Thread Si-Wei Liu




On 12/12/2023 11:21 AM, Eugenio Perez Martin wrote:

On Tue, Dec 5, 2023 at 11:46 AM Dragos Tatulea  wrote:

Addresses get set by .set_vq_address. hw vq addresses will be updated on
next modify_virtqueue.

Signed-off-by: Dragos Tatulea 
Reviewed-by: Gal Pressman 
Acked-by: Eugenio Pérez 

I'm kind of ok with this patch and the next one about state, but I
didn't ack them in the previous series.

My main concern is that it is not valid to change the vq address after
DRIVER_OK in VirtIO, which vDPA follows. Only memory maps are ok to
change at this moment. I'm not sure about vq state in vDPA, but vhost
forbids changing it with an active backend.

Suspend is not defined in VirtIO at this moment though, so maybe it is
ok to decide that all of these parameters may change during suspend.
Maybe the best thing is to protect this with a vDPA feature flag.
I think protect with vDPA feature flag could work, while on the other 
hand vDPA means vendor specific optimization is possible around suspend 
and resume (in case it helps performance), which doesn't have to be 
backed by virtio spec. Same applies to vhost user backend features, 
variations there were not backed by spec either. Of course, we should 
try best to make the default behavior backward compatible with 
virtio-based backend, but that circles back to no suspend definition in 
the current virtio spec, for which I hope we don't cease development on 
vDPA indefinitely. After all, the virtio based vdap backend can well 
define its own feature flag to describe (minor difference in) the 
suspend behavior based on the later spec once it is formed in future.


Regards,
-Siwei





Jason, what do you think?

Thanks!


---
  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 9 +
  include/linux/mlx5/mlx5_ifc_vdpa.h | 1 +
  2 files changed, 10 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index f8f088cced50..80e066de0866 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1209,6 +1209,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 bool state_change = false;
 void *obj_context;
 void *cmd_hdr;
+   void *vq_ctx;
 void *in;
 int err;

@@ -1230,6 +1231,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);

 obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
+   vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
virtio_q_context);

 if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
 if (!is_valid_state_change(mvq->fw_state, state, 
is_resumable(ndev))) {
@@ -1241,6 +1243,12 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
 state_change = true;
 }

+   if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS) {
+   MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
+   MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
+   MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
+   }
+
 MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, 
mvq->modified_fields);
 err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
 if (err)
@@ -2202,6 +2210,7 @@ static int mlx5_vdpa_set_vq_address(struct vdpa_device 
*vdev, u16 idx, u64 desc_
 mvq->desc_addr = desc_area;
 mvq->device_addr = device_area;
 mvq->driver_addr = driver_area;
+   mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS;
 return 0;
  }

diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h 
b/include/linux/mlx5/mlx5_ifc_vdpa.h
index b86d51a855f6..9594ac405740 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -145,6 +145,7 @@ enum {
 MLX5_VIRTQ_MODIFY_MASK_STATE= (u64)1 << 0,
 MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS  = (u64)1 << 3,
 MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
+   MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS   = (u64)1 << 6,
 MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY  = (u64)1 << 14,
  };

--
2.42.0






Re: [PATCH 1/1] video: hyperv_fb: Add ratelimit on error message

2021-04-20 Thread Wei Liu
On Tue, Apr 20, 2021 at 08:44:19AM -0700, Michael Kelley wrote:
> Due to a full ring buffer, the driver may be unable to send updates to
> the Hyper-V host.  But outputing the error message can make the problem
> worse because console output is also typically written to the frame
> buffer.  As a result, in some circumstances the error message is output
> continuously.
> 
> Break the cycle by rate limiting the error message.  Also output
> the error code for additional diagnosability.
> 
> Signed-off-by: Michael Kelley 

Applied to hyperv-next. Thanks.

> ---
>  drivers/video/fbdev/hyperv_fb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
> index 4dc9077..a7e6eea 100644
> --- a/drivers/video/fbdev/hyperv_fb.c
> +++ b/drivers/video/fbdev/hyperv_fb.c
> @@ -308,7 +308,7 @@ static inline int synthvid_send(struct hv_device *hdev,
>  VM_PKT_DATA_INBAND, 0);
>  
>   if (ret)
> - pr_err("Unable to send packet via vmbus\n");
> + pr_err_ratelimited("Unable to send packet via vmbus; error 
> %d\n", ret);
>  
>   return ret;
>  }
> -- 
> 1.8.3.1
> 


Re: ** POTENTIAL FRAUD ALERT - RED HAT ** [PATCH v2 1/1] Drivers: hv: vmbus: Increase wait time for VMbus unload

2021-04-20 Thread Wei Liu
On Tue, Apr 20, 2021 at 11:31:54AM +0200, Vitaly Kuznetsov wrote:
> Michael Kelley  writes:
> 
> > When running in Azure, disks may be connected to a Linux VM with
> > read/write caching enabled. If a VM panics and issues a VMbus
> > UNLOAD request to Hyper-V, the response is delayed until all dirty
> > data in the disk cache is flushed.  In extreme cases, this flushing
> > can take 10's of seconds, depending on the disk speed and the amount
> > of dirty data. If kdump is configured for the VM, the current 10 second
> > timeout in vmbus_wait_for_unload() may be exceeded, and the UNLOAD
> > complete message may arrive well after the kdump kernel is already
> > running, causing problems.  Note that no problem occurs if kdump is
> > not enabled because Hyper-V waits for the cache flush before doing
> > a reboot through the BIOS/UEFI code.
> >
> > Fix this problem by increasing the timeout in vmbus_wait_for_unload()
> > to 100 seconds. Also output periodic messages so that if anyone is
> > watching the serial console, they won't think the VM is completely
> > hung.
> >
> > Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to 
> > vmbus_wait_for_unload")
> > Signed-off-by: Michael Kelley 

Applied to hyperv-next. Thanks.

> > ---
[...]
> >  
> > +#define UNLOAD_DELAY_UNIT_MS   10  /* 10 milliseconds */
> > +#define UNLOAD_WAIT_MS (100*1000)  /* 100 seconds */
> > +#define UNLOAD_WAIT_LOOPS  (UNLOAD_WAIT_MS/UNLOAD_DELAY_UNIT_MS)
> > +#define UNLOAD_MSG_MS  (5*1000)/* Every 5 seconds */
> > +#define UNLOAD_MSG_LOOPS   (UNLOAD_MSG_MS/UNLOAD_DELAY_UNIT_MS)
> > +
> >  static void vmbus_wait_for_unload(void)
> >  {
> > int cpu;
> > @@ -772,12 +778,17 @@ static void vmbus_wait_for_unload(void)
> >  * vmbus_connection.unload_event. If not, the last thing we can do is
> >  * read message pages for all CPUs directly.
> >  *
> > -* Wait no more than 10 seconds so that the panic path can't get
> > -* hung forever in case the response message isn't seen.
> > +* Wait up to 100 seconds since an Azure host must writeback any dirty
> > +* data in its disk cache before the VMbus UNLOAD request will
> > +* complete. This flushing has been empirically observed to take up
> > +* to 50 seconds in cases with a lot of dirty data, so allow additional
> > +* leeway and for inaccuracies in mdelay(). But eventually time out so
> > +* that the panic path can't get hung forever in case the response
> > +* message isn't seen.
> 
> I vaguely remember debugging cases when CHANNELMSG_UNLOAD_RESPONSE never
> arrives, it was kind of pointless to proceed to kexec as attempts to
> reconnect Vmbus devices were failing (no devices were offered after
> CHANNELMSG_REQUESTOFFERS AFAIR). Would it maybe make sense to just do
> emergency reboot instead of proceeding to kexec when this happens? Just
> wondering.
> 

Please submit a follow-up patch if necessary.

Wei.


Re: [PATCH v2] Drivers: hv: vmbus: Initialize unload_event statically

2021-04-20 Thread Wei Liu
On Tue, Apr 20, 2021 at 04:50:56AM +, Michael Kelley wrote:
> From: Andrea Parri (Microsoft)  Sent: Monday, April 
> 19, 2021 6:44 PM
> > 
> > If a malicious or compromised Hyper-V sends a spurious message of type
> > CHANNELMSG_UNLOAD_RESPONSE, the function vmbus_unload_response() will
> > call complete() on an uninitialized event, and cause an oops.
> > 
> > Reported-by: Michael Kelley 
> > Signed-off-by: Andrea Parri (Microsoft) 
> > ---
> > Changes since v1[1]:
> >   - add inline comment in vmbus_unload_response()
> > 
> > [1] 
> > https://lore.kernel.org/linux-hyperv/20210416143932.16512-1-parri.and...@gmail.com/
> > 
> >  drivers/hv/channel_mgmt.c | 7 ++-
> >  drivers/hv/connection.c   | 2 ++
> >  2 files changed, 8 insertions(+), 1 deletion(-)
> > 
> 
> Reviewed-by: Michael Kelley 

Applied to hyperv-next. Thanks.


Re: [PATCH 1/2] x86/hyperv: Move hv_do_rep_hypercall to asm-generic

2021-04-18 Thread Wei Liu
On Sun, Apr 18, 2021 at 02:42:55PM +, Michael Kelley wrote:
> From: Wei Liu  Sent: Sunday, April 18, 2021 6:09 AM
> > On Fri, Apr 16, 2021 at 05:43:02PM -0700, Joseph Salisbury wrote:
> > > From: Joseph Salisbury 
> > >
> > > This patch makes no functional changes.  It simply moves 
> > > hv_do_rep_hypercall()
> > > out of arch/x86/include/asm/mshyperv.h and into asm-generic/mshyperv.h
> > >
> > > hv_do_rep_hypercall() is architecture independent, so it makes sense that 
> > > it
> > > should be in the architecture independent mshyperv.h, not in the 
> > > x86-specific
> > > mshyperv.h.
> > >
> > > This is done in preperation for a follow up patch which creates a 
> > > consistent
> > > pattern for checking Hyper-V hypercall status.
> > >
> > > Signed-off-by: Joseph Salisbury 
> > > ---
> > [...]
> > > +static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 
> > > varhead_size,
> > > +   void *input, void *output)
> > > +{
> > > + u64 control = code;
> > > + u64 status;
> > > + u16 rep_comp;
> > > +
> > > + control |= (u64)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET;
> > > + control |= (u64)rep_count << HV_HYPERCALL_REP_COMP_OFFSET;
> > > +
> > > + do {
> > > + status = hv_do_hypercall(control, input, output);
> > > + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS)
> > > + return status;
> > > +
> > > + /* Bits 32-43 of status have 'Reps completed' data. */
> > > + rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >>
> > > + HV_HYPERCALL_REP_COMP_OFFSET;
> > > +
> > > + control &= ~HV_HYPERCALL_REP_START_MASK;
> > > + control |= (u64)rep_comp << HV_HYPERCALL_REP_START_OFFSET;
> > > +
> > > + touch_nmi_watchdog();
> > 
> > This seems to be missing in Arm. Does it compile?
> > 
> > Wei.
> 
> touch_nmi_watchdog() is defined as "static inline" in include/linux/nmi.h.  So
> it should be present in all architectures.  It calls arch_touch_nmi_watchdog,
> which is an empty function if CONFIG_HAVE_NMI_WATCHDOG is not defined,
> as is the case on ARM64.

I see. I couldn't find arch_touch_nmi_watchdog on Arm but missed the
stub function. Thanks for the information.

Wei.


Re: [PATCH 1/2] x86/hyperv: Move hv_do_rep_hypercall to asm-generic

2021-04-18 Thread Wei Liu
On Fri, Apr 16, 2021 at 05:43:02PM -0700, Joseph Salisbury wrote:
> From: Joseph Salisbury 
> 
> This patch makes no functional changes.  It simply moves hv_do_rep_hypercall()
> out of arch/x86/include/asm/mshyperv.h and into asm-generic/mshyperv.h
> 
> hv_do_rep_hypercall() is architecture independent, so it makes sense that it
> should be in the architecture independent mshyperv.h, not in the x86-specific
> mshyperv.h.
> 
> This is done in preperation for a follow up patch which creates a consistent
> pattern for checking Hyper-V hypercall status.
> 
> Signed-off-by: Joseph Salisbury 
> ---
[...]
> +static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 
> varhead_size,
> +   void *input, void *output)
> +{
> + u64 control = code;
> + u64 status;
> + u16 rep_comp;
> +
> + control |= (u64)varhead_size << HV_HYPERCALL_VARHEAD_OFFSET;
> + control |= (u64)rep_count << HV_HYPERCALL_REP_COMP_OFFSET;
> +
> + do {
> + status = hv_do_hypercall(control, input, output);
> + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS)
> + return status;
> +
> + /* Bits 32-43 of status have 'Reps completed' data. */
> + rep_comp = (status & HV_HYPERCALL_REP_COMP_MASK) >>
> + HV_HYPERCALL_REP_COMP_OFFSET;
> +
> + control &= ~HV_HYPERCALL_REP_START_MASK;
> + control |= (u64)rep_comp << HV_HYPERCALL_REP_START_OFFSET;
> +
> + touch_nmi_watchdog();

This seems to be missing in Arm. Does it compile?

Wei.

> + } while (rep_comp < rep_count);
> +
> + return status;
> +}
>  
>  /* Generate the guest OS identifier as described in the Hyper-V TLFS */
>  static inline  __u64 generate_guest_id(__u64 d_info1, __u64 kernel_version,
> -- 
> 2.17.1
> 


Re: [PATCH v3 0/3] Drivers: hv: vmbus: Introduce CHANNELMSG_MODIFYCHANNEL_RESPONSE

2021-04-18 Thread Wei Liu
On Fri, Apr 16, 2021 at 04:34:46PM +0200, Andrea Parri (Microsoft) wrote:
> Changes since v2[1]:
>   - fix VMbus protocol version name
>   - add Reviewed-by: tag
>   - refactor/simplyfy changes in hv_synic_cleanup()
> 
> Changes since v1[2]:
>   - rebase on hyperv-next
>   - split changes into three patches
>   - fix error handling in send_modifychannel_with_ack()
>   - remove rescind checks in send_modifychannel_with_ack()
>   - remove unneeded test in hv_synic_event_pending()
>   - add/amend inline comments
>   - style changes
> 
> [1] https://lkml.kernel.org/r/20210414150118.2843-1-parri.and...@gmail.com
> [2] https://lkml.kernel.org/r/20201126191210.13115-1-parri.and...@gmail.com
> 
> Andrea Parri (Microsoft) (3):
>   Drivers: hv: vmbus: Introduce and negotiate VMBus protocol version 5.3
>   Drivers: hv: vmbus: Drivers: hv: vmbus: Introduce
> CHANNELMSG_MODIFYCHANNEL_RESPONSE
>   Drivers: hv: vmbus: Check for pending channel interrupts before taking
> a CPU offline

Applied to hyperv-next. Thanks.

Wei.


Re: [PATCH] Drivers: hv: vmbus: Use after free in __vmbus_open()

2021-04-16 Thread Wei Liu
On Tue, Apr 13, 2021 at 05:42:21PM +0200, Andrea Parri wrote:
> On Tue, Apr 13, 2021 at 01:50:04PM +0300, Dan Carpenter wrote:
> > The "open_info" variable is added to the _connection.chn_msg_list,
> > but the error handling frees "open_info" without removing it from the
> > list.  This will result in a use after free.  First remove it from the
> > list, and then free it.
> > 
> > Fixes: 6f3d791f3006 ("Drivers: hv: vmbus: Fix rescind handling issues")
> > Signed-off-by: Dan Carpenter 
> 
> I had this 'queued' in my list,
> 
> Reviewed-by: Andrea Parri 

Applied. Thanks.


Re: [PATCH RFC 01/22] asm-generic/hyperv: add HV_STATUS_ACCESS_DENIED definition

2021-04-16 Thread Wei Liu
On Thu, Apr 15, 2021 at 05:33:17PM +0200, Vitaly Kuznetsov wrote:
> Wei Liu  writes:
> 
> > On Tue, Apr 13, 2021 at 02:26:09PM +0200, Vitaly Kuznetsov wrote:
> >> From TLFSv6.0b, this status means: "The caller did not possess sufficient
> >> access rights to perform the requested operation."
> >> 
> >> Signed-off-by: Vitaly Kuznetsov 
> >
> > This can be applied to hyperv-next right away. Let me know what you
> > think.
> >
> 
> In case there's no immediate need for this constant outside of KVM, I'd
> suggest you just give Paolo your 'Acked-by' so I can carry the patch in
> the series for the time being. This will eliminate the need to track
> dependencies between hyperv-next and kvm-next.

Acked-by: Wei Liu 


Re: [PATCH RFC 01/22] asm-generic/hyperv: add HV_STATUS_ACCESS_DENIED definition

2021-04-15 Thread Wei Liu
On Tue, Apr 13, 2021 at 02:26:09PM +0200, Vitaly Kuznetsov wrote:
> From TLFSv6.0b, this status means: "The caller did not possess sufficient
> access rights to perform the requested operation."
> 
> Signed-off-by: Vitaly Kuznetsov 

This can be applied to hyperv-next right away. Let me know what you
think.

Wei.

> ---
>  include/asm-generic/hyperv-tlfs.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index 83448e837ded..e01a3bade13a 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -187,6 +187,7 @@ enum HV_GENERIC_SET_FORMAT {
>  #define HV_STATUS_INVALID_HYPERCALL_INPUT3
>  #define HV_STATUS_INVALID_ALIGNMENT  4
>  #define HV_STATUS_INVALID_PARAMETER  5
> +#define HV_STATUS_ACCESS_DENIED  6
>  #define HV_STATUS_OPERATION_DENIED   8
>  #define HV_STATUS_INSUFFICIENT_MEMORY11
>  #define HV_STATUS_INVALID_PORT_ID17
> -- 
> 2.30.2
> 


Re: [PATCH] Drivers: hv: vmbus: remove unused function

2021-04-14 Thread Wei Liu
On Wed, Apr 14, 2021 at 02:48:17PM +, Michael Kelley wrote:
> From: Jiapeng Chong  Sent: Tuesday, April 
> 13, 2021 11:21 PM
[...]
> This function became unused as of commit 4226ff69a3df 
> ("vmbus: simplify hv_ringbuffer_read") on 7/17/2017.
> 
> Reviewed-by: Michael Kelley 

Applied to hyperv-next. Thanks.

Wei.


Re: [PATCH] Drivers: hv: vmbus: Use after free in __vmbus_open()

2021-04-13 Thread Wei Liu
On Tue, Apr 13, 2021 at 01:50:04PM +0300, Dan Carpenter wrote:
> The "open_info" variable is added to the _connection.chn_msg_list,
> but the error handling frees "open_info" without removing it from the
> list.  This will result in a use after free.  First remove it from the
> list, and then free it.
> 
> Fixes: 6f3d791f3006 ("Drivers: hv: vmbus: Fix rescind handling issues")
> Signed-off-by: Dan Carpenter 
> ---
> From static analysis.  Untested etc.  There is almost certainly a good
> reason to add it to the list before checking "newchannel->rescind" but I
> don't know the code well enough to know what the reason is.
> 

AIUI the channel management code requires the message be queued before
posting the message to backend, because processing response is done in
another thread, and might happen before this message is added to the
list if the order is reversed.

>  drivers/hv/channel.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index db30be8f9cce..1c5a418c1962 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -653,7 +653,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  
>   if (newchannel->rescind) {
>   err = -ENODEV;
> - goto error_free_info;
> + goto error_clean_msglist;

Looking at similar functions in the same file I think there is indeed an
UAF problem in the original code.

I have not studied this piece of code extensively so I will wait for
others to chime in.

Wei.

>   }
>  
>   err = vmbus_post_msg(open_msg,
> -- 
> 2.30.2
> 


Re: [PATCH] Drivers: hv: vmbus: remove unused including

2021-04-13 Thread Wei Liu
On Tue, Apr 13, 2021 at 05:49:18PM +0800, Yang Li wrote:
> Fix the following versioncheck warning:
> ./drivers/hv/hv.c: 16 linux/version.h not needed.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Yang Li 

Thanks for the patch. This has also been reported by Huawei's kernel bot
and fixed in hyperv-next.

Wei.


Re: [PATCH v2 4/4] KVM: hyper-v: Advertise support for fast XMM hypercalls

2021-04-12 Thread Wei Liu
On Mon, Apr 12, 2021 at 07:00:17PM +0200, Siddharth Chandrasekaran wrote:
> Now that all extant hypercalls that can use XMM registers (based on
> spec) for input/outputs are patched to support them, we can start
> advertising this feature to guests.
> 
> Cc: Alexander Graf 
> Cc: Evgeny Iakovlev 
> Signed-off-by: Siddharth Chandrasekaran 
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 7 ++-
>  arch/x86/kvm/hyperv.c  | 2 ++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
> b/arch/x86/include/asm/hyperv-tlfs.h
> index e6cd3fee562b..716f12be411e 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -52,7 +52,7 @@
>   * Support for passing hypercall input parameter block via XMM
>   * registers is available
>   */
> -#define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLEBIT(4)
> +#define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE BIT(4)
>  /* Support for a virtual guest idle state is available */
>  #define HV_X64_GUEST_IDLE_STATE_AVAILABLEBIT(5)
>  /* Frequency MSRs available */
> @@ -61,6 +61,11 @@
>  #define HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE BIT(10)
>  /* Support for debug MSRs available */
>  #define HV_FEATURE_DEBUG_MSRS_AVAILABLE  BIT(11)
> +/*
> + * Support for returning hypercall ouput block via XMM

"output"

Wei.


Re: [PATCH v2 3/4] KVM: x86: kvm_hv_flush_tlb use inputs from XMM registers

2021-04-12 Thread Wei Liu
On Mon, Apr 12, 2021 at 07:00:16PM +0200, Siddharth Chandrasekaran wrote:
> +
> +static inline void kvm_hv_hypercall_read_xmm(struct kvm_hv_hcall *hc)

Do you really need inline here? The compiler should be smart enough to
inline this function if necessary.

> +{
> + int reg;
> +
> + kvm_fpu_get();
> + for (reg = 0; reg < KVM_HV_HYPERCALL_MAX_XMM_REGISTERS; reg++)
> + _kvm_read_sse_reg(reg, >xmm[reg]);
> + kvm_fpu_put();
> + hc->xmm_dirty = false;

There is no code that sets xmm_dirty to true? What am I missing? I guess
that's because you haven't implemented the hypercalls that need writing
back to guest?

Wei.


Re: [PATCH 0/4] Add support for XMM fast hypercalls

2021-04-08 Thread Wei Liu
On Thu, Apr 08, 2021 at 05:54:43PM +0200, Siddharth Chandrasekaran wrote:
> On Thu, Apr 08, 2021 at 05:48:19PM +0200, Paolo Bonzini wrote:
> > On 08/04/21 17:40, Siddharth Chandrasekaran wrote:
> > > > > > Although the Hyper-v TLFS mentions that a guest cannot use this 
> > > > > > feature
> > > > > > unless the hypervisor advertises support for it, some hypercalls 
> > > > > > which
> > > > > > we plan on upstreaming in future uses them anyway.
> > > > > No, please don't do this. Check the feature bit(s) before you issue
> > > > > hypercalls which rely on the extended interface.
> > > > Perhaps Siddharth should clarify this, but I read it as Hyper-V being
> > > > buggy and using XMM arguments unconditionally.
> > > The guest is at fault here as it expects Hyper-V to consume arguments
> > > from XMM registers for certain hypercalls (that we are working) even if
> > > we didn't expose the feature via CPUID bits.
> >
> > What guest is that?
> 
> It is a Windows Server 2016.

Can you be more specific? Are you implementing some hypercalls from
TLFS? If so, which ones?

Wei.


Re: [PATCH 4/4] KVM: hyper-v: Advertise support for fast XMM hypercalls

2021-04-08 Thread Wei Liu
On Thu, Apr 08, 2021 at 04:20:54PM +0200, Siddharth Chandrasekaran wrote:
> On Thu, Apr 08, 2021 at 02:05:53PM +0200, Vitaly Kuznetsov wrote:
> > Siddharth Chandrasekaran  writes:
> >
> > > Now that all extant hypercalls that can use XMM registers (based on
> > > spec) for input/outputs are patched to support them, we can start
> > > advertising this feature to guests.
> > >
> > > Cc: Alexander Graf 
> > > Cc: Evgeny Iakovlev 
> > > Signed-off-by: Siddharth Chandrasekaran 
> > > ---
> > >  arch/x86/include/asm/hyperv-tlfs.h | 4 ++--
> > >  arch/x86/kvm/hyperv.c  | 1 +
> > >  2 files changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
> > > b/arch/x86/include/asm/hyperv-tlfs.h
> > > index e6cd3fee562b..1f160ef60509 100644
> > > --- a/arch/x86/include/asm/hyperv-tlfs.h
> > > +++ b/arch/x86/include/asm/hyperv-tlfs.h
> > > @@ -49,10 +49,10 @@
> > >  /* Support for physical CPU dynamic partitioning events is available*/
> > >  #define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLEBIT(3)
> > >  /*
> > > - * Support for passing hypercall input parameter block via XMM
> > > + * Support for passing hypercall input and output parameter block via XMM
> > >   * registers is available
> > >   */
> > > -#define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLEBIT(4)
> > > +#define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLEBIT(4) | 
> > > BIT(15)
> >
> > TLFS 6.0b states that there are two distinct bits for input and output:
> >
> > CPUID Leaf 0x4003.EDX:
> > Bit 4: support for passing hypercall input via XMM registers is available.
> > Bit 15: support for returning hypercall output via XMM registers is 
> > available.
> >
> > and HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLE is not currently used
> > anywhere, I'd suggest we just rename
> >
> > HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLE to 
> > HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE
> > and add HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE (bit 15).
> 
> That is how I had it initially; but then noticed that we would never
> need to use either of them separately. So it seemed like a reasonable
> abstraction to put them together.
> 

They are two separate things in TLFS. Please use two macros here.

Wei.


Re: [PATCH 0/4] Add support for XMM fast hypercalls

2021-04-08 Thread Wei Liu
On Thu, Apr 08, 2021 at 05:30:26PM +0200, Paolo Bonzini wrote:
> On 08/04/21 17:28, Wei Liu wrote:
> > > Although the Hyper-v TLFS mentions that a guest cannot use this feature
> > > unless the hypervisor advertises support for it, some hypercalls which
> > > we plan on upstreaming in future uses them anyway.
> > 
> > No, please don't do this. Check the feature bit(s) before you issue
> > hypercalls which rely on the extended interface.
> 
> Perhaps Siddharth should clarify this, but I read it as Hyper-V being buggy
> and using XMM arguments unconditionally.
> 

There is no code in upstream Linux that uses the XMM fast hypercall
interface at the moment.

If there is such code, it has bugs in it and should be fixed. :-)

Wei.

> Paolo
> 


Re: [PATCH 0/4] Add support for XMM fast hypercalls

2021-04-08 Thread Wei Liu
On Wed, Apr 07, 2021 at 11:29:26PM +0200, Siddharth Chandrasekaran wrote:
> Hyper-V supports the use of XMM registers to perform fast hypercalls.
> This allows guests to take advantage of the improved performance of the
> fast hypercall interface even though a hypercall may require more than
> (the current maximum of) two general purpose registers.
> 
> The XMM fast hypercall interface uses an additional six XMM registers
> (XMM0 to XMM5) to allow the caller to pass an input parameter block of
> up to 112 bytes. Hyper-V can also return data back to the guest in the
> remaining XMM registers that are not used by the current hypercall.
> 
> Although the Hyper-v TLFS mentions that a guest cannot use this feature
> unless the hypervisor advertises support for it, some hypercalls which
> we plan on upstreaming in future uses them anyway. 

No, please don't do this. Check the feature bit(s) before you issue
hypercalls which rely on the extended interface.

Wei.


Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-07 Thread Wei Liu
On Wed, Apr 07, 2021 at 02:34:01PM +, Haiyang Zhang wrote:
> 
> 
> > -Original Message-
> > From: Wei Liu 
> > Sent: Wednesday, April 7, 2021 9:17 AM
> > To: Dexuan Cui 
> > Cc: da...@davemloft.net; k...@kernel.org; KY Srinivasan
> > ; Haiyang Zhang ; Stephen
> > Hemminger ; wei@kernel.org; Wei Liu
> > ; net...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; linux-hyp...@vger.kernel.org
> > Subject: Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure
> > Network Adapter (MANA)
> > 
> > On Tue, Apr 06, 2021 at 04:23:21PM -0700, Dexuan Cui wrote:
> > [...]
> > > +config MICROSOFT_MANA
> > > + tristate "Microsoft Azure Network Adapter (MANA) support"
> > > + default m
> > > + depends on PCI_MSI
> > > + select PCI_HYPERV
> > 
> > OOI which part of the code requires PCI_HYPERV?
> > 
> > Asking because I can't immediately find code that looks to be Hyper-V
> > specific (searching for vmbus etc). This device looks like any other PCI 
> > devices
> > to me.
> 
> It depends on the VF nic's PCI config space which is presented by the 
> pci_hyperv driver.

I think all it matters is the PCI bus is able to handle the
configuration space access, right? Assuming there is an emulated PCI
root complex which exposes the config space to the driver, will this
driver still work?

I'm trying to understand how tightly coupled with Hyper-V PCI this
driver is. In an alternative universe, Microsft may suddenly decide to
sell this hardware and someone wants to passthrough an VF via VFIO. I
don't see how this driver wouldn't work, hence the original question.

There is no need to change the code. I'm just curious about a tiny
detail in the implementation.

Wei.

> 
> Thanks,
> - Haiyang


Re: [RFC PATCH 04/18] virt/mshv: request version ioctl

2021-04-07 Thread Wei Liu
On Wed, Apr 07, 2021 at 04:02:56PM +0200, Vitaly Kuznetsov wrote:
> Wei Liu  writes:
> 
> > On Wed, Apr 07, 2021 at 09:38:21AM +0200, Vitaly Kuznetsov wrote:
> >
> >> One more though: it is probably a good idea to introduce selftests for
> >> /dev/mshv (similar to KVM's selftests in
> >> /tools/testing/selftests/kvm). Selftests don't really need a stable ABI
> >> as they live in the same linux.git and can be updated in the same patch
> >> series which changes /dev/mshv behavior. Selftests are very useful for
> >> checking there are no regressions, especially in the situation when
> >> there's no publicly available userspace for /dev/mshv.
> >
> > I think this can wait until we merge the first implementation in tree.
> > There are still a lot of moving parts. Our (currently limited) internal
> > test cases need more cleaning up before they are ready. I certainly
> > don't want to distract Nuno from getting the foundation right.
> >
> 
> I'm absolutely fine with this approach, selftests are a nice add-on, not
> a requirement for the initial implementation. Also, to make them more
> useful to mere mortals, a doc on how to run Linux as root Hyper-V
> partition would come handy)

Making this system easier for others to use and consume is on our radar.
Currently you need Windows bootloader and a not-yet-released loader to
load the hypervisor. We're making progress in bringing in GRUB.

Needless to say there are technical and non-technical challenges for
this work, so don't expect it to happen very soon. :-)

Wei.

> 
> -- 
> Vitaly
> 


Re: [RFC PATCH 04/18] virt/mshv: request version ioctl

2021-04-07 Thread Wei Liu
On Wed, Apr 07, 2021 at 09:38:21AM +0200, Vitaly Kuznetsov wrote:
> Nuno Das Neves  writes:
> 
> > On 3/5/2021 1:18 AM, Vitaly Kuznetsov wrote:
> >> Nuno Das Neves  writes:
> >> 
> >>> On 2/9/2021 5:11 AM, Vitaly Kuznetsov wrote:
>  Nuno Das Neves  writes:
> 
> >> ...
> > +
> > +3.1 MSHV_REQUEST_VERSION
> > +
> > +:Type: /dev/mshv ioctl
> > +:Parameters: pointer to a u32
> > +:Returns: 0 on success
> > +
> > +Before issuing any other ioctls, a MSHV_REQUEST_VERSION ioctl must be 
> > called to
> > +establish the interface version with the kernel module.
> > +
> > +The caller should pass the MSHV_VERSION as an argument.
> > +
> > +The kernel module will check which interface versions it supports and 
> > return 0
> > +if one of them matches.
> > +
> > +This /dev/mshv file descriptor will remain 'locked' to that version as 
> > long as
> > +it is open - this ioctl can only be called once per open.
> > +
> 
>  KVM used to have KVM_GET_API_VERSION too but this turned out to be not
>  very convenient so we use capabilities 
>  (KVM_CHECK_EXTENSION/KVM_ENABLE_CAP)
>  instead.
> 
> >>>
> >>> The goal of MSHV_REQUEST_VERSION is to support changes to APIs in the 
> >>> core set.
> >>> When we add new features/ioctls beyond the core we can use an 
> >>> extension/capability
> >>> approach like KVM.
> >>>
> >> 
> >> Driver versions is a very bad idea from distribution/stable kernel point
> >> of view as it presumes that the history is linear. It is not.
> >> 
> >> Imagine you have the following history upstream:
> >> 
> >> MSHV_REQUEST_VERSION = 1
> >> <100 commits with features/fixes>
> >> MSHV_REQUEST_VERSION = 2
> >> 
> >> MSHV_REQUEST_VERSION = 2
> >> 
> >> Now I'm a linux distribution / stable kernel maintainer. My kernel is at
> >> MSHV_REQUEST_VERSION = 1. Now I want to backport 1 feature from between
> >> VER=1 and VER=2 and another feature from between VER=2 and VER=3. My
> >> history now looks like
> >> 
> >> MSHV_REQUEST_VERSION = 1
> >> <5 commits from between VER=1 and VER=2>
> >>Which version should I declare here 
> >> <5 commits from between VER=2 and VER=3>
> >>Which version should I declare here 
> >> 
> >> If I keep VER=1 then userspace will think that I don't have any extra
> >> features added and just won't use them. If I change VER to 2/3, it'll
> >> think I have *all* features from between these versions.
> >> 
> >> The only reasonable way to manage this is to attach a "capability" to
> >> every ABI change and expose this capability *in the same commit which
> >> introduces the change to the ABI*. This way userspace will now exactly
> >> which ioctls are available and what are their interfaces.
> >> 
> >> Also, trying to define "core set" is hard but you don't really need
> >> to.
> >> 
> >
> > We've had some internal discussion on this.
> >
> > There is bound to be some iteration before this ABI is stable, since even 
> > the
> > underlying Microsoft hypervisor interfaces aren't stable just yet.
> >
> > It might make more sense to just have an IOCTL to check if the API is 
> > stable yet.
> > This would be analogous to checking if kVM_GET_API_VERSION returns 12.
> >
> > How does this sound as a proposal?
> > An MSHV_CHECK_EXTENSION ioctl to query extensions to the core /dev/mshv API.
> >
> > It takes a single argument, an integer named MSHV_CAP_* corresponding to
> > the extension to check the existence of.
> >
> > The ioctl will return 0 if the extension is unsupported, or a positive 
> > integer
> > if supported.
> >
> > We can initially include a capability called MSHV_CAP_CORE_API_STABLE.
> > If supported, the core APIs are stable.
> 
> This sounds reasonable, I'd suggest you reserve MSHV_CAP_CORE_API_STABLE
> right away but don't expose it yet so it's clear the API is not yet
> stable. Test userspace you have may always assume it's running with the
> latest kernel.
> 
> Also, please be clear about the fact that /dev/mshv doesn't
> provide a stable API yet so nobody builds an application on top of
> it.
> 

Very good discussion and suggestions. Thank you Vitaly.

> One more though: it is probably a good idea to introduce selftests for
> /dev/mshv (similar to KVM's selftests in
> /tools/testing/selftests/kvm). Selftests don't really need a stable ABI
> as they live in the same linux.git and can be updated in the same patch
> series which changes /dev/mshv behavior. Selftests are very useful for
> checking there are no regressions, especially in the situation when
> there's no publicly available userspace for /dev/mshv.

I think this can wait until we merge the first implementation in tree.
There are still a lot of moving parts. Our (currently limited) internal
test cases need more cleaning up before they are ready. I certainly
don't want to distract Nuno from getting the foundation right.

Wei.


Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-07 Thread Wei Liu
On Tue, Apr 06, 2021 at 04:23:21PM -0700, Dexuan Cui wrote:
[...]
> +config MICROSOFT_MANA
> + tristate "Microsoft Azure Network Adapter (MANA) support"
> + default m
> + depends on PCI_MSI
> + select PCI_HYPERV

OOI which part of the code requires PCI_HYPERV?

Asking because I can't immediately find code that looks to be Hyper-V
specific (searching for vmbus etc). This device looks like any
other PCI devices to me.

Wei.


Re: [PATCH net-next] net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)

2021-04-07 Thread Wei Liu
On Wed, Apr 07, 2021 at 08:08:59AM +, Dexuan Cui wrote:
> > From: kernel test robot 
> > Sent: Tuesday, April 6, 2021 6:31 PM
> > ...
> > Hi Dexuan, 
> > I love your patch! Perhaps something to improve:
> > 
> > All warnings (new ones prefixed by >>):
> > 
> >drivers/pci/controller/pci-hyperv.c: In function 'hv_irq_unmask':
> >drivers/pci/controller/pci-hyperv.c:1220:2: error: implicit declaration 
> > of
> > function 'hv_set_msi_entry_from_desc'
> > [-Werror=implicit-function-declaration]
> > 1220 |  hv_set_msi_entry_from_desc(>int_entry.msi_entry,
> > msi_desc);
> 
> This build error looks strange, because the patch doesn't even touch the 
> driver
> drivers/pci/controller/pci-hyperv.c.
> 

I think this is normal. The bot doesn't restrict itself to the changed
code from my experience.

Wei.


Re: [PATCH] xen/evtchn: Change irq_info lock to raw_spinlock_t

2021-04-07 Thread Wei Liu
On Tue, Apr 06, 2021 at 11:51:04AM +0100, Luca Fancellu wrote:
> Unmask operation must be called with interrupt disabled,
> on preempt_rt spin_lock_irqsave/spin_unlock_irqrestore
> don't disable/enable interrupts, so use raw_* implementation
> and change lock variable in struct irq_info from spinlock_t
> to raw_spinlock_t
> 
> Cc: sta...@vger.kernel.org
> Fixes: 25da4618af24 ("xen/events: don't unmask an event channel
> when an eoi is pending")
> 
> Signed-off-by: Luca Fancellu 

Reviewed-by: Wei Liu 


Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers

2021-04-06 Thread Wei Liu
On Tue, Apr 06, 2021 at 04:31:58PM +0300, Andy Shevchenko wrote:
> kernel.h is being used as a dump for all kinds of stuff for a long time.
> Here is the attempt to start cleaning it up by splitting out panic and
> oops helpers.
> 
> At the same time convert users in header and lib folder to use new header.
> Though for time being include new header back to kernel.h to avoid twisted
> indirected includes for existing users.
> 
> Signed-off-by: Andy Shevchenko 

Acked-by: Wei Liu 


Re: [PATCH] x86/hyperv: remove unused including

2021-04-06 Thread Wei Liu
On Tue, Apr 06, 2021 at 09:56:35AM +0800, Yang Li wrote:
> Fix the following versioncheck warning:
> ./arch/x86/hyperv/hv_proc.c: 3 linux/version.h not needed.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Yang Li 

Thanks for the patch. This is already reported and fixed in hyperv-next
by Huawei's Hulk bot.

Wei.

> ---
>  arch/x86/hyperv/hv_proc.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c
> index 60461e5..27e17ad 100644
> --- a/arch/x86/hyperv/hv_proc.c
> +++ b/arch/x86/hyperv/hv_proc.c
> @@ -1,6 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> -- 
> 1.8.3.1
> 


[GIT PULL] Hyper-V fixes for 5.12-rc6

2021-04-02 Thread Wei Liu
Hi Linus,

The following changes since commit fe07bfda2fb9cdef8a4d4008a409bb02f35f1bd8:

  Linux 5.12-rc1 (2021-02-28 16:05:19 -0800)

are available in the Git repository at:

  ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git 
tags/hyperv-fixes-signed-20210402

for you to fetch changes up to 37df9f3fedb6aeaff5564145e8162aab912c9284:

  video: hyperv_fb: Fix a double free in hvfb_probe (2021-03-25 13:31:20 +)


hyperv-fixes for 5.12-rc6
  - One patch from Lu Yunlong to fix a double free in hvfb_probe


Lv Yunlong (1):
  video: hyperv_fb: Fix a double free in hvfb_probe

 drivers/video/fbdev/hyperv_fb.c | 3 ---
 1 file changed, 3 deletions(-)


Re: [PATCH v2] video: hyperv_fb: Fix a double free in hvfb_probe

2021-03-25 Thread Wei Liu
On Wed, Mar 24, 2021 at 01:46:39PM +, Michael Kelley wrote:
> From: Lv Yunlong  Sent: Wednesday, March 24, 2021 
> 3:37 AM
> > 
> > In function hvfb_probe in hyperv_fb.c, it calls hvfb_getmem(hdev, info)
> > and return err when info->apertures is freed.
> > 
> > In the error1 label of hvfb_probe, info->apertures will be freed for the
> > second time in framebuffer_release(info).
> > 
> > My patch removes all kfree(info->apertures) instead of set info->apertures
> > to NULL. It is because that let framebuffer_release() handle freeing the
> > memory flows the fbdev pattern, and less code overall.
> 
> Let me suggest some clarifications in the commit message.  It's probably
> better not to reference the initial approach of setting info->apertures to
> NULL, since there won't be any record of that approach in the commit
> history.  Here's what I would suggest:
> 
> Function hvfb_probe() calls hvfb_getmem(), expecting upon return that
> info->apertures is either NULL or points to memory that should be freed
> by framebuffer_release().  But hvfb_getmem() is freeing the memory and
> leaving the pointer non-NULL, resulting in a double free if an error
> occurs or later if hvfb_remove() is called.
> 
> Fix this by removing all kfree(info->apertures) calls in hvfb_getmem().
> This will allow framebuffer_release() to free the memory, which follows
> the pattern of other fbdev drivers.
> 
> Modulo this revision to the commit message, which Wei Liu can
> probably incorporate,
> 

Yes. I surely can incorporate the changes.

I will also add the Fixes tag.

> Reviewed-by: Michael Kelley 


Re: [PATCH v5] x86/Hyper-V: Support for free page reporting

2021-03-24 Thread Wei Liu
On Tue, Mar 23, 2021 at 06:47:16PM +, Sunil Muthuswamy wrote:
> Linux has support for free page reporting now (36e66c554b5c) for
> virtualized environment. On Hyper-V when virtually backed VMs are
> configured, Hyper-V will advertise cold memory discard capability,
> when supported. This patch adds the support to hook into the free
> page reporting infrastructure and leverage the Hyper-V cold memory
> discard hint hypercall to report/free these pages back to the host.
> 
> Signed-off-by: Sunil Muthuswamy 
> Tested-by: Matheus Castello 

Applied to hyperv-next. Thanks.


Re: [PATCH -next] x86: Fix unused variable 'hi'

2021-03-24 Thread Wei Liu
On Tue, Mar 23, 2021 at 11:32:50AM +, Wei Liu wrote:
> On Tue, Mar 23, 2021 at 10:50:13AM +0800, Xu Yihang wrote:
> > Fixes the following W=1 kernel build warning(s):
> > arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used 
> > [-Wunused-but-set-variable]
> > 
> > Compiled with CONFIG_HYPERV enabled:
> > make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-
> > make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 
> > CROSS_COMPILE=x86_64-linux-gnu-
> > 
> > HV_X64_MSR_EOI stores on bit 31:0 and HV_X64_MSR_TPR stores in bit 7:0, 
> > which means higher
> > 32 bits are not really used, therefore  potentially cast to void in order 
> > to silent this warning.
> > 
> > Reported-by: Hulk Robot 
> > Signed-off-by: Xu Yihang 
> > ---
> >  arch/x86/hyperv/hv_apic.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> > index 284e73661a18..a8b639498033 100644
> > --- a/arch/x86/hyperv/hv_apic.c
> > +++ b/arch/x86/hyperv/hv_apic.c
> > @@ -60,9 +60,11 @@ static u32 hv_apic_read(u32 reg)
> > switch (reg) {
> > case APIC_EOI:
> > rdmsr(HV_X64_MSR_EOI, reg_val, hi);
> > +   (void) hi;
> > return reg_val;
> > case APIC_TASKPRI:
> > rdmsr(HV_X64_MSR_TPR, reg_val, hi);
> > +   (void) hi;
> 
> I would like to remove the space while committing this patch. There is
> no need for you to do anything.

Applied to hyperv-next.


Re: [PATCH -next] x86: Fix unused variable 'msr_val' warning

2021-03-24 Thread Wei Liu
On Tue, Mar 23, 2021 at 10:43:02AM +0800, Xu Yihang wrote:
> Fixes the following W=1 kernel build warning(s):
> arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable ‘msr_val’ set but not 
> used [-Wunused-but-set-variable]
>   unsigned long msr_val;
> 
> As Hypervisor Top-Level Functional Specification states in chapter 7.5 
> Virtual Processor Idle Sleep State, "A partition which possesses the 
> AccessGuestIdleMsr privilege (refer to section 4.2.2) may trigger entry into 
> the virtual processor idle sleep state through a read to the 
> hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE". That means only a read is 
> necessary, msr_val is not uesed, so potentially cast to void in order to 
> silent this warning.
> 
> Reference:
> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Xu Yihang 

Applied to hyperv-next. Thanks.


Re: [PATCH -next] x86: Fix unused variable 'msr_val' warning

2021-03-23 Thread Wei Liu
On Tue, Mar 23, 2021 at 01:13:03AM +0100, Ingo Molnar wrote:
> 
> * Michael Kelley  wrote:
> 
> > From: Ingo Molnar  Sent: Monday, March 22, 2021 
> > 2:08 PM
> > > 
> > > * Xu Yihang  wrote:
> > > 
> > > > Fixes the following W=1 kernel build warning(s):
> > > > arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable 'msr_val' set 
> > > > but not used [-
> > > Wunused-but-set-variable]
> > > >   unsigned long msr_val;
> > > >
> > > > As Hypervisor Top-Level Functional Specification states in chapter 7.5 
> > > > Virtual Processor
> > > Idle Sleep State, "A partition which possesses the AccessGuestIdleMsr 
> > > privilege (refer to
> > > section 4.2.2) may trigger entry into the virtual processor idle sleep 
> > > state through a read to
> > > the hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE". That means only a read 
> > > is
> > > necessary, msr_val is not uesed, so __maybe_unused should be added.
> > > >
> > > > Reference:
> > > >
> > > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
> > > >
> > > > Reported-by: Hulk Robot 
> > > > Signed-off-by: Xu Yihang 
> > > > ---
> > > >  arch/x86/hyperv/hv_spinlock.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/hyperv/hv_spinlock.c 
> > > > b/arch/x86/hyperv/hv_spinlock.c
> > > > index f3270c1fc48c..67bc15c7752a 100644
> > > > --- a/arch/x86/hyperv/hv_spinlock.c
> > > > +++ b/arch/x86/hyperv/hv_spinlock.c
> > > > @@ -25,7 +25,7 @@ static void hv_qlock_kick(int cpu)
> > > >
> > > >  static void hv_qlock_wait(u8 *byte, u8 val)
> > > >  {
> > > > -   unsigned long msr_val;
> > > > +   unsigned long msr_val __maybe_unused;
> > > > unsigned long flags;
> > > 
> > > Please don't add new __maybe_unused annotations to the x86 tree -
> > > improve the flow instead to help GCC recognize the initialization
> > > sequence better.
> > > 
> > > Thanks,
> > > 
> > >   Ingo
> > 
> > Could you elaborate on the thinking here, or point to some written
> > discussion?   I'm just curious.   In this particular case, it's not a 
> > problem
> > with the flow or gcc detection.  This code really does read an MSR and
> > ignore that value that is read, so it's not clear how gcc would ever
> > figure out that's OK.
> 
> Yeah, so the canonical way to signal that the msr_val isn't used would 
> be to rewrite this as:
> 
> 
>   if (READ_ONCE(*byte) == val) {
>   unsigned long msr_val;
> 
>   rdmsrl(HV_X64_MSR_GUEST_IDLE, msr_val);
> 
>   (void)msr_val;
>   }
> 
> (Also see the patch below - untested.)
> 
> This makes it abundantly clear that the rdmsr() msr_val return value 
> is not 'maybe' unused, but totally intentionally skipped.
> 
> Thanks,
> 
>   Ingo
> 

Thank you for the advice, Ingo.

Wei.


Re: [PATCH -next] x86: Fix unused variable 'hi'

2021-03-23 Thread Wei Liu
On Tue, Mar 23, 2021 at 10:50:13AM +0800, Xu Yihang wrote:
> Fixes the following W=1 kernel build warning(s):
> arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used 
> [-Wunused-but-set-variable]
> 
> Compiled with CONFIG_HYPERV enabled:
> make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-
> make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-
> 
> HV_X64_MSR_EOI stores on bit 31:0 and HV_X64_MSR_TPR stores in bit 7:0, which 
> means higher
> 32 bits are not really used, therefore  potentially cast to void in order to 
> silent this warning.
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Xu Yihang 
> ---
>  arch/x86/hyperv/hv_apic.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> index 284e73661a18..a8b639498033 100644
> --- a/arch/x86/hyperv/hv_apic.c
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -60,9 +60,11 @@ static u32 hv_apic_read(u32 reg)
>   switch (reg) {
>   case APIC_EOI:
>   rdmsr(HV_X64_MSR_EOI, reg_val, hi);
> + (void) hi;
>   return reg_val;
>   case APIC_TASKPRI:
>   rdmsr(HV_X64_MSR_TPR, reg_val, hi);
> + (void) hi;

I would like to remove the space while committing this patch. There is
no need for you to do anything.

Wei.

>   return reg_val;
>  
>   default:
> -- 
> 2.17.1
> 


Re: [PATCH] video/fbdev: Fix a double free in hvfb_probe

2021-03-23 Thread Wei Liu
Thanks for your patch.

I would like to change the prefix to "video: hyperv_fb:" to be more
specific.

On Tue, Mar 23, 2021 at 12:33:50AM -0700, Lv Yunlong wrote:
> In function hvfb_probe in hyperv_fb.c, it calls hvfb_getmem(hdev, info)
> and return err when info->apertures is freed.
> 
> In the error1 label of hvfb_probe, info->apertures will be freed twice
> by framebuffer_release(info).
> 

I would say "freed for the second time" here. What you wrote reads to me
fraembuffer_release frees the buffer twice all by itself.

> My patch sets info->apertures to NULL after it was freed to avoid
> double free.
> 

I think this approach works. I would like to give other people a chance
to comment though.

Fixes: 3a6fb6c4255c ("video: hyperv: hyperv_fb: Use physical memory for fb on 
HyperV Gen 1 VMs.")

> Signed-off-by: Lv Yunlong 
> ---
>  drivers/video/fbdev/hyperv_fb.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
> index c8b0ae676809..2fc9b507e73a 100644
> --- a/drivers/video/fbdev/hyperv_fb.c
> +++ b/drivers/video/fbdev/hyperv_fb.c
> @@ -1032,6 +1032,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct 
> fb_info *info)
>   if (!pdev) {
>   pr_err("Unable to find PCI Hyper-V video\n");
>   kfree(info->apertures);
> + info->apertures = NULL;
>   return -ENODEV;
>   }
>  
> @@ -1130,6 +1131,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct 
> fb_info *info)
>   pci_dev_put(pdev);
>   }
>   kfree(info->apertures);
> + info->apertures = NULL;
>  
>   return 0;
>  
> @@ -1142,6 +1144,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct 
> fb_info *info)
>   if (!gen2vm)
>   pci_dev_put(pdev);
>   kfree(info->apertures);
> + info->apertures = NULL;
>  
>   return -ENOMEM;
>  }
> -- 
> 2.25.1
> 
> 


Re: [PATCH v4] x86/Hyper-V: Support for free page reporting

2021-03-22 Thread Wei Liu
On Fri, Mar 19, 2021 at 09:30:50PM +, Michael Kelley wrote:
> From: Sunil Muthuswamy   Sent: Friday, March 19, 2021 
> 2:21 PM
> > 
> > > What's the strategy for this flag in the unlikely event that the 
> > > hypercall fails?
> > > It doesn't seem right to have hv_query_ext_cap() fail, but leave the
> > > static flag set to true.  Just move that line down to after the status 
> > > check
> > > has succeeded?
> > 
> > That call should not fail in any normal circumstances. The current idea was 
> > to
> > avoid repeating the same call on persistent failure. 
> 
> OK, I can see that as a valid strategy.  And the assumption is that a failed
> hypercall would leave hv_extended_cap unmodified and hence all zeros.
> 
> I'm OK with this approach if you want to keep it.  But perhaps add a short
> comment about the intent so it doesn't look like a bug. :-)
> 

Sunil, if you can send an updated version of your patch by either
providing a comment or moving the code around, I can queue it up for
hyperv-next.

I think adding a comment is perhaps the easier thing to do.

Wei.


Re: [PATCH] hyperv: Few mundane typo fixes

2021-03-22 Thread Wei Liu
On Sun, Mar 21, 2021 at 04:41:53PM -0700, Randy Dunlap wrote:
> On March 21, 2021 4:31:08 PM PDT, Bhaskar Chowdhury  
> wrote:
> >
> >s/sructure/structure/
> >s/extention/extension/
> >s/offerred/offered/
> >s/adversley/adversely/
> >
> >Signed-off-by: Bhaskar Chowdhury 
> 
> Acked-by: Randy Dunlap 
> 

Queued for hyperv-next. Thanks.

Wei.


Re: [PATCH -next] x86: Fix unused variable 'msr_val' warning

2021-03-22 Thread Wei Liu
On Mon, Mar 22, 2021 at 11:17:13AM +0800, Xu Yihang wrote:
> Fixes the following W=1 kernel build warning(s):
> arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable ‘msr_val’ set but not 
> used [-Wunused-but-set-variable]
>   unsigned long msr_val;
> 
> As Hypervisor Top-Level Functional Specification states in chapter 7.5 
> Virtual Processor Idle Sleep State, "A partition which possesses the 
> AccessGuestIdleMsr privilege (refer to section 4.2.2) may trigger entry into 
> the virtual processor idle sleep state through a read to the 
> hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE". That means only a read is 
> necessary, msr_val is not uesed, so __maybe_unused should be added.
> 
> Reference:
> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Xu Yihang 

I modified the commit message a bit and queued this up for hyperv-next.
Thanks.

Wei.


Re: [PATCH -next] x86: Fix unused variable 'hi'

2021-03-22 Thread Wei Liu
On Mon, Mar 22, 2021 at 11:54:26AM +0800, Xu Yihang wrote:
> Fixes the following W=1 kernel build warning(s):
> arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used 
> [-Wunused-but-set-variable]
> 
> Compiled with CONFIG_HYPERV enabled:
> make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-
> make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-
> 
> HV_X64_MSR_EOI stores on bit 31:0 and HV_X64_MSR_TPR stores in bit 7:0, which 
> means higher 32 bits are not really used, therefore __maybe_unused added.
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Xu Yihang 

I slightly modified the commit message and queued it up for hyperv-next.
Thanks.

Wei.


Re: [PATCH v2] drivers: hv: Fix EXPORT_SYMBOL and tab spaces issue

2021-03-22 Thread Wei Liu
On Wed, Mar 10, 2021 at 10:51:55AM +0530, Vasanth wrote:
> 1.Fixed EXPORT_SYMBOL should be follow immediately function/variable.
> 2.Fixed code tab spaces issue.
> 
> Signed-off-by: Vasanth M 

Applied to hyperv-next.


Re: linux-next: manual merge of the hyperv tree with the tip tree

2021-03-16 Thread Wei Liu
On Tue, Mar 16, 2021 at 04:02:54PM +0100, Borislav Petkov wrote:
> On Mon, Mar 15, 2021 at 02:35:05PM +1100, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Today's linux-next merge of the hyperv tree got a conflict in:
> > 
> >   arch/x86/include/asm/mshyperv.h
> > 
> > between commit:
> > 
> >   a0e2bf7cb700 ("x86/paravirt: Switch time pvops functions to use 
> > static_call()")
> > 
> > from the tip tree and commit:
> > 
> >   eb3e1d370b4c ("clocksource/drivers/hyper-v: Handle sched_clock 
> > differences inline")
> > 
> > from the hyperv tree.
> > 
> > I fixed it up (I used the latter version of this file and then applied the
> > following patch) and can carry the fix as necessary. This is now fixed
> > as far as linux-next is concerned, but any non trivial conflicts should
> > be mentioned to your upstream maintainer when your tree is submitted for
> > merging.  You may also want to consider cooperating with the maintainer
> > of the conflicting tree to minimise any particularly complex conflicts.
> 
> Right,
> 
> so tglx and I took a quick look and came to the conclusion that it would
> be best if you - provided it is not too much trouble - keep applying
> this patch so that linux-next can get tested properly and we - Wei or I
> - explain this merge conflict in our pull requests during the next merge
> window and ask Linus to merge your patch ontop. This way we'll save us
> the cross-tree merging dance.

Totally agreed. :-)

I've made a note to inform Linus about this in the next merge window.

Thanks,
Wei.

> 
> Thx!
> 
> -- 
> Regards/Gruss,
> Boris.
> 
> SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG 
> Nürnberg


Re: [PATCH 1/1] asm-generic/hyperv: Add missing function prototypes per -W1 warnings

2021-03-11 Thread Wei Liu
On Wed, Mar 10, 2021 at 10:47:49AM -0800, Michael Kelley wrote:
> Add two function prototypes for -W1 warnings generated by the
> kernel test robot.
> 
> Reported-by: kernel test robot 
> Signed-off-by: Michael Kelley 

Applied to hyperv-next


Re: [patch 12/14] PCI: hv: Use tasklet_disable_in_atomic()

2021-03-10 Thread Wei Liu
On Tue, Mar 09, 2021 at 09:42:15AM +0100, Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior 
> 
> The hv_compose_msi_msg() callback in irq_chip::irq_compose_msi_msg is
> invoked via irq_chip_compose_msi_msg(), which itself is always invoked from
> atomic contexts from the guts of the interrupt core code.
> 
> There is no way to change this w/o rewriting the whole driver, so use
> tasklet_disable_in_atomic() which allows to make tasklet_disable()
> sleepable once the remaining atomic users are addressed.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> Signed-off-by: Thomas Gleixner 

Acked-by: Wei Liu 


Re: [PATCH v3 00/10] Refactor arch specific Hyper-V code

2021-03-08 Thread Wei Liu
On Tue, Mar 02, 2021 at 01:38:12PM -0800, Michael Kelley wrote:
[...]
> Michael Kelley (10):
>   Drivers: hv: vmbus: Move Hyper-V page allocator to arch neutral code
>   x86/hyper-v: Move hv_message_type to architecture neutral module
>   Drivers: hv: Redo Hyper-V synthetic MSR get/set functions
>   Drivers: hv: vmbus: Move hyperv_report_panic_msg to arch neutral code
>   Drivers: hv: vmbus: Handle auto EOI quirk inline
>   Drivers: hv: vmbus: Move handling of VMbus interrupts
>   clocksource/drivers/hyper-v: Handle vDSO differences inline
>   clocksource/drivers/hyper-v: Handle sched_clock differences inline
>   clocksource/drivers/hyper-v: Set clocksource rating based on Hyper-V
> feature
>   clocksource/drivers/hyper-v: Move handling of STIMER0 interrupts

Applied to hyperv-next.

Wei.


Re: [PATCH] drivers: hv: Fix no spaces issue

2021-03-08 Thread Wei Liu


On Fri, Mar 05, 2021 at 09:34:49PM +0530, Vasanth wrote:
> Fixed code spaces issue.
> 
> Signed-off-by: Vasanth M 

What do you mean by "Fixed no spaces issue" in the title?

I can see you deleted an empty line and changed some tabs to spaces.
Please be specific in the commit message.

Wei.


Re: [PATCH] Drivers: hv: vmbus: Drop error message when 'No request id available'

2021-03-03 Thread Wei Liu
On Mon, Mar 01, 2021 at 08:13:48PM +0100, Andrea Parri (Microsoft) wrote:
> Running out of request IDs on a channel essentially produces the same
> effect as running out of space in the ring buffer, in that -EAGAIN is
> returned.  The error message in hv_ringbuffer_write() should either be
> dropped (since we don't output a message when the ring buffer is full)
> or be made conditional/debug-only.
> 
> Suggested-by: Michael Kelley 
> Signed-off-by: Andrea Parri (Microsoft) 
> Fixes: e8b7db38449ac ("Drivers: hv: vmbus: Add vmbus_requestor data structure 
> for VMBus hardening")

Applied to hyperv-fixes.

Wei.


Re: [PATCH v3] drivers: hv: Fix whitespace errors

2021-03-02 Thread Wei Liu
On Fri, Feb 19, 2021 at 05:30:36PM +, Michael Kelley wrote:
> From: Vasanth  Sent: Friday, February 19, 2021 9:13 AM
> > To: KY Srinivasan 
> > Cc: Haiyang Zhang ; Stephen Hemminger
> > ; wei@kernel.org; linux-hyp...@vger.kernel.org; 
> > linux-
> > ker...@vger.kernel.org; Vasanth 
> > Subject: [PATCH v3] drivers: hv: Fix whitespace errors
> > 
> > Fixed checkpatch warning and errors on hv driver.
> > 
> > Signed-off-by: Vasanth 

Vasanth, normally people put their full name in the SoB. Do you want to
do that too?

There is no need to resend. Just let me know what you think.


> > ---
> 
> Reviewed-by: Michael Kelley 

Thanks Michael.

I will pick up this patch within this week.

Wei.


Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-03-01 Thread Si-Wei Liu




On 2/28/2021 1:27 PM, Michael S. Tsirkin wrote:

On Thu, Feb 25, 2021 at 04:56:42PM -0800, Si-Wei Liu wrote:

Hi Michael,

Are you okay to live without this ioctl for now? I think QEMU is the one
that needs to be fixed and will have to be made legacy guest aware. I think
the kernel can just honor the feature negotiation result done by QEMU and do
as what's told to.Will you agree?

If it's fine, I would proceed to reverting commit fe36cbe067 and related
code in question from the kernel.

Thanks,
-Siwei


Not really, I don't see why that's a good idea.  fe36cbe067 is the code
checking MTU before FEATURES_OK. Spec explicitly allows that.


Alright, but what I meant was this commit
452639a64ad8 ("vdpa: make sure set_features is invoked for legacy").

But I got why you need it in another email (for BE host/guest).

-Siwei


Re: [PATCH v2 00/10] Refactor arch specific Hyper-V code

2021-03-01 Thread Wei Liu
On Sun, Feb 28, 2021 at 05:15:22PM -0800, Michael Kelley wrote:
> To support Linux guests on Hyper-V on multiple architectures, the original
> approach factored out all differences between Hyper-V on x86/x64 and
> Hyper-V on ARM64 into functions or #defines under arch/x86 and
> arch/arm64. Some of these differences are truly related to the
> architecture, but others are more properly treated as Linux OS
> differences or just quirks in Hyper-V. Feedback from Arnd Bergmann[1]
> recommended that differences other than architecture should be
> incorporated into the architecture independent Hyper-V code. Each
> difference can be handled with conditions specific to the difference
> instead of tying it to the broader x86/x64 vs. ARM64. This approach
> reduces the amount of code under arch/x86 and arch/arm64 and keeps
> the non-architectural differences localized and more easily understood.
> 
> This patch set implements the new approach by changing the interface
> between the architecture independent code and the architecture dependent
> code for x86/x64. The patches move code from arch/x86 to the
> architecture independent Hyper-V code whenever possible, and add
> architecture independent support needed by other architectures like
> ARM64.  No functionality is changed for x86/x64.  A subsequent patch
> set will provide the Hyper-V support code under arch/arm64.
> 
> This patch set results in an increase in lines of code (though some
> of the increase is additional comments).  But the lines needed under
> arch/arm64 in the upcoming patch set is significantly reduced, resulting
> in a net decrease of about 125 lines.
> 
> [1] 
> https://lore.kernel.org/lkml/CAK8P3a1hDBVembCd+6=ENUWYFz=72jbtfmrkyz2afd+_q04...@mail.gmail.com/

This series looks good to me.

Given this series touches mostly Hyper-V code. I will be taking this it
via hyperv-next once the last two patches are reviewed.

Wei.


Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-25 Thread Si-Wei Liu



Hi Michael,

Are you okay to live without this ioctl for now? I think QEMU is the one 
that needs to be fixed and will have to be made legacy guest aware. I 
think the kernel can just honor the feature negotiation result done by 
QEMU and do as what's told to.Will you agree?


If it's fine, I would proceed to reverting commit fe36cbe067 and related 
code in question from the kernel.


Thanks,
-Siwei

On 2/24/2021 10:24 AM, Si-Wei Liu wrote:

Detecting it isn't enough though, we will need a new ioctl to notify
the kernel that it's a legacy guest. Ugh :(
Well, although I think adding an ioctl is doable, may I know what the 
use case there will be for kernel to leverage such info directly? Is 
there a case QEMU can't do with dedicate ioctls later if there's 
indeed differentiation (legacy v.s. modern) needed?


One of the reason I asked is if this ioctl becomes a mandate for 
vhost-vdpa kernel. QEMU would reject initialize vhost-vdpa if doesn't 
see this ioctl coming?


If it's optional, suppose the kernel may need it only when it becomes 
necessary?


Thanks,
-Siwei




Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Si-Wei Liu




On 2/23/2021 9:04 PM, Michael S. Tsirkin wrote:

On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:


On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:

On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:

On 2021/2/23 9:12 上午, Si-Wei Liu wrote:

On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:

On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:

On 2021/2/19 7:54 下午, Si-Wei Liu wrote:

Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set.

This looks like a spec violation:

"

The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is
set.
This field specifies the maximum MTU for the driver to use.
"

Do we really want to workaround this?

Thanks

And also:

The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the
device.
4. Read device feature bits, and write the subset of feature bits
understood by the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write)
the device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
feature bits after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set:
otherwise, the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues
for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration
space, and population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.


so accessing config space before FEATURES_OK is a spec violation, right?

It is, but it's not relevant to what this commit tries to address. I
thought the legacy guest still needs to be supported.

Having said, a separate patch has to be posted to fix the guest driver
issue where this discrepancy is introduced to virtnet_validate() (since
commit fe36cbe067). But it's not technically related to this patch.

-Siwei

I think it's a bug to read config space in validate, we should move it to
virtnet_probe().

Thanks

I take it back, reading but not writing seems to be explicitly allowed by spec.
So our way to detect a legacy guest is bogus, need to think what is
the best way to handle this.

Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy
guest? Supposedly only config space write access needs to be guarded before
setting FEATURES_OK.

-Siwie

Detecting it isn't enough though, we will need a new ioctl to notify
the kernel that it's a legacy guest. Ugh :(
Well, although I think adding an ioctl is doable, may I know what the 
use case there will be for kernel to leverage such info directly? Is 
there a case QEMU can't do with dedicate ioctls later if there's indeed 
differentiation (legacy v.s. modern) needed?


One of the reason I asked is if this ioctl becomes a mandate for 
vhost-vdpa kernel. QEMU would reject initialize vhost-vdpa if doesn't 
see this ioctl coming?


If it's optional, suppose the kernel may need it only when it becomes 
necessary?


Thanks,
-Siwei






Rejecting reset to 0
prematurely causes correct MTU and link status unable to load
for the very first config space access, rendering issues like
guest showing inaccurate MTU value, or failure to reject
out-of-range MTU.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
supported mlx5 devices")
Signed-off-by: Si-Wei Liu 
---
     drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--
     1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 7c1f789..540dd67 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1490,14 +1490,6 @@ static u64
mlx5_vdpa_get_features(struct vdpa_device *vdev)
     return mvdev->mlx_features;
     }
-static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
u64 features)
-{
-    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
-    return -EOPNOTSUPP;
-
-    return 0;
-}
-
     static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
     {
     int err;
@@ -1558,18 +1550,13 @@ static int
mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
features)
     {
     struct mlx5_vdpa_dev *mv

Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-23 Thread Si-Wei Liu




On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:

On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:

On 2021/2/23 9:12 上午, Si-Wei Liu wrote:


On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:

On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:

On 2021/2/19 7:54 下午, Si-Wei Liu wrote:

Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set.

This looks like a spec violation:

"

The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is
set.
This field specifies the maximum MTU for the driver to use.
"

Do we really want to workaround this?

Thanks

And also:

The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the
device.
4. Read device feature bits, and write the subset of feature bits
understood by the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write)
the device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
feature bits after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set:
otherwise, the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues
for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration
space, and population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.


so accessing config space before FEATURES_OK is a spec violation, right?

It is, but it's not relevant to what this commit tries to address. I
thought the legacy guest still needs to be supported.

Having said, a separate patch has to be posted to fix the guest driver
issue where this discrepancy is introduced to virtnet_validate() (since
commit fe36cbe067). But it's not technically related to this patch.

-Siwei


I think it's a bug to read config space in validate, we should move it to
virtnet_probe().

Thanks

I take it back, reading but not writing seems to be explicitly allowed by spec.
So our way to detect a legacy guest is bogus, need to think what is
the best way to handle this.
Then maybe revert commit fe36cbe067 and friends, and have QEMU detect 
legacy guest? Supposedly only config space write access needs to be 
guarded before setting FEATURES_OK.


-Siwie




Rejecting reset to 0
prematurely causes correct MTU and link status unable to load
for the very first config space access, rendering issues like
guest showing inaccurate MTU value, or failure to reject
out-of-range MTU.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
supported mlx5 devices")
Signed-off-by: Si-Wei Liu 
---
    drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--
    1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 7c1f789..540dd67 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1490,14 +1490,6 @@ static u64
mlx5_vdpa_get_features(struct vdpa_device *vdev)
    return mvdev->mlx_features;
    }
-static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
u64 features)
-{
-    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
-    return -EOPNOTSUPP;
-
-    return 0;
-}
-
    static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
    {
    int err;
@@ -1558,18 +1550,13 @@ static int
mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
features)
    {
    struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
    struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
-    int err;
    print_features(mvdev, features, true);
-    err = verify_min_features(mvdev, features);
-    if (err)
-    return err;
-
    ndev->mvdev.actual_features = features &
ndev->mvdev.mlx_features;
    ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
    ndev->config.status |= cpu_to_mlx5vdpa16(mvdev,
VIRTIO_NET_S_LINK_UP);
-    return err;
+    return 0;
    }
    static void mlx5_vdpa_set_config_cb(struct vdpa_device
*vdev, struct vdpa_callback *cb)




Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-22 Thread Si-Wei Liu




On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:

On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:

On 2021/2/19 7:54 下午, Si-Wei Liu wrote:

Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set.


This looks like a spec violation:

"

The following driver-read-only field, mtu only exists if VIRTIO_NET_F_MTU is
set.
This field specifies the maximum MTU for the driver to use.
"

Do we really want to workaround this?

Thanks

And also:

The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the device.
4. Read device feature bits, and write the subset of feature bits understood by 
the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) the 
device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits 
after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, 
the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues for the 
device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and 
population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.


so accessing config space before FEATURES_OK is a spec violation, right?
It is, but it's not relevant to what this commit tries to address. I 
thought the legacy guest still needs to be supported.


Having said, a separate patch has to be posted to fix the guest driver 
issue where this discrepancy is introduced to virtnet_validate() (since 
commit fe36cbe067). But it's not technically related to this patch.


-Siwei





Rejecting reset to 0
prematurely causes correct MTU and link status unable to load
for the very first config space access, rendering issues like
guest showing inaccurate MTU value, or failure to reject
out-of-range MTU.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Si-Wei Liu 
---
   drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--
   1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 7c1f789..540dd67 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device 
*vdev)
return mvdev->mlx_features;
   }
-static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
-{
-   if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
-   return -EOPNOTSUPP;
-
-   return 0;
-}
-
   static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
   {
int err;
@@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device 
*vdev, u64 features)
   {
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
-   int err;
print_features(mvdev, features, true);
-   err = verify_min_features(mvdev, features);
-   if (err)
-   return err;
-
ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
-   return err;
+   return 0;
   }
   static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct 
vdpa_callback *cb)




Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-22 Thread Si-Wei Liu




On 2/21/2021 8:14 PM, Jason Wang wrote:


On 2021/2/19 7:54 下午, Si-Wei Liu wrote:

Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set.



This looks like a spec violation:

"

The following driver-read-only field, mtu only exists if 
VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the 
driver to use.

"

Do we really want to workaround this?


Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?

I think the point is, since there's legacy guest we'd have to support, 
this host side workaround is unavoidable. Although I agree the violating 
driver should be fixed (yes, it's in today's upstream kernel which 
exists for a while now).


-Siwei



Thanks



Rejecting reset to 0
prematurely causes correct MTU and link status unable to load
for the very first config space access, rendering issues like
guest showing inaccurate MTU value, or failure to reject
out-of-range MTU.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
devices")

Signed-off-by: Si-Wei Liu 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--
  1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c

index 7c1f789..540dd67 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct 
vdpa_device *vdev)

  return mvdev->mlx_features;
  }
  -static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 
features)

-{
-    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
-    return -EOPNOTSUPP;
-
-    return 0;
-}
-
  static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
  {
  int err;
@@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct 
vdpa_device *vdev, u64 features)

  {
  struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
  struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
-    int err;
    print_features(mvdev, features, true);
  -    err = verify_min_features(mvdev, features);
-    if (err)
-    return err;
-
  ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
  ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
  ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, 
VIRTIO_NET_S_LINK_UP);

-    return err;
+    return 0;
  }
    static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, 
struct vdpa_callback *cb)






Re: [PATCH v8 1/6] arm64: hyperv: Add Hyper-V hypercall and register access utilities

2021-02-22 Thread Wei Liu
On Thu, Feb 18, 2021 at 03:16:29PM -0800, Michael Kelley wrote:
> hyperv-tlfs.h defines Hyper-V interfaces from the Hyper-V Top Level
> Functional Spec (TLFS), and #includes the architecture-independent
> part of hyperv-tlfs.h in include/asm-generic.  The published TLFS
> is distinctly oriented to x86/x64, so the ARM64-specific
> hyperv-tlfs.h includes information for ARM64 that is not yet formally
> published. The TLFS is available here:
> 
>   docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
> 
> mshyperv.h defines Linux-specific structures and routines for
> interacting with Hyper-V on ARM64, and #includes the architecture-
> independent part of mshyperv.h in include/asm-generic.
> 
> Use these definitions to provide utility functions to make
> Hyper-V hypercalls and to get and set Hyper-V provided
> registers associated with a virtual processor.
> 
> Signed-off-by: Michael Kelley 

Reviewed-by: Wei Liu 


Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-19 Thread Si-Wei Liu




On 2/19/2021 6:38 PM, Jason Wang wrote:



Right now the value is exposed to userspace via GET_VRING_BASE, so 
only last_avail_idx is synced. If we need sync last_used_idx, we 
should also sync pending indices which requires more thoughts.
Technically it doesn't sound right - crossing the boundary a bit even 
with simplified form of assumption. But depending on how userspace 
could make use of this API, it doesn't seem it breaks existing 
functionality for the moment.



I don't get here, maybe you can explain a little bit more? 

Please refer to the email I just sent.

https://lore.kernel.org/lkml/033b0806-4037-5755-a1fa-91dbb58ba...@oracle.com/

-Siwei


Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration

2021-02-19 Thread Si-Wei Liu




On 2/17/2021 11:42 AM, Si-Wei Liu wrote:



On 2/16/2021 8:20 AM, Eli Cohen wrote:

When we suspend the VM, the VDPA interface will be reset. When the VM is
resumed again, clear_virtqueues() will clear the available and used
indices resulting in hardware virqtqueue objects becoming out of sync.
We can avoid this function alltogether since qemu will clear them if
required, e.g. when the VM went through a reboot.

Moreover, since the hw available and used indices should always be
identical on query and should be restored to the same value same value
for virtqueues that complete in order, we set the single value provided
by set_vq_state(). In get_vq_state() we return the value of hardware
used index.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
devices")

Signed-off-by: Eli Cohen 

Acked-by: Si-Wei Liu 


I'd take it back for the moment, according to Jason's latest comment. 
Discussion still going.





---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 -
  1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c

index b8e9d525d66c..a51b0f86afe2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net 
*ndev, struct mlx5_vdpa_virtqueue *m

  return;
  }
  mvq->avail_idx = attr.available_index;
+    mvq->used_idx = attr.used_index;
  }
    static void suspend_vqs(struct mlx5_vdpa_net *ndev)
@@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct 
vdpa_device *vdev, u16 idx,

  return -EINVAL;
  }
  +    mvq->used_idx = state->avail_index;
  mvq->avail_idx = state->avail_index;
This is where things starts getting interesting. According to Jason, the 
original expectation of this API would be to restore the internal 
last_avail_index in the hardware. With Mellanox network vDPA hardware 
implementation, there's no such last_avail_index thing in the hardware 
but only a single last_used_index representing both indices, which 
should always be the same and in sync with each other. So from migration 
point of view, it appears logical to restore the saved last_avail_index 
to the last_used_index in the hardware, right? But what is the point to 
restore mvq->avail_idx?


Actually, this code path is being repurposed to address the index reset 
problem in the device reset scenario. Where the mvq->avail_idx and 
mvq->used_idx are both reset to 0 once device is reset. This is a bit 
crossing the boundary to me.




  return 0;
  }
@@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct 
vdpa_device *vdev, u16 idx, struct vdpa

   * that cares about emulating the index after vq is stopped.
   */
  if (!mvq->initialized) {
-    state->avail_index = mvq->avail_idx;
+    state->avail_index = mvq->used_idx;
This is where the last_used_index gets loaded from the hardware (when 
device is stopped).



  return 0;
  }
  @@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct 
vdpa_device *vdev, u16 idx, struct vdpa

  mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
  return err;
  }
-    state->avail_index = attr.available_index;
+    state->avail_index = attr.used_index;

This code path never gets called from userspace (when device is running).

-Siwei


  return 0;
  }
  @@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct 
mlx5_vdpa_net *ndev)

  }
  }
  -static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
-{
-    int i;
-
-    for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
-    ndev->vqs[i].avail_idx = 0;
-    ndev->vqs[i].used_idx = 0;
-    }
-}
-
  /* TODO: cross-endian support */
  static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev 
*mvdev)

  {
@@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct 
vdpa_device *vdev, u8 status)

  if (!status) {
  mlx5_vdpa_info(mvdev, "performing device reset\n");
  teardown_driver(ndev);
-    clear_virtqueues(ndev);
  mlx5_vdpa_destroy_mr(>mvdev);
  ndev->mvdev.status = 0;
  ++mvdev->generation;






Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-19 Thread Si-Wei Liu




On 2/18/2021 7:10 PM, Jason Wang wrote:


On 2021/2/18 8:43 下午, Si-Wei Liu wrote:



On 2/17/2021 8:44 PM, Jason Wang wrote:


On 2021/2/10 下午4:59, Si-Wei Liu wrote:



On 2/9/2021 7:53 PM, Jason Wang wrote:


On 2021/2/10 上午10:30, Si-Wei Liu wrote:



On 2/8/2021 10:37 PM, Jason Wang wrote:


On 2021/2/9 下午2:12, Eli Cohen wrote:

On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote:

On 2021/2/8 下午6:04, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote:

On 2021/2/8 下午2:37, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:

On 2021/2/6 上午7:07, Si-Wei Liu wrote:

On 2/3/2021 11:36 PM, Eli Cohen wrote:
When a change of memory map occurs, the hardware 
resources are destroyed
and then re-created again with the new memory map. In 
such case, we need
to restore the hardware available and used indices. The 
driver failed to

restore the used index which is added here.

Also, since the driver also fails to reset the available 
and used
indices upon device reset, fix this here to avoid 
regression caused by

the fact that used index may not be zero upon device reset.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for 
supported mlx5

devices")
Signed-off-by: Eli Cohen
---
v0 -> v1:
Clear indices upon device reset

     drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 
++

     1 file changed, 18 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..b5fe6d2ad22f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
     u64 device_addr;
     u64 driver_addr;
     u16 avail_index;
+    u16 used_index;
     bool ready;
     struct vdpa_callback cb;
     bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
     u32 virtq_id;
     struct mlx5_vdpa_net *ndev;
     u16 avail_idx;
+    u16 used_idx;
     int fw_state;
       /* keep last in the struct */
@@ -804,6 +806,7 @@ static int create_virtqueue(struct 
mlx5_vdpa_net

*ndev, struct mlx5_vdpa_virtque
       obj_context = 
MLX5_ADDR_OF(create_virtio_net_q_in, in,

obj_context);
     MLX5_SET(virtio_net_q_object, obj_context, 
hw_available_index,

mvq->avail_idx);
+    MLX5_SET(virtio_net_q_object, obj_context, 
hw_used_index,

mvq->used_idx);
     MLX5_SET(virtio_net_q_object, obj_context,
queue_feature_bit_mask_12_3,
get_features_12_3(ndev->mvdev.actual_features));
     vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, 
obj_context,

virtio_q_context);
@@ -1022,6 +1025,7 @@ static int connect_qps(struct 
mlx5_vdpa_net

*ndev, struct mlx5_vdpa_virtqueue *m
     struct mlx5_virtq_attr {
     u8 state;
     u16 available_index;
+    u16 used_index;
     };
       static int query_virtqueue(struct mlx5_vdpa_net 
*ndev, struct

mlx5_vdpa_virtqueue *mvq,
@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
     memset(attr, 0, sizeof(*attr));
     attr->state = MLX5_GET(virtio_net_q_object, 
obj_context, state);
     attr->available_index = 
MLX5_GET(virtio_net_q_object,

obj_context, hw_available_index);
+    attr->used_index = MLX5_GET(virtio_net_q_object, 
obj_context,

hw_used_index);
     kfree(out);
     return 0;
     @@ -1535,6 +1540,16 @@ static void 
teardown_virtqueues(struct

mlx5_vdpa_net *ndev)
     }
     }
     +static void clear_virtqueues(struct mlx5_vdpa_net 
*ndev)

+{
+    int i;
+
+    for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+    ndev->vqs[i].avail_idx = 0;
+    ndev->vqs[i].used_idx = 0;
+    }
+}
+
     /* TODO: cross-endian support */
     static inline bool 
mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev

*mvdev)
     {
@@ -1610,6 +1625,7 @@ static int save_channel_info(struct
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
     return err;
       ri->avail_index = attr.available_index;
+    ri->used_index = attr.used_index;
     ri->ready = mvq->ready;
     ri->num_ent = mvq->num_ent;
     ri->desc_addr = mvq->desc_addr;
@@ -1654,6 +1670,7 @@ static void 
restore_channels_info(struct

mlx5_vdpa_net *ndev)
     continue;
       mvq->avail_idx = ri->avail_index;
+    mvq->used_idx = ri->used_index;
     mvq->ready = ri->ready;
     mvq->num_ent = ri->num_ent;
     mvq->desc_addr = ri->desc_addr;
@@ -1768,6 +1785,7 @@ static void 
mlx5_vdpa_set_status(struct

vdpa_device *vdev, u8 status)
     if (!status) {
     mlx5_vdpa_info(mvdev, "performing device 
reset\n");

     teardown_driver(ndev);
+    clear_virtqueues(ndev);
The clearing looks fine at the first glance, as it aligns 
with the other
state cleanups floating around at the same place. 
However, the thing is
get_vq_st

[PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-19 Thread Si-Wei Liu
Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set. Rejecting reset to 0
prematurely causes correct MTU and link status unable to load
for the very first config space access, rendering issues like
guest showing inaccurate MTU value, or failure to reject
out-of-range MTU.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Si-Wei Liu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 7c1f789..540dd67 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1490,14 +1490,6 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device 
*vdev)
return mvdev->mlx_features;
 }
 
-static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
-{
-   if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
-   return -EOPNOTSUPP;
-
-   return 0;
-}
-
 static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
 {
int err;
@@ -1558,18 +1550,13 @@ static int mlx5_vdpa_set_features(struct vdpa_device 
*vdev, u64 features)
 {
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
-   int err;
 
print_features(mvdev, features, true);
 
-   err = verify_min_features(mvdev, features);
-   if (err)
-   return err;
-
ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP);
-   return err;
+   return 0;
 }
 
 static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct 
vdpa_callback *cb)
-- 
1.8.3.1



Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-18 Thread Si-Wei Liu




On 2/17/2021 8:44 PM, Jason Wang wrote:


On 2021/2/10 下午4:59, Si-Wei Liu wrote:



On 2/9/2021 7:53 PM, Jason Wang wrote:


On 2021/2/10 上午10:30, Si-Wei Liu wrote:



On 2/8/2021 10:37 PM, Jason Wang wrote:


On 2021/2/9 下午2:12, Eli Cohen wrote:

On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote:

On 2021/2/8 下午6:04, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote:

On 2021/2/8 下午2:37, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:

On 2021/2/6 上午7:07, Si-Wei Liu wrote:

On 2/3/2021 11:36 PM, Eli Cohen wrote:
When a change of memory map occurs, the hardware resources 
are destroyed
and then re-created again with the new memory map. In such 
case, we need
to restore the hardware available and used indices. The 
driver failed to

restore the used index which is added here.

Also, since the driver also fails to reset the available 
and used
indices upon device reset, fix this here to avoid 
regression caused by

the fact that used index may not be zero upon device reset.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for 
supported mlx5

devices")
Signed-off-by: Eli Cohen
---
v0 -> v1:
Clear indices upon device reset

     drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 
++

     1 file changed, 18 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..b5fe6d2ad22f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
     u64 device_addr;
     u64 driver_addr;
     u16 avail_index;
+    u16 used_index;
     bool ready;
     struct vdpa_callback cb;
     bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
     u32 virtq_id;
     struct mlx5_vdpa_net *ndev;
     u16 avail_idx;
+    u16 used_idx;
     int fw_state;
       /* keep last in the struct */
@@ -804,6 +806,7 @@ static int create_virtqueue(struct 
mlx5_vdpa_net

*ndev, struct mlx5_vdpa_virtque
       obj_context = 
MLX5_ADDR_OF(create_virtio_net_q_in, in,

obj_context);
     MLX5_SET(virtio_net_q_object, obj_context, 
hw_available_index,

mvq->avail_idx);
+    MLX5_SET(virtio_net_q_object, obj_context, 
hw_used_index,

mvq->used_idx);
     MLX5_SET(virtio_net_q_object, obj_context,
queue_feature_bit_mask_12_3,
get_features_12_3(ndev->mvdev.actual_features));
     vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, 
obj_context,

virtio_q_context);
@@ -1022,6 +1025,7 @@ static int connect_qps(struct 
mlx5_vdpa_net

*ndev, struct mlx5_vdpa_virtqueue *m
     struct mlx5_virtq_attr {
     u8 state;
     u16 available_index;
+    u16 used_index;
     };
       static int query_virtqueue(struct mlx5_vdpa_net 
*ndev, struct

mlx5_vdpa_virtqueue *mvq,
@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
     memset(attr, 0, sizeof(*attr));
     attr->state = MLX5_GET(virtio_net_q_object, 
obj_context, state);
     attr->available_index = 
MLX5_GET(virtio_net_q_object,

obj_context, hw_available_index);
+    attr->used_index = MLX5_GET(virtio_net_q_object, 
obj_context,

hw_used_index);
     kfree(out);
     return 0;
     @@ -1535,6 +1540,16 @@ static void 
teardown_virtqueues(struct

mlx5_vdpa_net *ndev)
     }
     }
     +static void clear_virtqueues(struct mlx5_vdpa_net 
*ndev)

+{
+    int i;
+
+    for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+    ndev->vqs[i].avail_idx = 0;
+    ndev->vqs[i].used_idx = 0;
+    }
+}
+
     /* TODO: cross-endian support */
     static inline bool mlx5_vdpa_is_little_endian(struct 
mlx5_vdpa_dev

*mvdev)
     {
@@ -1610,6 +1625,7 @@ static int save_channel_info(struct
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
     return err;
       ri->avail_index = attr.available_index;
+    ri->used_index = attr.used_index;
     ri->ready = mvq->ready;
     ri->num_ent = mvq->num_ent;
     ri->desc_addr = mvq->desc_addr;
@@ -1654,6 +1670,7 @@ static void 
restore_channels_info(struct

mlx5_vdpa_net *ndev)
     continue;
       mvq->avail_idx = ri->avail_index;
+    mvq->used_idx = ri->used_index;
     mvq->ready = ri->ready;
     mvq->num_ent = ri->num_ent;
     mvq->desc_addr = ri->desc_addr;
@@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct
vdpa_device *vdev, u8 status)
     if (!status) {
     mlx5_vdpa_info(mvdev, "performing device 
reset\n");

     teardown_driver(ndev);
+    clear_virtqueues(ndev);
The clearing looks fine at the first glance, as it aligns 
with the other
state cleanups floating around at the same place. However, 
the thing is
get_vq_state() is supposed to be called right after to get 
sync'ed with
the latest interna

Re: [PATCH v2 3/3] vdpa/mlx5: defer clear_virtqueues to until DRIVER_OK

2021-02-17 Thread Si-Wei Liu




On 2/16/2021 7:21 AM, Eli Cohen wrote:

On Thu, Feb 11, 2021 at 09:33:14AM +0200, Eli Cohen wrote:

On Wed, Feb 10, 2021 at 01:48:00PM -0800, Si-Wei Liu wrote:

While virtq is stopped,  get_vq_state() is supposed to
be  called to  get  sync'ed  with  the latest internal
avail_index from device. The saved avail_index is used
to restate  the virtq  once device is started.  Commit
b35ccebe3ef7 introduced the clear_virtqueues() routine
to  reset  the saved  avail_index,  however, the index
gets cleared a bit earlier before get_vq_state() tries
to read it. This would cause consistency problems when
virtq is restarted, e.g. through a series of link down
and link up events. We  could  defer  the  clearing of
avail_index  to  until  the  device  is to be started,
i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
set_status().

Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change 
map")
Signed-off-by: Si-Wei Liu 
Acked-by: Jason Wang 

Acked-by: Eli Cohen 


I take it back. I think we don't need to clear the indexes at all. In
case we need to restore indexes we'll get the right values through
set_vq_state(). If we suspend the virtqueue due to VM being suspended,
qemu will query first and will provide the the queried value. In case of
VM reboot, it will provide 0 in set_vq_state().

I am sending a patch that addresses both reboot and suspend.
With set_vq_state() repurposed to restoring used_index I'm fine with 
this approach.


Do I have to repost a v3 of this series while dropping the 3rd patch?

-Siwei



---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 7c1f789..ce6aae8 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1777,7 +1777,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
if (!status) {
mlx5_vdpa_info(mvdev, "performing device reset\n");
teardown_driver(ndev);
-   clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status = 0;
++mvdev->generation;
@@ -1786,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
  
  	if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {

if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+   clear_virtqueues(ndev);
err = setup_driver(ndev);
if (err) {
mlx5_vdpa_warn(mvdev, "failed to setup 
driver\n");
--
1.8.3.1





Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration

2021-02-17 Thread Si-Wei Liu




On 2/17/2021 1:20 PM, Michael S. Tsirkin wrote:

On Wed, Feb 17, 2021 at 11:42:48AM -0800, Si-Wei Liu wrote:


On 2/16/2021 8:20 AM, Eli Cohen wrote:

When we suspend the VM, the VDPA interface will be reset. When the VM is
resumed again, clear_virtqueues() will clear the available and used
indices resulting in hardware virqtqueue objects becoming out of sync.
We can avoid this function alltogether since qemu will clear them if
required, e.g. when the VM went through a reboot.

Moreover, since the hw available and used indices should always be
identical on query and should be restored to the same value same value
for virtqueues that complete in order, we set the single value provided
by set_vq_state(). In get_vq_state() we return the value of hardware
used index.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 

Acked-by: Si-Wei Liu 


Seems to also fix b35ccebe3ef76168aa2edaa35809c0232cb3578e, right?

I think so. It should have both "Fixes" tags.

-Siwei



---
   drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 -
   1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b8e9d525d66c..a51b0f86afe2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *m
return;
}
mvq->avail_idx = attr.available_index;
+   mvq->used_idx = attr.used_index;
   }
   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
@@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device 
*vdev, u16 idx,
return -EINVAL;
}
+   mvq->used_idx = state->avail_index;
mvq->avail_idx = state->avail_index;
return 0;
   }
@@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device 
*vdev, u16 idx, struct vdpa
 * that cares about emulating the index after vq is stopped.
 */
if (!mvq->initialized) {
-   state->avail_index = mvq->avail_idx;
+   state->avail_index = mvq->used_idx;
return 0;
}
@@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device 
*vdev, u16 idx, struct vdpa
mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
return err;
}
-   state->avail_index = attr.available_index;
+   state->avail_index = attr.used_index;
return 0;
   }
@@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct mlx5_vdpa_net 
*ndev)
}
   }
-static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
-{
-   int i;
-
-   for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
-   ndev->vqs[i].avail_idx = 0;
-   ndev->vqs[i].used_idx = 0;
-   }
-}
-
   /* TODO: cross-endian support */
   static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
   {
@@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
if (!status) {
mlx5_vdpa_info(mvdev, "performing device reset\n");
teardown_driver(ndev);
-   clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status = 0;
++mvdev->generation;




Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration

2021-02-17 Thread Si-Wei Liu




On 2/16/2021 8:20 AM, Eli Cohen wrote:

When we suspend the VM, the VDPA interface will be reset. When the VM is
resumed again, clear_virtqueues() will clear the available and used
indices resulting in hardware virqtqueue objects becoming out of sync.
We can avoid this function alltogether since qemu will clear them if
required, e.g. when the VM went through a reboot.

Moreover, since the hw available and used indices should always be
identical on query and should be restored to the same value same value
for virtqueues that complete in order, we set the single value provided
by set_vq_state(). In get_vq_state() we return the value of hardware
used index.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 

Acked-by: Si-Wei Liu 


---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 -
  1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b8e9d525d66c..a51b0f86afe2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *m
return;
}
mvq->avail_idx = attr.available_index;
+   mvq->used_idx = attr.used_index;
  }
  
  static void suspend_vqs(struct mlx5_vdpa_net *ndev)

@@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device 
*vdev, u16 idx,
return -EINVAL;
}
  
+	mvq->used_idx = state->avail_index;

mvq->avail_idx = state->avail_index;
return 0;
  }
@@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device 
*vdev, u16 idx, struct vdpa
 * that cares about emulating the index after vq is stopped.
 */
if (!mvq->initialized) {
-   state->avail_index = mvq->avail_idx;
+   state->avail_index = mvq->used_idx;
return 0;
}
  
@@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa

mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
return err;
}
-   state->avail_index = attr.available_index;
+   state->avail_index = attr.used_index;
return 0;
  }
  
@@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)

}
  }
  
-static void clear_virtqueues(struct mlx5_vdpa_net *ndev)

-{
-   int i;
-
-   for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
-   ndev->vqs[i].avail_idx = 0;
-   ndev->vqs[i].used_idx = 0;
-   }
-}
-
  /* TODO: cross-endian support */
  static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
  {
@@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
if (!status) {
mlx5_vdpa_info(mvdev, "performing device reset\n");
teardown_driver(ndev);
-   clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status = 0;
++mvdev->generation;




Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-16 Thread Si-Wei Liu




On 2/10/2021 7:45 AM, Eli Cohen wrote:

On Wed, Feb 10, 2021 at 12:59:03AM -0800, Si-Wei Liu wrote:


On 2/9/2021 7:53 PM, Jason Wang wrote:

On 2021/2/10 上午10:30, Si-Wei Liu wrote:


On 2/8/2021 10:37 PM, Jason Wang wrote:

On 2021/2/9 下午2:12, Eli Cohen wrote:

On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote:

On 2021/2/8 下午6:04, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote:

On 2021/2/8 下午2:37, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:

On 2021/2/6 上午7:07, Si-Wei Liu wrote:

On 2/3/2021 11:36 PM, Eli Cohen wrote:

When a change of memory map
occurs, the hardware resources
are destroyed
and then re-created again with
the new memory map. In such
case, we need
to restore the hardware
available and used indices. The
driver failed to
restore the used index which is added here.

Also, since the driver also
fails to reset the available and
used
indices upon device reset, fix
this here to avoid regression
caused by
the fact that used index may not be zero upon device reset.

Fixes: 1a86b377aa21 ("vdpa/mlx5:
Add VDPA driver for supported
mlx5
devices")
Signed-off-by: Eli Cohen
---
v0 -> v1:
Clear indices upon device reset

      drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++
      1 file changed, 18 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..b5fe6d2ad22f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
      u64 device_addr;
      u64 driver_addr;
      u16 avail_index;
+    u16 used_index;
      bool ready;
      struct vdpa_callback cb;
      bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
      u32 virtq_id;
      struct mlx5_vdpa_net *ndev;
      u16 avail_idx;
+    u16 used_idx;
      int fw_state;
        /* keep last in the struct */
@@ -804,6 +806,7 @@ static int
create_virtqueue(struct
mlx5_vdpa_net
*ndev, struct mlx5_vdpa_virtque
        obj_context =
MLX5_ADDR_OF(create_virtio_net_q_in,
in,
obj_context);
 
MLX5_SET(virtio_net_q_object,

obj_context, hw_available_index,
mvq->avail_idx);
+    MLX5_SET(virtio_net_q_object, obj_context, hw_used_index,
mvq->used_idx);
      MLX5_SET(virtio_net_q_object, obj_context,
queue_feature_bit_mask_12_3,
get_features_12_3(ndev->mvdev.actual_features));
      vq_ctx =
MLX5_ADDR_OF(virtio_net_q_object,
obj_context,
virtio_q_context);
@@ -1022,6 +1025,7 @@ static int
connect_qps(struct mlx5_vdpa_net
*ndev, struct mlx5_vdpa_virtqueue *m
      struct mlx5_virtq_attr {
      u8 state;
      u16 available_index;
+    u16 used_index;
      };
        static int
query_virtqueue(struct
mlx5_vdpa_net *ndev, struct
mlx5_vdpa_virtqueue *mvq,
@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
      memset(attr, 0, sizeof(*attr));
      attr->state =
MLX5_GET(virtio_net_q_object,
obj_context, state);
      attr->available_index = MLX5_GET(virtio_net_q_object,
obj_context, hw_available_index);
+    attr->used_index =
MLX5_GET(virtio_net_q_object,
obj_context,
hw_used_index);
      kfree(out);
      return 0;
      @@ -1535,6 +1540,16 @@
static void
teardown_virtqueues(struct
mlx5_vdpa_net *ndev)
      }
      }
      +static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+    int i;
+
+    for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+    ndev->vqs[i].avail_idx = 0;
+    ndev->vqs[i].used_idx = 0;
+    }
+}
+
      /* TODO: cross-endian support */
      static inline bool
mlx5_vdpa_is_little_endian(struct
mlx5_vdpa_dev
*mvdev)
      {
@@ -1610,6 +1625,7 @@ static int save_channel_info(struct
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
      return err;
        ri->avail_index = attr.available_index;
+    ri->used_index = attr.used_index;
      ri->ready = mvq->ready;
      ri->num_ent = mvq->num_ent;
      ri->desc_addr = mvq->desc_addr;
@@ -1654,6 +1670,7 @@ static void restore_channels_info(struct
mlx5_vdpa_net *ndev)
      continue;
        mvq->avail_idx = ri->avail_index;
+    mvq->used_idx = ri->used_index;
      mvq->ready = ri->ready;
      mvq->num_ent = ri->num_ent;
      mvq->desc_addr = ri->desc_addr;
@@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct
vdpa_device *vdev, u8 status)
      if (!status) {
 
mlx5_vdpa_info(mvdev,

"performing device reset\n");
      teardown_driver(ndev);
+    clear_virtqueues(ndev);

The clearing looks fine at the first
glance, as it aligns with the other
state cleanups floating around at
the same place. However, the thing
is
get_vq_state() is supposed to be
called right after to get sync'ed
with

[GIT PULL] Hyper-V commits for 5.12

2021-02-16 Thread Wei Liu
Hi Linus,

The following changes since commit 6ee1d745b7c9fd573fba142a2efdad76a9f1cb04:

  Linux 5.11-rc5 (2021-01-24 16:47:14 -0800)

are available in the Git repository at:

  ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git 
tags/hyperv-next-signed-20210216

for you to fetch changes up to 3019270282a175defc02c8331786c73e082cd2a8:

  Revert "Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring 
buffer" (2021-02-15 10:49:11 +)


hyperv-next for 5.12
  - VMBus hardening patches from Andrea Parri and Andres Beltran.
  - Patches to make Linux boot as the root partition on Microsoft Hypervisor
    from Wei Liu.
  - One patch to add a new sysfs interface to support hibernation on Hyper-V
from Dexuan Cui.
  - Two miscellaneous clean-up patches from Colin and Gustavo.


Andrea Parri (Microsoft) (9):
  Drivers: hv: vmbus: Initialize memory to be sent to the host
  Drivers: hv: vmbus: Reduce number of references to message in 
vmbus_on_msg_dpc()
  Drivers: hv: vmbus: Copy the hv_message in vmbus_on_msg_dpc()
  Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind()
  Drivers: hv: vmbus: Resolve race condition in vmbus_onoffer_rescind()
  x86/hyperv: Load/save the Isolation Configuration leaf
  Drivers: hv: vmbus: Restrict vmbus_devices on isolated guests
  Drivers: hv: vmbus: Enforce 'VMBus version >= 5.2' on isolated guests
  hv_netvsc: Restrict configurations on isolated guests

Andres Beltran (2):
  Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer
  hv_utils: Add validation for untrusted Hyper-V values

Colin Ian King (1):
  hv_utils: Fix spelling mistake "Hearbeat" -> "Heartbeat"

Dexuan Cui (1):
  Drivers: hv: vmbus: Add /sys/bus/vmbus/hibernation

Gustavo A. R. Silva (1):
  hv: hyperv.h: Replace one-element array with flexible-array in struct 
icmsg_negotiate

Wei Liu (17):
  asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT
  x86/hyperv: detect if Linux is the root partition
  Drivers: hv: vmbus: skip VMBus initialization if Linux is root
  clocksource/hyperv: use MSR-based access if running as root
  x86/hyperv: allocate output arg pages if required
  x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary
  x86/hyperv: handling hypercall page setup for root
  ACPI / NUMA: add a stub function for node_to_pxm()
  x86/hyperv: provide a bunch of helper functions
  x86/hyperv: implement and use hv_smp_prepare_cpus
  asm-generic/hyperv: update hv_msi_entry
  asm-generic/hyperv: update hv_interrupt_entry
  asm-generic/hyperv: introduce hv_device_id and auxiliary structures
  asm-generic/hyperv: import data structures for mapping device interrupts
  x86/hyperv: implement an MSI domain for root partition
  iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
  Revert "Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring 
buffer"

 Documentation/ABI/stable/sysfs-bus-vmbus |   7 +
 arch/x86/hyperv/Makefile |   4 +-
 arch/x86/hyperv/hv_init.c| 122 +-
 arch/x86/hyperv/hv_proc.c| 219 ++
 arch/x86/hyperv/irqdomain.c  | 385 +++
 arch/x86/include/asm/hyperv-tlfs.h   |  38 +++
 arch/x86/include/asm/mshyperv.h  |  19 +-
 arch/x86/kernel/cpu/mshyperv.c   |  58 +
 drivers/clocksource/hyperv_timer.c   |   3 +
 drivers/hv/channel.c |   4 +-
 drivers/hv/channel_mgmt.c|  77 ++-
 drivers/hv/connection.c  |   7 +
 drivers/hv/hv_fcopy.c|  36 ++-
 drivers/hv/hv_kvp.c  | 122 +-
 drivers/hv/hv_snapshot.c |  89 ---
 drivers/hv/hv_util.c | 222 +++---
 drivers/hv/vmbus_drv.c   |  64 +++--
 drivers/iommu/hyperv-iommu.c | 177 +-
 drivers/net/hyperv/netvsc.c  |  18 +-
 drivers/pci/controller/pci-hyperv.c  |   2 +-
 include/acpi/acpi_numa.h |   4 +
 include/asm-generic/hyperv-tlfs.h| 255 +++-
 include/asm-generic/mshyperv.h   |   5 +
 include/linux/hyperv.h   |  13 +-
 24 files changed, 1717 insertions(+), 233 deletions(-)
 create mode 100644 arch/x86/hyperv/hv_proc.c
 create mode 100644 arch/x86/hyperv/irqdomain.c


Re: Regressions with VMBus/VSCs hardening changes

2021-02-15 Thread Wei Liu
On Fri, Feb 12, 2021 at 05:50:50PM +0100, Andrea Parri wrote:
> Hi all,
[...]
> 2) IIUC a8c3209998afb5 could be dropped (after rebase) without further modi-
>fications to hyperv-next.

I've reverted the said patch from hyperv-next.

Wei.


Re: [PATCH v2 4/8] xen/netback: fix spurious event detection for common event case

2021-02-11 Thread Wei Liu
On Thu, Feb 11, 2021 at 11:16:12AM +0100, Juergen Gross wrote:
> In case of a common event for rx and tx queue the event should be
> regarded to be spurious if no rx and no tx requests are pending.
> 
> Unfortunately the condition for testing that is wrong causing to
> decide a event being spurious if no rx OR no tx requests are
> pending.
> 
> Fix that plus using local variables for rx/tx pending indicators in
> order to split function calls and if condition.
> 
> Fixes: 23025393dbeb3b ("xen/netback: use lateeoi irq binding")
> Signed-off-by: Juergen Gross 

Reviewed-by: Wei Liu 


[PATCH v2 1/3] vdpa/mlx5: should exclude header length and fcs from mtu

2021-02-10 Thread Si-Wei Liu
When feature VIRTIO_NET_F_MTU is negotiated on mlx5_vdpa,
22 extra bytes worth of MTU length is shown in guest.
This is because the mlx5_query_port_max_mtu API returns
the "hardware" MTU value, which does not just contain the
 Ethernet payload, but includes extra lengths starting
from the Ethernet header up to the FCS altogether.

Fix the MTU so packets won't get dropped silently.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Si-Wei Liu 
Acked-by: Jason Wang 
Acked-by: Eli Cohen 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  4 
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 15 ++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 08f742f..b6cc53b 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -4,9 +4,13 @@
 #ifndef __MLX5_VDPA_H__
 #define __MLX5_VDPA_H__
 
+#include 
+#include 
 #include 
 #include 
 
+#define MLX5V_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
+
 struct mlx5_vdpa_direct_mr {
u64 start;
u64 end;
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index dc88559..b8416c4 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1907,6 +1907,19 @@ static int mlx5_get_vq_irq(struct vdpa_device *vdv, u16 
idx)
.free = mlx5_vdpa_free,
 };
 
+static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
+{
+   u16 hw_mtu;
+   int err;
+
+   err = mlx5_query_nic_vport_mtu(mdev, _mtu);
+   if (err)
+   return err;
+
+   *mtu = hw_mtu - MLX5V_ETH_HARD_MTU;
+   return 0;
+}
+
 static int alloc_resources(struct mlx5_vdpa_net *ndev)
 {
struct mlx5_vdpa_net_resources *res = >res;
@@ -1992,7 +2005,7 @@ static int mlx5v_probe(struct auxiliary_device *adev,
init_mvqs(ndev);
mutex_init(>reslock);
config = >config;
-   err = mlx5_query_nic_vport_mtu(mdev, >mtu);
+   err = query_mtu(mdev, >mtu);
if (err)
goto err_mtu;
 
-- 
1.8.3.1



[PATCH v2 0/3] mlx5_vdpa bug fixes

2021-02-10 Thread Si-Wei Liu
This set attempts to fix a few independent issues recently found
in mlx5_vdpa driver. Please find details for each in the commit
message.

Patch 1 and patch 3 are already Ack'ed by Jason Wang. Patch 2 is
reworked to move virtio feature capability query to mlx5v_probe()
as suggested by Eli.

--
v1->v2: move feature capability query to probing (Eli)

Si-Wei Liu (3):
  vdpa/mlx5: should exclude header length and fcs from mtu
  vdpa/mlx5: fix feature negotiation across device reset
  vdpa/mlx5: defer clear_virtqueues to until DRIVER_OK

 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  4 
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 42 +++---
 2 files changed, 34 insertions(+), 12 deletions(-)

-- 
1.8.3.1



[PATCH v2 3/3] vdpa/mlx5: defer clear_virtqueues to until DRIVER_OK

2021-02-10 Thread Si-Wei Liu
While virtq is stopped,  get_vq_state() is supposed to
be  called to  get  sync'ed  with  the latest internal
avail_index from device. The saved avail_index is used
to restate  the virtq  once device is started.  Commit
b35ccebe3ef7 introduced the clear_virtqueues() routine
to  reset  the saved  avail_index,  however, the index
gets cleared a bit earlier before get_vq_state() tries
to read it. This would cause consistency problems when
virtq is restarted, e.g. through a series of link down
and link up events. We  could  defer  the  clearing of
avail_index  to  until  the  device  is to be started,
i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
set_status().

Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change 
map")
Signed-off-by: Si-Wei Liu 
Acked-by: Jason Wang 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 7c1f789..ce6aae8 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1777,7 +1777,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
if (!status) {
mlx5_vdpa_info(mvdev, "performing device reset\n");
teardown_driver(ndev);
-   clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status = 0;
++mvdev->generation;
@@ -1786,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
 
if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+   clear_virtqueues(ndev);
err = setup_driver(ndev);
if (err) {
mlx5_vdpa_warn(mvdev, "failed to setup 
driver\n");
-- 
1.8.3.1



[PATCH v2 2/3] vdpa/mlx5: fix feature negotiation across device reset

2021-02-10 Thread Si-Wei Liu
The mlx_features denotes the capability for which
set of virtio features is supported by device. In
principle, this field needs not be cleared during
virtio device reset, as this capability is static
and does not change across reset.

In fact, the current code may have the assumption
that mlx_features can be reloaded from firmware
via the .get_features ops after device is reset
(via the .set_status ops), which is unfortunately
not true. The userspace VMM might save a copy
of backend capable features and won't call into
kernel again to get it on reset. This causes all
virtio features getting disabled on newly created
virtqs after device reset, while guest would hold
mismatched view of available features. For e.g.,
the guest may still assume tx checksum offload
is available after reset and feature negotiation,
causing frames with bogus (incomplete) checksum
transmitted on the wire.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Si-Wei Liu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b8416c4..7c1f789 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1486,16 +1486,8 @@ static u64 mlx_to_vritio_features(u16 dev_features)
 static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
 {
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
-   struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
-   u16 dev_features;
 
-   dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, 
device_features_bits_mask);
-   ndev->mvdev.mlx_features = mlx_to_vritio_features(dev_features);
-   if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0))
-   ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_F_VERSION_1);
-   ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_F_ACCESS_PLATFORM);
-   print_features(mvdev, ndev->mvdev.mlx_features, false);
-   return ndev->mvdev.mlx_features;
+   return mvdev->mlx_features;
 }
 
 static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
@@ -1788,7 +1780,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status = 0;
-   ndev->mvdev.mlx_features = 0;
++mvdev->generation;
return;
}
@@ -1907,6 +1898,19 @@ static int mlx5_get_vq_irq(struct vdpa_device *vdv, u16 
idx)
.free = mlx5_vdpa_free,
 };
 
+static void query_virtio_features(struct mlx5_vdpa_net *ndev)
+{
+   struct mlx5_vdpa_dev *mvdev = >mvdev;
+   u16 dev_features;
+
+   dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, 
device_features_bits_mask);
+   mvdev->mlx_features = mlx_to_vritio_features(dev_features);
+   if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0))
+   mvdev->mlx_features |= BIT_ULL(VIRTIO_F_VERSION_1);
+   mvdev->mlx_features |= BIT_ULL(VIRTIO_F_ACCESS_PLATFORM);
+   print_features(mvdev, mvdev->mlx_features, false);
+}
+
 static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
 {
u16 hw_mtu;
@@ -2005,6 +2009,7 @@ static int mlx5v_probe(struct auxiliary_device *adev,
init_mvqs(ndev);
mutex_init(>reslock);
config = >config;
+   query_virtio_features(ndev);
err = query_mtu(mdev, >mtu);
if (err)
goto err_mtu;
-- 
1.8.3.1



Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-10 Thread Si-Wei Liu




On 2/9/2021 7:53 PM, Jason Wang wrote:


On 2021/2/10 上午10:30, Si-Wei Liu wrote:



On 2/8/2021 10:37 PM, Jason Wang wrote:


On 2021/2/9 下午2:12, Eli Cohen wrote:

On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote:

On 2021/2/8 下午6:04, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote:

On 2021/2/8 下午2:37, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:

On 2021/2/6 上午7:07, Si-Wei Liu wrote:

On 2/3/2021 11:36 PM, Eli Cohen wrote:
When a change of memory map occurs, the hardware resources 
are destroyed
and then re-created again with the new memory map. In such 
case, we need
to restore the hardware available and used indices. The 
driver failed to

restore the used index which is added here.

Also, since the driver also fails to reset the available and 
used
indices upon device reset, fix this here to avoid regression 
caused by

the fact that used index may not be zero upon device reset.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for 
supported mlx5

devices")
Signed-off-by: Eli Cohen
---
v0 -> v1:
Clear indices upon device reset

     drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++
     1 file changed, 18 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..b5fe6d2ad22f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
     u64 device_addr;
     u64 driver_addr;
     u16 avail_index;
+    u16 used_index;
     bool ready;
     struct vdpa_callback cb;
     bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
     u32 virtq_id;
     struct mlx5_vdpa_net *ndev;
     u16 avail_idx;
+    u16 used_idx;
     int fw_state;
       /* keep last in the struct */
@@ -804,6 +806,7 @@ static int create_virtqueue(struct 
mlx5_vdpa_net

*ndev, struct mlx5_vdpa_virtque
       obj_context = 
MLX5_ADDR_OF(create_virtio_net_q_in, in,

obj_context);
     MLX5_SET(virtio_net_q_object, obj_context, 
hw_available_index,

mvq->avail_idx);
+    MLX5_SET(virtio_net_q_object, obj_context, hw_used_index,
mvq->used_idx);
     MLX5_SET(virtio_net_q_object, obj_context,
queue_feature_bit_mask_12_3,
get_features_12_3(ndev->mvdev.actual_features));
     vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, 
obj_context,

virtio_q_context);
@@ -1022,6 +1025,7 @@ static int connect_qps(struct 
mlx5_vdpa_net

*ndev, struct mlx5_vdpa_virtqueue *m
     struct mlx5_virtq_attr {
     u8 state;
     u16 available_index;
+    u16 used_index;
     };
       static int query_virtqueue(struct mlx5_vdpa_net 
*ndev, struct

mlx5_vdpa_virtqueue *mvq,
@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
     memset(attr, 0, sizeof(*attr));
     attr->state = MLX5_GET(virtio_net_q_object, 
obj_context, state);

     attr->available_index = MLX5_GET(virtio_net_q_object,
obj_context, hw_available_index);
+    attr->used_index = MLX5_GET(virtio_net_q_object, 
obj_context,

hw_used_index);
     kfree(out);
     return 0;
     @@ -1535,6 +1540,16 @@ static void 
teardown_virtqueues(struct

mlx5_vdpa_net *ndev)
     }
     }
     +static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+    int i;
+
+    for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+    ndev->vqs[i].avail_idx = 0;
+    ndev->vqs[i].used_idx = 0;
+    }
+}
+
     /* TODO: cross-endian support */
     static inline bool mlx5_vdpa_is_little_endian(struct 
mlx5_vdpa_dev

*mvdev)
     {
@@ -1610,6 +1625,7 @@ static int save_channel_info(struct
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
     return err;
       ri->avail_index = attr.available_index;
+    ri->used_index = attr.used_index;
     ri->ready = mvq->ready;
     ri->num_ent = mvq->num_ent;
     ri->desc_addr = mvq->desc_addr;
@@ -1654,6 +1670,7 @@ static void restore_channels_info(struct
mlx5_vdpa_net *ndev)
     continue;
       mvq->avail_idx = ri->avail_index;
+    mvq->used_idx = ri->used_index;
     mvq->ready = ri->ready;
     mvq->num_ent = ri->num_ent;
     mvq->desc_addr = ri->desc_addr;
@@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct
vdpa_device *vdev, u8 status)
     if (!status) {
     mlx5_vdpa_info(mvdev, "performing device 
reset\n");

     teardown_driver(ndev);
+    clear_virtqueues(ndev);
The clearing looks fine at the first glance, as it aligns 
with the other
state cleanups floating around at the same place. However, 
the thing is
get_vq_state() is supposed to be called right after to get 
sync'ed with
the latest internal avail_index from device while vq is 
stopped. The
index was saved in the driver software at vq 

Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-09 Thread Si-Wei Liu




On 2/8/2021 10:37 PM, Jason Wang wrote:


On 2021/2/9 下午2:12, Eli Cohen wrote:

On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote:

On 2021/2/8 下午6:04, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote:

On 2021/2/8 下午2:37, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:

On 2021/2/6 上午7:07, Si-Wei Liu wrote:

On 2/3/2021 11:36 PM, Eli Cohen wrote:
When a change of memory map occurs, the hardware resources are 
destroyed
and then re-created again with the new memory map. In such 
case, we need
to restore the hardware available and used indices. The driver 
failed to

restore the used index which is added here.

Also, since the driver also fails to reset the available and used
indices upon device reset, fix this here to avoid regression 
caused by

the fact that used index may not be zero upon device reset.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported 
mlx5

devices")
Signed-off-by: Eli Cohen
---
v0 -> v1:
Clear indices upon device reset

     drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++
     1 file changed, 18 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..b5fe6d2ad22f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
     u64 device_addr;
     u64 driver_addr;
     u16 avail_index;
+    u16 used_index;
     bool ready;
     struct vdpa_callback cb;
     bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
     u32 virtq_id;
     struct mlx5_vdpa_net *ndev;
     u16 avail_idx;
+    u16 used_idx;
     int fw_state;
       /* keep last in the struct */
@@ -804,6 +806,7 @@ static int create_virtqueue(struct 
mlx5_vdpa_net

*ndev, struct mlx5_vdpa_virtque
       obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in,
obj_context);
     MLX5_SET(virtio_net_q_object, obj_context, 
hw_available_index,

mvq->avail_idx);
+    MLX5_SET(virtio_net_q_object, obj_context, hw_used_index,
mvq->used_idx);
     MLX5_SET(virtio_net_q_object, obj_context,
queue_feature_bit_mask_12_3,
get_features_12_3(ndev->mvdev.actual_features));
     vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context,
virtio_q_context);
@@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net
*ndev, struct mlx5_vdpa_virtqueue *m
     struct mlx5_virtq_attr {
     u8 state;
     u16 available_index;
+    u16 used_index;
     };
       static int query_virtqueue(struct mlx5_vdpa_net *ndev, 
struct

mlx5_vdpa_virtqueue *mvq,
@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
     memset(attr, 0, sizeof(*attr));
     attr->state = MLX5_GET(virtio_net_q_object, 
obj_context, state);

     attr->available_index = MLX5_GET(virtio_net_q_object,
obj_context, hw_available_index);
+    attr->used_index = MLX5_GET(virtio_net_q_object, 
obj_context,

hw_used_index);
     kfree(out);
     return 0;
     @@ -1535,6 +1540,16 @@ static void 
teardown_virtqueues(struct

mlx5_vdpa_net *ndev)
     }
     }
     +static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+    int i;
+
+    for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+    ndev->vqs[i].avail_idx = 0;
+    ndev->vqs[i].used_idx = 0;
+    }
+}
+
     /* TODO: cross-endian support */
     static inline bool mlx5_vdpa_is_little_endian(struct 
mlx5_vdpa_dev

*mvdev)
     {
@@ -1610,6 +1625,7 @@ static int save_channel_info(struct
mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
     return err;
       ri->avail_index = attr.available_index;
+    ri->used_index = attr.used_index;
     ri->ready = mvq->ready;
     ri->num_ent = mvq->num_ent;
     ri->desc_addr = mvq->desc_addr;
@@ -1654,6 +1670,7 @@ static void restore_channels_info(struct
mlx5_vdpa_net *ndev)
     continue;
       mvq->avail_idx = ri->avail_index;
+    mvq->used_idx = ri->used_index;
     mvq->ready = ri->ready;
     mvq->num_ent = ri->num_ent;
     mvq->desc_addr = ri->desc_addr;
@@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct
vdpa_device *vdev, u8 status)
     if (!status) {
     mlx5_vdpa_info(mvdev, "performing device reset\n");
     teardown_driver(ndev);
+    clear_virtqueues(ndev);
The clearing looks fine at the first glance, as it aligns with 
the other
state cleanups floating around at the same place. However, the 
thing is
get_vq_state() is supposed to be called right after to get 
sync'ed with
the latest internal avail_index from device while vq is 
stopped. The
index was saved in the driver software at vq suspension, but 
before the
virtq object is destroyed. We shouldn't clear the avail_index 
t

Re: [PATCH 3/3] mlx5_vdpa: defer clear_virtqueues to until DRIVER_OK

2021-02-09 Thread Si-Wei Liu




On 2/8/2021 7:37 PM, Jason Wang wrote:


On 2021/2/6 下午8:29, Si-Wei Liu wrote:

While virtq is stopped,  get_vq_state() is supposed to
be  called to  get  sync'ed  with  the latest internal
avail_index from device. The saved avail_index is used
to restate  the virtq  once device is started.  Commit
b35ccebe3ef7 introduced the clear_virtqueues() routine
to  reset  the saved  avail_index,  however, the index
gets cleared a bit earlier before get_vq_state() tries
to read it. This would cause consistency problems when
virtq is restarted, e.g. through a series of link down
and link up events. We  could  defer  the  clearing of
avail_index  to  until  the  device  is to be started,
i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
set_status().

Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index 
after change map")

Signed-off-by: Si-Wei Liu 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c

index aa6f8cd..444ab58 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct 
vdpa_device *vdev, u8 status)

  if (!status) {
  mlx5_vdpa_info(mvdev, "performing device reset\n");
  teardown_driver(ndev);
-    clear_virtqueues(ndev);
  mlx5_vdpa_destroy_mr(>mvdev);
  ndev->mvdev.status = 0;
  ++mvdev->generation;
@@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct 
vdpa_device *vdev, u8 status)

    if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
  if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+    clear_virtqueues(ndev);



Rethink about this. As mentioned in another thread, this in fact 
breaks set_vq_state().  (See vhost_virtqueue_start() -> 
vhost_vdpa_set_vring_base() in qemu codes).

I assume that the clearing for vhost-vdpa would be done via (qemu code),

vhost_dev_start()->vhost_vdpa_dev_start()->vhost_vdpa_call(status | 
VIRTIO_CONFIG_S_DRIVER_OK)


which is _after_ vhost_virtqueue_start() gets called to restore the 
avail_idx to h/w in vhost_dev_start(). What am I missing here?


-Siwei




The issue is that the avail idx is forgot, we need keep it.

Thanks



  err = setup_driver(ndev);
  if (err) {
  mlx5_vdpa_warn(mvdev, "failed to setup driver\n");






Re: [PATCH 4/7] xen/events: link interdomain events to associated xenbus device

2021-02-09 Thread Wei Liu
On Sat, Feb 06, 2021 at 11:49:29AM +0100, Juergen Gross wrote:
> In order to support the possibility of per-device event channel
> settings (e.g. lateeoi spurious event thresholds) add a xenbus device
> pointer to struct irq_info() and modify the related event channel
> binding interfaces to take the pointer to the xenbus device as a
> parameter instead of the domain id of the other side.
> 
> While at it remove the stale prototype of bind_evtchn_to_irq_lateeoi().
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Wei Liu 


Re: [PATCH 3/3] mlx5_vdpa: defer clear_virtqueues to until DRIVER_OK

2021-02-08 Thread Si-Wei Liu




On 2/7/2021 9:48 PM, Eli Cohen wrote:

On Sat, Feb 06, 2021 at 04:29:24AM -0800, Si-Wei Liu wrote:

While virtq is stopped,  get_vq_state() is supposed to
be  called to  get  sync'ed  with  the latest internal
avail_index from device. The saved avail_index is used
to restate  the virtq  once device is started.  Commit
b35ccebe3ef7 introduced the clear_virtqueues() routine
to  reset  the saved  avail_index,  however, the index
gets cleared a bit earlier before get_vq_state() tries
to read it. This would cause consistency problems when
virtq is restarted, e.g. through a series of link down
and link up events. We  could  defer  the  clearing of
avail_index  to  until  the  device  is to be started,
i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
set_status().


Not sure I understand the scenario. You are talking about reset of the
device followed by up/down events on the interface. How can you trigger
this?
Currently it's not possible to trigger link up/down events with upstream 
QEMU due lack of config/control interrupt implementation. And live 
migration could be another scenario that cannot be satisfied because of 
inconsistent queue state. They share the same root of cause as captured 
here.


-Siwei




Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change 
map")
Signed-off-by: Si-Wei Liu 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index aa6f8cd..444ab58 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
if (!status) {
mlx5_vdpa_info(mvdev, "performing device reset\n");
teardown_driver(ndev);
-   clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status = 0;
++mvdev->generation;
@@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
  
  	if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {

if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+   clear_virtqueues(ndev);
err = setup_driver(ndev);
if (err) {
mlx5_vdpa_warn(mvdev, "failed to setup 
driver\n");
--
1.8.3.1





Re: [PATCH 2/3] mlx5_vdpa: fix feature negotiation across device reset

2021-02-08 Thread Si-Wei Liu




On 2/7/2021 9:35 PM, Eli Cohen wrote:

On Sat, Feb 06, 2021 at 04:29:23AM -0800, Si-Wei Liu wrote:

The mlx_features denotes the capability for which
set of virtio features is supported by device. In
principle, this field needs not be cleared during
virtio device reset, as this capability is static
and does not change across reset.

In fact, the current code may have the assumption
that mlx_features can be reloaded from firmware
via the .get_features ops after device is reset
(via the .set_status ops), which is unfortunately
not true. The userspace VMM might save a copy
of backend capable features and won't call into
kernel again to get it on reset. This causes all
virtio features getting disabled on newly created
virtqs after device reset, while guest would hold
mismatched view of available features. For e.g.,
the guest may still assume tx checksum offload
is available after reset and feature negotiation,
causing frames with bogus (incomplete) checksum
transmitted on the wire.

Signed-off-by: Si-Wei Liu 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b8416c4..aa6f8cd 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1788,7 +1788,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status = 0;
-   ndev->mvdev.mlx_features = 0;
++mvdev->generation;
return;
}

Since we assume that device capabilities don't change, I think I would
get the features through a call done in mlx5v_probe after the netdev
object is created and change mlx5_vdpa_get_features() to just return
ndev->mvdev.mlx_features.
Yep, it makes sense. Will post a revised patch. If vdpa tool allows 
reconfiguration post probing, the code has to be reconciled then.




Did you actually see this issue in action? If you did, can you share
with us how you trigerred this?
Issue is indeed seen in action. The mismatched tx-checksum offload as 
described in the commit message was one of such examples. You would need 
a guest reboot though (triggering device reset via the .set_status ops 
and zero'ed mlx_features was loaded to deduce new actual_features for 
creating the h/w virtq object) for repro.


-Siwei



--
1.8.3.1





[PATCH 1/3] mlx5_vdpa: should exclude header length and fcs from mtu

2021-02-06 Thread Si-Wei Liu
When feature VIRTIO_NET_F_MTU is negotiated on mlx5_vdpa,
22 extra bytes worth of MTU length is shown in guest.
This is because the mlx5_query_port_max_mtu API returns
the "hardware" MTU value, which does not just contain the
Ethernet payload, but includes extra lengths starting
from the Ethernet header up to the FCS altogether.

Fix the MTU so packets won't get dropped silently.

Signed-off-by: Si-Wei Liu 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  4 
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 15 ++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 08f742f..b6cc53b 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -4,9 +4,13 @@
 #ifndef __MLX5_VDPA_H__
 #define __MLX5_VDPA_H__
 
+#include 
+#include 
 #include 
 #include 
 
+#define MLX5V_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
+
 struct mlx5_vdpa_direct_mr {
u64 start;
u64 end;
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index dc88559..b8416c4 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1907,6 +1907,19 @@ static int mlx5_get_vq_irq(struct vdpa_device *vdv, u16 
idx)
.free = mlx5_vdpa_free,
 };
 
+static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
+{
+   u16 hw_mtu;
+   int err;
+
+   err = mlx5_query_nic_vport_mtu(mdev, _mtu);
+   if (err)
+   return err;
+
+   *mtu = hw_mtu - MLX5V_ETH_HARD_MTU;
+   return 0;
+}
+
 static int alloc_resources(struct mlx5_vdpa_net *ndev)
 {
struct mlx5_vdpa_net_resources *res = >res;
@@ -1992,7 +2005,7 @@ static int mlx5v_probe(struct auxiliary_device *adev,
init_mvqs(ndev);
mutex_init(>reslock);
config = >config;
-   err = mlx5_query_nic_vport_mtu(mdev, >mtu);
+   err = query_mtu(mdev, >mtu);
if (err)
goto err_mtu;
 
-- 
1.8.3.1



[PATCH 3/3] mlx5_vdpa: defer clear_virtqueues to until DRIVER_OK

2021-02-06 Thread Si-Wei Liu
While virtq is stopped,  get_vq_state() is supposed to
be  called to  get  sync'ed  with  the latest internal
avail_index from device. The saved avail_index is used
to restate  the virtq  once device is started.  Commit
b35ccebe3ef7 introduced the clear_virtqueues() routine
to  reset  the saved  avail_index,  however, the index
gets cleared a bit earlier before get_vq_state() tries
to read it. This would cause consistency problems when
virtq is restarted, e.g. through a series of link down
and link up events. We  could  defer  the  clearing of
avail_index  to  until  the  device  is to be started,
i.e. until  VIRTIO_CONFIG_S_DRIVER_OK  is set again in
set_status().

Fixes: b35ccebe3ef7 ("vdpa/mlx5: Restore the hardware used index after change 
map")
Signed-off-by: Si-Wei Liu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index aa6f8cd..444ab58 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1785,7 +1785,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
if (!status) {
mlx5_vdpa_info(mvdev, "performing device reset\n");
teardown_driver(ndev);
-   clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status = 0;
++mvdev->generation;
@@ -1794,6 +1793,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
 
if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+   clear_virtqueues(ndev);
err = setup_driver(ndev);
if (err) {
mlx5_vdpa_warn(mvdev, "failed to setup 
driver\n");
-- 
1.8.3.1



[PATCH 2/3] mlx5_vdpa: fix feature negotiation across device reset

2021-02-06 Thread Si-Wei Liu
The mlx_features denotes the capability for which
set of virtio features is supported by device. In
principle, this field needs not be cleared during
virtio device reset, as this capability is static
and does not change across reset.

In fact, the current code may have the assumption
that mlx_features can be reloaded from firmware
via the .get_features ops after device is reset
(via the .set_status ops), which is unfortunately
not true. The userspace VMM might save a copy
of backend capable features and won't call into
kernel again to get it on reset. This causes all
virtio features getting disabled on newly created
virtqs after device reset, while guest would hold
mismatched view of available features. For e.g.,
the guest may still assume tx checksum offload
is available after reset and feature negotiation,
causing frames with bogus (incomplete) checksum
transmitted on the wire.

Signed-off-by: Si-Wei Liu 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b8416c4..aa6f8cd 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1788,7 +1788,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
clear_virtqueues(ndev);
mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status = 0;
-   ndev->mvdev.mlx_features = 0;
++mvdev->generation;
return;
}
-- 
1.8.3.1



Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-05 Thread Si-Wei Liu




On 2/3/2021 11:36 PM, Eli Cohen wrote:

When a change of memory map occurs, the hardware resources are destroyed
and then re-created again with the new memory map. In such case, we need
to restore the hardware available and used indices. The driver failed to
restore the used index which is added here.

Also, since the driver also fails to reset the available and used
indices upon device reset, fix this here to avoid regression caused by
the fact that used index may not be zero upon device reset.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen 
---
v0 -> v1:
Clear indices upon device reset

  drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..b5fe6d2ad22f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
u64 device_addr;
u64 driver_addr;
u16 avail_index;
+   u16 used_index;
bool ready;
struct vdpa_callback cb;
bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
u32 virtq_id;
struct mlx5_vdpa_net *ndev;
u16 avail_idx;
+   u16 used_idx;
int fw_state;
  
  	/* keep last in the struct */

@@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
  
  	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);

MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
mvq->avail_idx);
+   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
mvq->used_idx);
MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
 get_features_12_3(ndev->mvdev.actual_features));
vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
virtio_q_context);
@@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *m
  struct mlx5_virtq_attr {
u8 state;
u16 available_index;
+   u16 used_index;
  };
  
  static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,

@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqueu
memset(attr, 0, sizeof(*attr));
attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, 
hw_available_index);
+   attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, 
hw_used_index);
kfree(out);
return 0;
  
@@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)

}
  }
  
+static void clear_virtqueues(struct mlx5_vdpa_net *ndev)

+{
+   int i;
+
+   for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+   ndev->vqs[i].avail_idx = 0;
+   ndev->vqs[i].used_idx = 0;
+   }
+}
+
  /* TODO: cross-endian support */
  static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
  {
@@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtqu
return err;
  
  	ri->avail_index = attr.available_index;

+   ri->used_index = attr.used_index;
ri->ready = mvq->ready;
ri->num_ent = mvq->num_ent;
ri->desc_addr = mvq->desc_addr;
@@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net 
*ndev)
continue;
  
  		mvq->avail_idx = ri->avail_index;

+   mvq->used_idx = ri->used_index;
mvq->ready = ri->ready;
mvq->num_ent = ri->num_ent;
mvq->desc_addr = ri->desc_addr;
@@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
if (!status) {
mlx5_vdpa_info(mvdev, "performing device reset\n");
teardown_driver(ndev);
+   clear_virtqueues(ndev);
The clearing looks fine at the first glance, as it aligns with the other 
state cleanups floating around at the same place. However, the thing is 
get_vq_state() is supposed to be called right after to get sync'ed with 
the latest internal avail_index from device while vq is stopped. The 
index was saved in the driver software at vq suspension, but before the 
virtq object is destroyed. We shouldn't clear the avail_index too early.


Possibly it can be postponed to where VIRTIO_CONFIG_S_DRIVER_OK gets set 
again, i.e. right before the setup_driver() in mlx5_vdpa_set_status()?


-Siwei


mlx5_vdpa_destroy_mr(>mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.mlx_features = 0;




Re: linux-next: manual merge of the hyperv tree with Linus' tree

2021-02-05 Thread Wei Liu
On Fri, Feb 05, 2021 at 07:02:02PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the hyperv tree got a conflict in:
> 
>   arch/x86/hyperv/hv_init.c
> 
> between commit:
> 
>   fff7b5e6ee63 ("x86/hyperv: Initialize clockevents after LAPIC is 
> initialized")
> 
> from Linus' tree and commits:
> 
>   a06c2e7df586 ("x86/hyperv: extract partition ID from Microsoft Hypervisor 
> if necessary")
>   fa2c411b58fe ("x86/hyperv: implement an MSI domain for root partition")
> 
> from the hyperv tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 

Thanks Stephen.

I've fixed up the conflicts locally in hyperv-next.

Wei.


Re: [PATCH v6 00/16] Introducing Linux root partition support for Microsoft Hypervisor

2021-02-04 Thread Wei Liu
On Wed, Feb 03, 2021 at 03:04:19PM +, Wei Liu wrote:
> Wei Liu (16):
>   asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to
> HV_CPU_MANAGEMENT
>   x86/hyperv: detect if Linux is the root partition
>   Drivers: hv: vmbus: skip VMBus initialization if Linux is root
>   clocksource/hyperv: use MSR-based access if running as root
>   x86/hyperv: allocate output arg pages if required
>   x86/hyperv: extract partition ID from Microsoft Hypervisor if
> necessary
>   x86/hyperv: handling hypercall page setup for root
>   ACPI / NUMA: add a stub function for node_to_pxm()
>   x86/hyperv: provide a bunch of helper functions
>   x86/hyperv: implement and use hv_smp_prepare_cpus
>   asm-generic/hyperv: update hv_msi_entry
>   asm-generic/hyperv: update hv_interrupt_entry
>   asm-generic/hyperv: introduce hv_device_id and auxiliary structures
>   asm-generic/hyperv: import data structures for mapping device
> interrupts
>   x86/hyperv: implement an MSI domain for root partition
>   iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

This series is now rebased and pushed to hyperv-next.

Many thanks to all the people that provided reviews and comments.

Wei.


Re: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()

2021-02-04 Thread Wei Liu
On Thu, Feb 04, 2021 at 07:45:25PM +0100, Rafael J. Wysocki wrote:
> On Thu, Feb 4, 2021 at 7:41 PM Wei Liu  wrote:
> >
> > On Wed, Feb 03, 2021 at 03:04:27PM +, Wei Liu wrote:
> > > There is already a stub function for pxm_to_node but conversion to the
> > > other direction is missing.
> > >
> > > It will be used by Microsoft Hypervisor code later.
> > >
> > > Signed-off-by: Wei Liu 
> >
> > Hi ACPI maintainers, if you're happy with this patch I can take it via
> > the hyperv-next tree, given the issue is discovered when pxm_to_node is
> > called in our code.
> 
> Yes, you can.

Thanks Rafael. I will add your ack to the patch as well.

Wei.


Re: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition

2021-02-04 Thread Wei Liu
On Thu, Feb 04, 2021 at 06:40:55PM +, Michael Kelley wrote:
> From: Wei Liu  Sent: Thursday, February 4, 2021 9:57 AM
[...]
> > I've got the following diff to fix both issues. If you're happy with the
> > changes, can you give your Reviewed-by? That saves a round of posting.
> > 
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > index 0cabc9aece38..fa71db798465 100644
> > --- a/arch/x86/hyperv/irqdomain.c
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -1,7 +1,7 @@
> >  // SPDX-License-Identifier: GPL-2.0
> > 
> >  /*
> > - * for Linux to run as the root partition on Microsoft Hypervisor.
> > + * Irqdomain for Linux to run as the root partition on Microsoft 
> > Hypervisor.
> >   *
> >   * Authors:
> >   *  Sunil Muthuswamy 
> > @@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, 
> > bool level,
> > struct hv_device_interrupt_descriptor *intr_desc;
> > unsigned long flags;
> > u64 status;
> > -   cpumask_t mask = CPU_MASK_NONE;
> > +   const cpumask_t *mask;
> > int nr_bank, var_size;
> > 
> > local_irq_save(flags);
> > @@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id 
> > device_id, bool
> > level,
> > else
> > intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > 
> > -   cpumask_set_cpu(cpu, );
> > +   mask = cpumask_of(cpu);
> > intr_desc->target.vp_set.valid_bank_mask = 0;
> > intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > -   nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), );
> > +   nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask);
> 
> Can you just do the following and get rid of the 'mask' local entirely?
> 
> nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));

Sure. That can be done.

> 
> Either way,
> 
> Reviewed-by: Michael Kelley 

Thank you.

Wei.


Re: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()

2021-02-04 Thread Wei Liu
On Wed, Feb 03, 2021 at 03:04:27PM +, Wei Liu wrote:
> There is already a stub function for pxm_to_node but conversion to the
> other direction is missing.
> 
> It will be used by Microsoft Hypervisor code later.
> 
> Signed-off-by: Wei Liu 

Hi ACPI maintainers, if you're happy with this patch I can take it via
the hyperv-next tree, given the issue is discovered when pxm_to_node is
called in our code.

> ---
> v6: new
> ---
>  include/acpi/acpi_numa.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> index a4c6ef809e27..40a91ce87e04 100644
> --- a/include/acpi/acpi_numa.h
> +++ b/include/acpi/acpi_numa.h
> @@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm)
>  {
>   return 0;
>  }
> +static inline int node_to_pxm(int node)
> +{
> + return 0;
> +}
>  #endif   /* CONFIG_ACPI_NUMA */
>  
>  #ifdef CONFIG_ACPI_HMAT
> -- 
> 2.20.1
> 


Re: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition

2021-02-04 Thread Wei Liu
On Thu, Feb 04, 2021 at 05:43:16PM +, Michael Kelley wrote:
[...]
> >  remove_cpuhp_state:
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > new file mode 100644
> > index ..117f17e8c88a
> > --- /dev/null
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -0,0 +1,362 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * for Linux to run as the root partition on Microsoft Hypervisor.
> 
> Nit:  Looks like the initial word "Irqdomain" got dropped from the above
> comment line.  But don't respin just for this.
> 

I've added it back. Thanks.

> > +static int hv_map_interrupt(union hv_device_id device_id, bool level,
> > +   int cpu, int vector, struct hv_interrupt_entry *entry)
> > +{
> > +   struct hv_input_map_device_interrupt *input;
> > +   struct hv_output_map_device_interrupt *output;
> > +   struct hv_device_interrupt_descriptor *intr_desc;
> > +   unsigned long flags;
> > +   u64 status;
> > +   cpumask_t mask = CPU_MASK_NONE;
> > +   int nr_bank, var_size;
> > +
> > +   local_irq_save(flags);
> > +
> > +   input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +   output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > +
> > +   intr_desc = >interrupt_descriptor;
> > +   memset(input, 0, sizeof(*input));
> > +   input->partition_id = hv_current_partition_id;
> > +   input->device_id = device_id.as_uint64;
> > +   intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > +   intr_desc->vector_count = 1;
> > +   intr_desc->target.vector = vector;
> > +
> > +   if (level)
> > +   intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > +   else
> > +   intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > +
> > +   cpumask_set_cpu(cpu, );
> > +   intr_desc->target.vp_set.valid_bank_mask = 0;
> > +   intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > +   nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), );
> 
> There's a function get_cpu_mask() that returns a pointer to a cpumask with 
> only
> the specified cpu set in the mask.  It returns a const pointer to the correct 
> entry
> in a pre-allocated array of all such cpumasks, so it's a lot more efficient 
> than
> allocating and initializing a local cpumask instance on the stack.
> 

That's nice.

I've got the following diff to fix both issues. If you're happy with the
changes, can you give your Reviewed-by? That saves a round of posting.

diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 0cabc9aece38..fa71db798465 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0

 /*
- * for Linux to run as the root partition on Microsoft Hypervisor.
+ * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
  *
  * Authors:
  *  Sunil Muthuswamy 
@@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, 
bool level,
struct hv_device_interrupt_descriptor *intr_desc;
unsigned long flags;
u64 status;
-   cpumask_t mask = CPU_MASK_NONE;
+   const cpumask_t *mask;
int nr_bank, var_size;

local_irq_save(flags);
@@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id device_id, 
bool level,
else
intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;

-   cpumask_set_cpu(cpu, );
+   mask = cpumask_of(cpu);
intr_desc->target.vp_set.valid_bank_mask = 0;
intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
-   nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), );
+   nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask);
if (nr_bank < 0) {
local_irq_restore(flags);
pr_err("%s: unable to generate VP set\n", __func__);


Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-04 Thread Wei Liu
On Thu, Feb 04, 2021 at 04:41:47PM +, Michael Kelley wrote:
> From: Wei Liu  Sent: Wednesday, February 3, 2021 4:47 AM
[...]
> > > > +
> > > > +   if (level)
> > > > +   intr_desc->trigger_mode = 
> > > > HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > > +   else
> > > > +   intr_desc->trigger_mode = 
> > > > HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > > +
> > > > +   __set_bit(vcpu, (unsigned long *)_desc->target.vp_mask);
> > > > +
> > > > +   status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, 
> > > > input,
> > output) &
> > > > +HV_HYPERCALL_RESULT_MASK;
> > > > +   local_irq_restore(flags);
> > > > +
> > > > +   *entry = output->interrupt_entry;
> > > > +
> > > > +   return status;
> > >
> > > As a cross-check, I was comparing this code against 
> > > hv_map_msi_interrupt().  They are
> > > mostly parallel, though some of the assignments are done in a different 
> > > order.  It's a nit,
> > > but making them as parallel as possible would be nice. :-)
> > >
> > 
> > Indeed. I will see about factoring out a function.
> 
> If factoring out a separate helper function is clumsy, just having the 
> parallel code
> in the two functions be as similar as possible makes it easier to see what's 
> the
> same and what's different.
> 

No. It is not clumsy at all. I've done it in the newly posted v6.

I was baffled why I wrote hv_unmap_interrupt helper to be used by both
hv_unmap_ioapic_interrupt and hv_unmap_msi_interrupt in the previous
patch, but didn't write a hv_map_interrupt. Maybe I didn't have enough
coffee that day. :-/

Thanks for pointing out that issue. It definitely helped improve the
quality of this series.

Wei.


Re: [PATCH 08/10] clocksource/drivers/hyper-v: Handle sched_clock differences inline

2021-02-04 Thread Wei Liu
On Thu, Feb 04, 2021 at 04:28:38PM +, Michael Kelley wrote:
> From: Wei Liu  Sent: Monday, February 1, 2021 10:55 AM
> > 
> > On Wed, Jan 27, 2021 at 12:23:43PM -0800, Michael Kelley wrote:
> > [...]
> > > +/*
> > > + * Reference to pv_ops must be inline so objtool
> > > + * detection of noinstr violations can work correctly.
> > > + */
> > > +static __always_inline void hv_setup_sched_clock(void *sched_clock)
> > 
> > sched_clock_register is not trivial. Having __always_inline here is
> > going to make the compiled object bloated.
> > 
> > Given this is a static function, I don't think we need to specify any
> > inline keyword. The compiler should be able to determine whether this
> > function should be inlined all by itself.
> > 
> > Wei.
> 
> There was an explicit request from Peter Zijlstra and Thomas Gleixner
> to force it inline.  See https://lore.kernel.org/patchwork/patch/1283635/ and
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/include/asm/mshyperv.h?id=b9d8cf2eb3ceecdee3434b87763492aee9e28845

Oh. noinstr. I failed to notice the comment. Sorry for the noise.

Wei.


Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue

2021-02-03 Thread Si-Wei Liu
On Tue, Feb 2, 2021 at 9:16 PM Jason Wang  wrote:
>
>
> On 2021/2/3 上午1:54, Si-Wei Liu wrote:
> > On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen  wrote:
> >> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
> >>> Thanks Eli and Jason for clarifications. See inline.
> >>>
> >>> On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen  wrote:
> >>>> On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> >>>>> On 2021/2/2 下午12:15, Si-Wei Liu wrote:
> >>>>>> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang  wrote:
> >>>>>>> On 2021/2/2 上午3:17, Si-Wei Liu wrote:
> >>>>>>>> On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu  
> >>>>>>>> wrote:
> >>>>>>>>> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen  wrote:
> >>>>>>>>>> suspend_vq should only suspend the VQ on not save the current 
> >>>>>>>>>> available
> >>>>>>>>>> index. This is done when a change of map occurs when the driver 
> >>>>>>>>>> calls
> >>>>>>>>>> save_channel_info().
> >>>>>>>>> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> >>>>>>>>> which doesn't save the available index as save_channel_info() 
> >>>>>>>>> doesn't
> >>>>>>>>> get called in that path at all. How does it handle the case that
> >>>>>>>>> aget_vq_state() is called from userspace (e.g. QEMU) while the
> >>>>>>>>> hardware VQ object was torn down, but userspace still wants to 
> >>>>>>>>> access
> >>>>>>>>> the queue index?
> >>>>>>>>>
> >>>>>>>>> Refer to 
> >>>>>>>>> https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei@oracle.com/
> >>>>>>>>>
> >>>>>>>>> vhost VQ 0 ring restore failed: -1: Resource temporarily 
> >>>>>>>>> unavailable (11)
> >>>>>>>>> vhost VQ 1 ring restore failed: -1: Resource temporarily 
> >>>>>>>>> unavailable (11)
> >>>>>>>>>
> >>>>>>>>> QEMU will complain with the above warning while VM is being rebooted
> >>>>>>>>> or shut down.
> >>>>>>>>>
> >>>>>>>>> Looks to me either the kernel driver should cover this requirement, 
> >>>>>>>>> or
> >>>>>>>>> the userspace has to bear the burden in saving the index and not 
> >>>>>>>>> call
> >>>>>>>>> into kernel if VQ is destroyed.
> >>>>>>>> Actually, the userspace doesn't have the insights whether virt queue
> >>>>>>>> will be destroyed if just changing the device status via 
> >>>>>>>> set_status().
> >>>>>>>> Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave 
> >>>>>>>> like
> >>>>>>>> so. Hence this still looks to me to be Mellanox specifics and
> >>>>>>>> mlx5_vdpa implementation detail that shouldn't expose to userspace.
> >>>>>>> So I think we can simply drop this patch?
> >>>>>> Yep, I think so. To be honest I don't know why it has anything to do
> >>>>>> with the memory hotplug issue.
> >>>>>
> >>>>> Eli may know more, my understanding is that, during memory hotplut, qemu
> >>>>> need to propagate new memory mappings via set_map(). For mellanox, it 
> >>>>> means
> >>>>> it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> >>>>>
> >>>> I think Siwei was asking why the first patch was related to the hotplug
> >>>> issue.
> >>> I was thinking how consistency is assured when saving/restoring this
> >>> h/w avail_index against the one in the virtq memory, particularly in
> >>> the region_add/.region_del() context (e.g. the hotplug case). Problem
> >>> is I don't see explicit memory barrier when guest thread updates the
> >>> avail_index, how does the device make sure the h/w 

Re: [PATCH] vdpa/mlx5: Restore the hardware used index after change map

2021-02-03 Thread Si-Wei Liu
On Tue, Feb 2, 2021 at 10:48 PM Eli Cohen  wrote:
>
> On Tue, Feb 02, 2021 at 09:14:02AM -0800, Si-Wei Liu wrote:
> > On Tue, Feb 2, 2021 at 6:34 AM Eli Cohen  wrote:
> > >
> > > When a change of memory map occurs, the hardware resources are destroyed
> > > and then re-created again with the new memory map. In such case, we need
> > > to restore the hardware available and used indices. The driver failed to
> > > restore the used index which is added here.
> > >
> > > Fixes 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > devices")
> > > Signed-off-by: Eli Cohen 
> > > ---
> > > This patch is being sent again a single patch the fixes hot memory
> > > addtion to a qemy process.
> > >
> > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 88dde3455bfd..839f57c64a6f 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > > u64 device_addr;
> > > u64 driver_addr;
> > > u16 avail_index;
> > > +   u16 used_index;
> > > bool ready;
> > > struct vdpa_callback cb;
> > > bool restore;
> > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > > u32 virtq_id;
> > > struct mlx5_vdpa_net *ndev;
> > > u16 avail_idx;
> > > +   u16 used_idx;
> > > int fw_state;
> > >
> > > /* keep last in the struct */
> > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net 
> > > *ndev, struct mlx5_vdpa_virtque
> > >
> > > obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, 
> > > obj_context);
> > > MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, 
> > > mvq->avail_idx);
> > > +   MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, 
> > > mvq->used_idx);
> >
> > The saved indexes will apply to the new virtqueue object whenever it
> > is created. In virtio spec, these indexes will reset back to zero when
> > the virtio device is reset. But I don't see how it's done today. IOW,
> > I don't see where avail_idx and used_idx get cleared from the mvq for
> > device reset via set_status().
> >
>
> Right, but this is not strictly related to this patch. I will post
> another patch to fix this.

Better to post these two patches in a series.Or else it may cause VM
reboot problem as that is where the device gets reset. The avail_index
did not as the correct value will be written to by driver right after,
but used_idx introduced by this patch is supplied by device hence this
patch alone would introduce regression.

>
> BTW, can you describe a secnario that would cause a reset (through
> calling set_status()) that happens after the VQ has been used?

You can try reboot the guest, that'll be the easy way to test.

-Siwei

>
> > -Siwei
> >
> >
> > > MLX5_SET(virtio_net_q_object, obj_context, 
> > > queue_feature_bit_mask_12_3,
> > >  get_features_12_3(ndev->mvdev.actual_features));
> > > vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, 
> > > virtio_q_context);
> > > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, 
> > > struct mlx5_vdpa_virtqueue *m
> > >  struct mlx5_virtq_attr {
> > > u8 state;
> > > u16 available_index;
> > > +   u16 used_index;
> > >  };
> > >
> > >  static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct 
> > > mlx5_vdpa_virtqueue *mvq,
> > > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net 
> > > *ndev, struct mlx5_vdpa_virtqueu
> > > memset(attr, 0, sizeof(*attr));
> > > attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
> > > attr->available_index = MLX5_GET(virtio_net_q_object, 
> > > obj_context, hw_available_index);
> > > +   attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, 
> > > hw_used_index);
> > > kfree(out);
> > > return 0;
> > >
> > > @@ -1610,6 +1615,7 @@ static int save_channel_info(struct mlx5_vdpa_net 
> > > *ndev, struct mlx5_vdpa_virtqu
> > > return err;
> > >
> > > ri->avail_index = attr.available_index;
> > > +   ri->used_index = attr.used_index;
> > > ri->ready = mvq->ready;
> > > ri->num_ent = mvq->num_ent;
> > > ri->desc_addr = mvq->desc_addr;
> > > @@ -1654,6 +1660,7 @@ static void restore_channels_info(struct 
> > > mlx5_vdpa_net *ndev)
> > > continue;
> > >
> > > mvq->avail_idx = ri->avail_index;
> > > +   mvq->used_idx = ri->used_index;
> > > mvq->ready = ri->ready;
> > > mvq->num_ent = ri->num_ent;
> > > mvq->desc_addr = ri->desc_addr;
> > > --
> > > 2.29.2
> > >


[PATCH v6 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT

2021-02-03 Thread Wei Liu
This makes the name match Hyper-V TLFS.

Signed-off-by: Wei Liu 
Reviewed-by: Vitaly Kuznetsov 
Reviewed-by: Pavel Tatashin 
Reviewed-by: Michael Kelley 
---
 include/asm-generic/hyperv-tlfs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/asm-generic/hyperv-tlfs.h 
b/include/asm-generic/hyperv-tlfs.h
index e73a11850055..e6903589a82a 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -88,7 +88,7 @@
 #define HV_CONNECT_PORTBIT(7)
 #define HV_ACCESS_STATSBIT(8)
 #define HV_DEBUGGING   BIT(11)
-#define HV_CPU_POWER_MANAGEMENTBIT(12)
+#define HV_CPU_MANAGEMENT  BIT(12)
 
 
 /*
-- 
2.20.1



[PATCH v6 02/16] x86/hyperv: detect if Linux is the root partition

2021-02-03 Thread Wei Liu
For now we can use the privilege flag to check. Stash the value to be
used later.

Put in a bunch of defines for future use when we want to have more
fine-grained detection.

Signed-off-by: Wei Liu 
Reviewed-by: Pavel Tatashin 
---
v3: move hv_root_partition to mshyperv.c
---
 arch/x86/include/asm/hyperv-tlfs.h | 10 ++
 arch/x86/include/asm/mshyperv.h|  2 ++
 arch/x86/kernel/cpu/mshyperv.c | 20 
 3 files changed, 32 insertions(+)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
b/arch/x86/include/asm/hyperv-tlfs.h
index 6bf42aed387e..204010350604 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -21,6 +21,7 @@
 #define HYPERV_CPUID_FEATURES  0x4003
 #define HYPERV_CPUID_ENLIGHTMENT_INFO  0x4004
 #define HYPERV_CPUID_IMPLEMENT_LIMITS  0x4005
+#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES   0x4007
 #define HYPERV_CPUID_NESTED_FEATURES   0x400A
 
 #define HYPERV_CPUID_VIRT_STACK_INTERFACE  0x4081
@@ -110,6 +111,15 @@
 /* Recommend using enlightened VMCS */
 #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDEDBIT(14)
 
+/*
+ * CPU management features identification.
+ * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
+ */
+#define HV_X64_START_LOGICAL_PROCESSOR BIT(0)
+#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR   BIT(1)
+#define HV_X64_PERFORMANCE_COUNTER_SYNCBIT(2)
+#define HV_X64_RESERVED_IDENTITY_BIT   BIT(31)
+
 /*
  * Virtual processor will never share a physical core with another virtual
  * processor, except for virtual processors that are reported as sibling SMT
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ffc289992d1b..ac2b0d110f03 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
struct hv_guest_mapping_flush_list *flush,
u64 start_gfn, u64 end_gfn);
 
+extern bool hv_root_partition;
+
 #ifdef CONFIG_X86_64
 void hv_apic_init(void);
 void __init hv_init_spinlocks(void);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index f628e3dc150f..c376d191a260 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -32,6 +32,10 @@
 #include 
 #include 
 
+/* Is Linux running as the root partition? */
+bool hv_root_partition;
+EXPORT_SYMBOL_GPL(hv_root_partition);
+
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
 
@@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
 ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
 
+   /*
+* Check CPU management privilege.
+*
+* To mirror what Windows does we should extract CPU management
+* features and use the ReservedIdentityBit to detect if Linux is the
+* root partition. But that requires negotiating CPU management
+* interface (a process to be finalized).
+*
+* For now, use the privilege flag as the indicator for running as
+* root.
+*/
+   if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
+   hv_root_partition = true;
+   pr_info("Hyper-V: running as root partition\n");
+   }
+
/*
 * Extract host information.
 */
-- 
2.20.1



[PATCH v6 04/16] clocksource/hyperv: use MSR-based access if running as root

2021-02-03 Thread Wei Liu
When Linux runs as the root partition, the setup required for TSC page
is different. Luckily Linux also has access to the MSR based
clocksource. We can just disable the TSC page clocksource if Linux is
the root partition.

Signed-off-by: Wei Liu 
Acked-by: Daniel Lezcano 
Reviewed-by: Pavel Tatashin 
Reviewed-by: Michael Kelley 
---
 drivers/clocksource/hyperv_timer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/clocksource/hyperv_timer.c 
b/drivers/clocksource/hyperv_timer.c
index ba04cb381cd3..269a691bd2c4 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void)
if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
return false;
 
+   if (hv_root_partition)
+   return false;
+
hv_read_reference_counter = read_hv_clock_tsc;
phys_addr = virt_to_phys(hv_get_tsc_page());
 
-- 
2.20.1



[PATCH v6 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root

2021-02-03 Thread Wei Liu
There is no VMBus and the other infrastructures initialized in
hv_acpi_init when Linux is running as the root partition.

Signed-off-by: Wei Liu 
Reviewed-by: Pavel Tatashin 
Reviewed-by: Michael Kelley 
---
v3: Return 0 instead of -ENODEV.
---
 drivers/hv/vmbus_drv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 502f8cd95f6d..ee27b3670a51 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2620,6 +2620,9 @@ static int __init hv_acpi_init(void)
if (!hv_is_hyperv_initialized())
return -ENODEV;
 
+   if (hv_root_partition)
+   return 0;
+
init_completion(_event);
 
/*
-- 
2.20.1



[PATCH v6 05/16] x86/hyperv: allocate output arg pages if required

2021-02-03 Thread Wei Liu
When Linux runs as the root partition, it will need to make hypercalls
which return data from the hypervisor.

Allocate pages for storing results when Linux runs as the root
partition.

Signed-off-by: Lillian Grassin-Drake 
Co-Developed-by: Lillian Grassin-Drake 
Signed-off-by: Wei Liu 
---
v3: Fix hv_cpu_die to use free_pages.
v2: Address Vitaly's comments
---
 arch/x86/hyperv/hv_init.c   | 35 -
 arch/x86/include/asm/mshyperv.h |  1 +
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e04d90af4c27..6f4cb40e53fe 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
 void  __percpu **hyperv_pcpu_input_arg;
 EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
 
+void  __percpu **hyperv_pcpu_output_arg;
+EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
+
 u32 hv_max_vp_index;
 EXPORT_SYMBOL_GPL(hv_max_vp_index);
 
@@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
void **input_arg;
struct page *pg;
 
-   input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
-   pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
+   pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, 
hv_root_partition ? 1 : 0);
if (unlikely(!pg))
return -ENOMEM;
+
+   input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
*input_arg = page_address(pg);
+   if (hv_root_partition) {
+   void **output_arg;
+
+   output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+   *output_arg = page_address(pg + 1);
+   }
 
hv_get_vp_index(msr_vp_index);
 
@@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
unsigned int new_cpu;
unsigned long flags;
void **input_arg;
-   void *input_pg = NULL;
+   void *pg;
 
local_irq_save(flags);
input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
-   input_pg = *input_arg;
+   pg = *input_arg;
*input_arg = NULL;
+
+   if (hv_root_partition) {
+   void **output_arg;
+
+   output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+   *output_arg = NULL;
+   }
+
local_irq_restore(flags);
-   free_page((unsigned long)input_pg);
+
+   free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
 
if (hv_vp_assist_page && hv_vp_assist_page[cpu])
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
@@ -346,6 +365,12 @@ void __init hyperv_init(void)
 
BUG_ON(hyperv_pcpu_input_arg == NULL);
 
+   /* Allocate the per-CPU state for output arg for root */
+   if (hv_root_partition) {
+   hyperv_pcpu_output_arg = alloc_percpu(void *);
+   BUG_ON(hyperv_pcpu_output_arg == NULL);
+   }
+
/* Allocate percpu VP index */
hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
GFP_KERNEL);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ac2b0d110f03..62d9390f1ddf 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
 #if IS_ENABLED(CONFIG_HYPERV)
 extern void *hv_hypercall_pg;
 extern void  __percpu  **hyperv_pcpu_input_arg;
+extern void  __percpu  **hyperv_pcpu_output_arg;
 
 static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 {
-- 
2.20.1



[PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()

2021-02-03 Thread Wei Liu
There is already a stub function for pxm_to_node but conversion to the
other direction is missing.

It will be used by Microsoft Hypervisor code later.

Signed-off-by: Wei Liu 
---
v6: new
---
 include/acpi/acpi_numa.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
index a4c6ef809e27..40a91ce87e04 100644
--- a/include/acpi/acpi_numa.h
+++ b/include/acpi/acpi_numa.h
@@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm)
 {
return 0;
 }
+static inline int node_to_pxm(int node)
+{
+   return 0;
+}
 #endif /* CONFIG_ACPI_NUMA */
 
 #ifdef CONFIG_ACPI_HMAT
-- 
2.20.1



[PATCH v6 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus

2021-02-03 Thread Wei Liu
Microsoft Hypervisor requires the root partition to make a few
hypercalls to setup application processors before they can be used.

Signed-off-by: Lillian Grassin-Drake 
Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Lillian Grassin-Drake 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Wei Liu 
Reviewed-by: Michael Kelley 
---
CPU hotplug and unplug is not yet supported in this setup, so those
paths remain untouched.

v3: Always call native SMP preparation function.
---
 arch/x86/kernel/cpu/mshyperv.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index c376d191a260..13d3b6dd21a3 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Is Linux running as the root partition? */
 bool hv_root_partition;
@@ -212,6 +213,32 @@ static void __init hv_smp_prepare_boot_cpu(void)
hv_init_spinlocks();
 #endif
 }
+
+static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
+{
+#ifdef CONFIG_X86_64
+   int i;
+   int ret;
+#endif
+
+   native_smp_prepare_cpus(max_cpus);
+
+#ifdef CONFIG_X86_64
+   for_each_present_cpu(i) {
+   if (i == 0)
+   continue;
+   ret = hv_call_add_logical_proc(numa_cpu_node(i), i, 
cpu_physical_id(i));
+   BUG_ON(ret);
+   }
+
+   for_each_present_cpu(i) {
+   if (i == 0)
+   continue;
+   ret = hv_call_create_vp(numa_cpu_node(i), 
hv_current_partition_id, i, i);
+   BUG_ON(ret);
+   }
+#endif
+}
 #endif
 
 static void __init ms_hyperv_init_platform(void)
@@ -368,6 +395,8 @@ static void __init ms_hyperv_init_platform(void)
 
 # ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
+   if (hv_root_partition)
+   smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
 # endif
 
/*
-- 
2.20.1



[PATCH v6 09/16] x86/hyperv: provide a bunch of helper functions

2021-02-03 Thread Wei Liu
They are used to deposit pages into Microsoft Hypervisor and bring up
logical and virtual processors.

Signed-off-by: Lillian Grassin-Drake 
Signed-off-by: Sunil Muthuswamy 
Signed-off-by: Nuno Das Neves 
Co-Developed-by: Lillian Grassin-Drake 
Co-Developed-by: Sunil Muthuswamy 
Co-Developed-by: Nuno Das Neves 
Signed-off-by: Wei Liu 
---
v6:
1. Address Michael's comments.

v4: Fix compilation issue when CONFIG_ACPI_NUMA is not set.

v3:
1. Add __packed to structures.
2. Drop unnecessary exports.

v2:
1. Adapt to hypervisor side changes
2. Address Vitaly's comments

use u64 status

pages

major comments

minor comments

rely on acpi code
---
 arch/x86/hyperv/Makefile  |   2 +-
 arch/x86/hyperv/hv_proc.c | 219 ++
 arch/x86/include/asm/mshyperv.h   |   4 +
 include/asm-generic/hyperv-tlfs.h |  67 +
 4 files changed, 291 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/hyperv/hv_proc.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 89b1f74d3225..565358020921 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y  := hv_init.o mmu.o nested.o
-obj-$(CONFIG_X86_64)   += hv_apic.o
+obj-$(CONFIG_X86_64)   += hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)   += hv_spinlock.o
diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c
new file mode 100644
index ..60461e598239
--- /dev/null
+++ b/arch/x86/hyperv/hv_proc.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/*
+ * See struct hv_deposit_memory. The first u64 is partition ID, the rest
+ * are GPAs.
+ */
+#define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1)
+
+/* Deposits exact number of pages. Must be called with interrupts enabled.  */
+int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
+{
+   struct page **pages, *page;
+   int *counts;
+   int num_allocations;
+   int i, j, page_count;
+   int order;
+   u64 status;
+   int ret;
+   u64 base_pfn;
+   struct hv_deposit_memory *input_page;
+   unsigned long flags;
+
+   if (num_pages > HV_DEPOSIT_MAX)
+   return -E2BIG;
+   if (!num_pages)
+   return 0;
+
+   /* One buffer for page pointers and counts */
+   page = alloc_page(GFP_KERNEL);
+   if (!page)
+   return -ENOMEM;
+   pages = page_address(page);
+
+   counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL);
+   if (!counts) {
+   free_page((unsigned long)pages);
+   return -ENOMEM;
+   }
+
+   /* Allocate all the pages before disabling interrupts */
+   i = 0;
+
+   while (num_pages) {
+   /* Find highest order we can actually allocate */
+   order = 31 - __builtin_clz(num_pages);
+
+   while (1) {
+   pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
+   if (pages[i])
+   break;
+   if (!order) {
+   ret = -ENOMEM;
+   num_allocations = i;
+   goto err_free_allocations;
+   }
+   --order;
+   }
+
+   split_page(pages[i], order);
+   counts[i] = 1 << order;
+   num_pages -= counts[i];
+   i++;
+   }
+   num_allocations = i;
+
+   local_irq_save(flags);
+
+   input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+   input_page->partition_id = partition_id;
+
+   /* Populate gpa_page_list - these will fit on the input page */
+   for (i = 0, page_count = 0; i < num_allocations; ++i) {
+   base_pfn = page_to_pfn(pages[i]);
+   for (j = 0; j < counts[i]; ++j, ++page_count)
+   input_page->gpa_page_list[page_count] = base_pfn + j;
+   }
+   status = hv_do_rep_hypercall(HVCALL_DEPOSIT_MEMORY,
+page_count, 0, input_page, NULL);
+   local_irq_restore(flags);
+
+   if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) {
+   pr_err("Failed to deposit pages: %lld\n", status);
+   ret = status;
+   goto err_free_allocations;
+   }
+
+   ret = 0;
+   goto free_buf;
+
+err_free_allocations:
+   for (i = 0; i < num_allocations; ++i) {
+   base_pfn = page_to_pfn(pages[i]);
+   for (j = 0; j < counts[i]; ++j)
+   __free_page(pfn_to_page(base_pfn + j));
+   }
+
+free_buf:
+   free_page((unsigned long)pages);
+   kfr

[PATCH v6 12/16] asm-generic/hyperv: update hv_interrupt_entry

2021-02-03 Thread Wei Liu
We will soon use the same structure to handle IO-APIC interrupts as
well. Introduce an enum to identify the source and a data structure for
IO-APIC RTE.

While at it, update pci-hyperv.c to use the enum.

No functional change.

Signed-off-by: Wei Liu 
Acked-by: Rob Herring 
Reviewed-by: Michael Kelley 
---
 drivers/pci/controller/pci-hyperv.c |  2 +-
 include/asm-generic/hyperv-tlfs.h   | 36 +++--
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index 6db8d96a78eb..87aa62ee0368 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1216,7 +1216,7 @@ static void hv_irq_unmask(struct irq_data *data)
params = >retarget_msi_interrupt_params;
memset(params, 0, sizeof(*params));
params->partition_id = HV_PARTITION_ID_SELF;
-   params->int_entry.source = 1; /* MSI(-X) */
+   params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
hv_set_msi_entry_from_desc(>int_entry.msi_entry, msi_desc);
params->device_id = (hbus->hdev->dev_instance.b[5] << 24) |
   (hbus->hdev->dev_instance.b[4] << 16) |
diff --git a/include/asm-generic/hyperv-tlfs.h 
b/include/asm-generic/hyperv-tlfs.h
index 4669f9a4e1f1..94c7d77bbf68 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -480,6 +480,11 @@ struct hv_create_vp {
u64 flags;
 } __packed;
 
+enum hv_interrupt_source {
+   HV_INTERRUPT_SOURCE_MSI = 1, /* MSI and MSI-X */
+   HV_INTERRUPT_SOURCE_IOAPIC,
+};
+
 union hv_msi_address_register {
u32 as_uint32;
struct {
@@ -513,10 +518,37 @@ union hv_msi_entry {
} __packed;
 };
 
+union hv_ioapic_rte {
+   u64 as_uint64;
+
+   struct {
+   u32 vector:8;
+   u32 delivery_mode:3;
+   u32 destination_mode:1;
+   u32 delivery_status:1;
+   u32 interrupt_polarity:1;
+   u32 remote_irr:1;
+   u32 trigger_mode:1;
+   u32 interrupt_mask:1;
+   u32 reserved1:15;
+
+   u32 reserved2:24;
+   u32 destination_id:8;
+   };
+
+   struct {
+   u32 low_uint32;
+   u32 high_uint32;
+   };
+} __packed;
+
 struct hv_interrupt_entry {
-   u32 source; /* 1 for MSI(-X) */
+   u32 source;
u32 reserved1;
-   union hv_msi_entry msi_entry;
+   union {
+   union hv_msi_entry msi_entry;
+   union hv_ioapic_rte ioapic_rte;
+   };
 } __packed;
 
 /*
-- 
2.20.1



  1   2   3   4   5   6   7   8   >