Re: [PATCH 1/4] vDPA: allow userspace to query features of a vDPA device

2022-09-20 Thread Jason Wang
On Tue, Sep 20, 2022 at 5:58 PM Zhu, Lingshan  wrote:
>
>
>
> On 9/20/2022 10:02 AM, Jason Wang wrote:
> > On Fri, Sep 9, 2022 at 5:05 PM Zhu Lingshan  wrote:
> >> This commit adds a new vDPA netlink attribution
> >> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can query
> >> features of vDPA devices through this new attr.
> >>
> >> This commit invokes vdpa_config_ops.get_config() than
> >> vdpa_get_config_unlocked() to read the device config
> >> spcae, so no raeces in vdpa_set_features_unlocked()
> >>
> >> Signed-off-by: Zhu Lingshan 
> > It's better to share the userspace code as well.
> OK, will share it in V2.
> >
> >> ---
> >>   drivers/vdpa/vdpa.c   | 19 ++-
> >>   include/uapi/linux/vdpa.h |  4 
> >>   2 files changed, 18 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> >> index c06c02704461..798a02c7aa94 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -491,6 +491,8 @@ static int vdpa_mgmtdev_fill(const struct 
> >> vdpa_mgmt_dev *mdev, struct sk_buff *m
> >>  err = -EMSGSIZE;
> >>  goto msg_err;
> >>  }
> >> +
> >> +   /* report features of a vDPA management device through 
> >> VDPA_ATTR_DEV_SUPPORTED_FEATURES */
> > The code explains itself, there's no need for the comment.
> these comments are required in other discussions

I think it's more than sufficient to clarify the semantic where it is defined.

> >
> >>  if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_SUPPORTED_FEATURES,
> >>mdev->supported_features, VDPA_ATTR_PAD)) {
> >>  err = -EMSGSIZE;
> >> @@ -815,10 +817,10 @@ static int vdpa_dev_net_mq_config_fill(struct 
> >> vdpa_device *vdev,
> >>   static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct 
> >> sk_buff *msg)
> >>   {
> >>  struct virtio_net_config config = {};
> >> -   u64 features;
> >> +   u64 features_device, features_driver;
> >>  u16 val_u16;
> >>
> >> -   vdpa_get_config_unlocked(vdev, 0, , sizeof(config));
> >> +   vdev->config->get_config(vdev, 0, , sizeof(config));
> >>
> >>  if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, 
> >> sizeof(config.mac),
> >>  config.mac))
> >> @@ -832,12 +834,19 @@ static int vdpa_dev_net_config_fill(struct 
> >> vdpa_device *vdev, struct sk_buff *ms
> >>  if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
> >>  return -EMSGSIZE;
> >>
> >> -   features = vdev->config->get_driver_features(vdev);
> >> -   if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES, 
> >> features,
> >> +   features_driver = vdev->config->get_driver_features(vdev);
> >> +   if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES, 
> >> features_driver,
> >> + VDPA_ATTR_PAD))
> >> +   return -EMSGSIZE;
> >> +
> >> +   features_device = vdev->config->get_device_features(vdev);
> >> +
> >> +   /* report features of a vDPA device through 
> >> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES */
> >> +   if (nla_put_u64_64bit(msg, VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES, 
> >> features_device,
> >>VDPA_ATTR_PAD))
> >>  return -EMSGSIZE;
> >>
> >> -   return vdpa_dev_net_mq_config_fill(vdev, msg, features, );
> >> +   return vdpa_dev_net_mq_config_fill(vdev, msg, features_driver, 
> >> );
> >>   }
> >>
> >>   static int
> >> diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
> >> index 25c55cab3d7c..97531b52dcbe 100644
> >> --- a/include/uapi/linux/vdpa.h
> >> +++ b/include/uapi/linux/vdpa.h
> >> @@ -46,12 +46,16 @@ enum vdpa_attr {
> >>
> >>  VDPA_ATTR_DEV_NEGOTIATED_FEATURES,  /* u64 */
> >>  VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,  /* u32 */
> >> +   /* features of a vDPA management device */
> >>  VDPA_ATTR_DEV_SUPPORTED_FEATURES,   /* u64 */
> >>
> >>  VDPA_ATTR_DEV_QUEUE_INDEX,  /* u32 */
> >>  VDPA_ATTR_DEV_VENDOR_ATTR_NAME, /* string */
> >>  VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,/* u64 */
> >>
> >> +   /* features of a vDPA device, e.g., /dev/vhost-vdpa0 */
> >> +   VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,  /* u64 */
> > What's the difference between this and VDPA_ATTR_DEV_SUPPORTED_FEATURES?
> This is to report a vDPA device features, and
> VDPA_ATTR_DEV_SUPPORTED_FEATURES
> is used for reporting the management device features, we have a long
> discussion
> on this before.

Yes, but the comment is not clear in many ways:

" features of a vDPA management device" sounds like features that is
out of the scope of the virtio.

And

"/dev/vhost-vdpa0" is not a vDPA device but a vhost-vDPA device.

Thanks

> >
> > Thanks
> >
> >> +
> >>  /* new attributes must be added above here */
> >>  VDPA_ATTR_MAX,
> >>   };
> >> --
> >> 2.31.1

Re: [PATCH 2/4] vDPA: only report driver features if FEATURES_OK is set

2022-09-20 Thread Jason Wang
On Tue, Sep 20, 2022 at 1:46 PM Zhu, Lingshan  wrote:
>
>
>
> On 9/20/2022 10:16 AM, Jason Wang wrote:
> > On Fri, Sep 9, 2022 at 5:05 PM Zhu Lingshan  wrote:
> >> vdpa_dev_net_config_fill() should only report driver features
> >> to userspace after features negotiation is done.
> >>
> >> Signed-off-by: Zhu Lingshan 
> >> ---
> >>   drivers/vdpa/vdpa.c | 13 +
> >>   1 file changed, 9 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> >> index 798a02c7aa94..29d7e8858e6f 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -819,6 +819,7 @@ static int vdpa_dev_net_config_fill(struct vdpa_device 
> >> *vdev, struct sk_buff *ms
> >>  struct virtio_net_config config = {};
> >>  u64 features_device, features_driver;
> >>  u16 val_u16;
> >> +   u8 status;
> >>
> >>  vdev->config->get_config(vdev, 0, , sizeof(config));
> >>
> >> @@ -834,10 +835,14 @@ static int vdpa_dev_net_config_fill(struct 
> >> vdpa_device *vdev, struct sk_buff *ms
> >>  if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
> >>  return -EMSGSIZE;
> >>
> >> -   features_driver = vdev->config->get_driver_features(vdev);
> >> -   if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES, 
> >> features_driver,
> >> - VDPA_ATTR_PAD))
> >> -   return -EMSGSIZE;
> >> +   /* only read driver features after the feature negotiation is done 
> >> */
> >> +   status = vdev->config->get_status(vdev);
> >> +   if (status & VIRTIO_CONFIG_S_FEATURES_OK) {
> > Any reason this is not checked in its caller as what it used to do before?
> will check the existence of vdev->config->get_status before calling it in V2

Just to clarify, I meant to check FEAUTRES_OK in the caller -
vdpa_dev_config_fill() otherwise each type needs to repeat this in
their specific codes.

Thanks

>
> Thanks,
> Zhu Lingshan
> >
> > Thanks
> >
> >> +   features_driver = vdev->config->get_driver_features(vdev);
> >> +   if (nla_put_u64_64bit(msg, 
> >> VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features_driver,
> >> + VDPA_ATTR_PAD))
> >> +   return -EMSGSIZE;
> >> +   }
> >>
> >>  features_device = vdev->config->get_device_features(vdev);
> >>
> >> --
> >> 2.31.1
> >>
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH 0/3] vdpa: device feature provisioning

2022-09-20 Thread Parav Pandit via Virtualization
Hi Jason,

> From: Jason Wang 
> Sent: Thursday, September 15, 2022 4:51 AM
> To: m...@redhat.com; jasow...@redhat.com
> Cc: Eli Cohen ; si-wei@oracle.com; Parav Pandit
> ; wuzongy...@linux.alibaba.com;
> virtualization@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> epere...@redhat.com; lingshan@intel.com; gda...@xilinx.com;
> l...@redhat.com; xieyon...@bytedance.com
> Subject: [PATCH 0/3] vdpa: device feature provisioning
> 
> Hi All:
> 
> Virtio features are neogiated between the device and the drivers. This allows
> the mediation layer like vDPA to hide some features from the driver to
> faciliate the cross vendor live migration:
> 
> vDPA on the source supports feature set X vDPA on the destination supports
> feature set Y
> 
> Management can simply provision the vDPA instance with features X on
> both source and destination to let the vDPA can be migrate-able between
> the two vDPA devies with different features support.
> 
> This series tries to allow the device features to be provisioned via netlink 
> to
> achieve this.
Very useful series.
Can you please add vdpa example command with and without -jp option in each of 
the commit logs?

Thanks.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] virtio_blk: add SECURE ERASE command support

2022-09-20 Thread Stefan Hajnoczi
On Mon, Sep 19, 2022 at 09:09:05PM +0300, Alvaro Karsz wrote:
> Thanks for the reply.
> 
> > This can be simplified with min_not_zero().
> 
> Ok, I will do it in the next version.
> 
> > It's worth including a comment here that the discard and secure erase
> > limits are combined because the Linux block layer only has one limit
> > value. If the block layer supported independent limit values we wouldn't
> > need to do this.
> 
> Ok.
> 
> I'll send a new version once we agree on the max_secure_erase_seg = 0 
> scenario.
> Do you have an opinion on that?
> Do you think that using sg_elems as the number of secure erase/discard
> segments when the value in the virtio config is 0 is good enough?
> 

Okay, I have replied in the max_secure_erase_seg sub-thread. I think
probing the device should fail if the value is 0. There are no existing
non-compliant devices that we need to be compatible with - let's
encourage device implementors to report usable max_secure_erase_seg
values.

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2] virtio_blk: add SECURE ERASE command support

2022-09-20 Thread Stefan Hajnoczi
On Sun, Sep 18, 2022 at 07:07:34PM +0300, Alvaro Karsz wrote:
> > sounds good. Add a code comment?
> 
> I will.
> 
> >  yes but I now see two places that seem to include this logic.
> 
> 
> Yes, this is because the same logic is applied on 2 different pairs.
> 
> * secure_erase_sector_alignment and discard_sector_alignment are used
> to calculate  q->limits.discard_granularity.
> * max_discard_seg and max_secure_erase_seg are used to calculate
> max_discard_segments.
> 
> > I am not 100% sure. Two options:
> > 1- Add a validate callback and clear VIRTIO_BLK_F_SECURE_ERASE.
> > 2- Alternatively, fail probe.
> 
> 
> Good ideas.
> 2- Do you think that something like that should be mentioned in the
> spec? or should be implementation specific?
> 
> How about setting the value to 1? (which is the minimum usable value)
> 
> > which is preferable depends on how bad is it if host sets
> > VIRTIO_BLK_F_SECURE_ERASE but guest does not use it.
> 
> 
> I'm not sure if it is that bad.
> If the value is 0, sg_elems is used.
> sg_elems is either 1 (if VIRTIO_BLK_F_SEG_MAX is not negotiated), or
> seg_max (virtio config).
> 
> """
> err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SEG_MAX,
>   struct virtio_blk_config, seg_max,
>   _elems);
> /* We need at least one SG element, whatever they say. */
> if (err || !sg_elems)
>  sg_elems = 1;
> """
> 
> So the only "danger" that I can think of is if a device negotiates
> VIRTIO_BLK_F_SEG_MAX and  VIRTIO_BLK_F_SECURE_ERASE, sets
> max_secure_erase_seg to 0 (I'm not sure what is the purpose, since
> this is meaningless), and can't handle secure erase commands with
> seg_max segments.

Given that SECURE ERASE is new and the VIRTIO spec does not define
special behavior for 0, I think the virtio_blk driver should be strict.

There's no need to work around existing broken devices. I would fail
probing the device. This will encourage device implementors to provide a
usable value instead of 0.

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2 00/44] cpuidle,rcu: Clean up the mess

2022-09-20 Thread Peter Zijlstra


Because Nadav asked about tracing/kprobing idle, I had another go around
and noticed not all functions calling ct_cpuidle_enter are __cpuidle.

Basically all cpuidle_driver::enter functions should be __cpuidle; i'll
do that audit shortly.

For now this is ct_cpuidle_enter / CPU_IDLE_ENTER users.

---
--- a/arch/arm/mach-imx/cpuidle-imx6q.c
+++ b/arch/arm/mach-imx/cpuidle-imx6q.c
@@ -17,8 +17,8 @@
 static int num_idle_cpus = 0;
 static DEFINE_RAW_SPINLOCK(cpuidle_lock);
 
-static int imx6q_enter_wait(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int index)
+static __cpuidle int imx6q_enter_wait(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
 {
raw_spin_lock(_lock);
if (++num_idle_cpus == num_online_cpus())
--- a/arch/arm/mach-imx/cpuidle-imx6sx.c
+++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
@@ -30,8 +30,8 @@ static int imx6sx_idle_finish(unsigned l
return 0;
 }
 
-static int imx6sx_enter_wait(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int index)
+static __cpuidle int imx6sx_enter_wait(struct cpuidle_device *dev,
+  struct cpuidle_driver *drv, int index)
 {
imx6_set_lpm(WAIT_UNCLOCKED);
 
--- a/arch/arm/mach-omap2/omap-mpuss-lowpower.c
+++ b/arch/arm/mach-omap2/omap-mpuss-lowpower.c
@@ -224,8 +224,8 @@ static void __init save_l2x0_context(voi
  * 2 - CPUx L1 and logic lost + GIC lost: MPUSS OSWR
  * 3 - CPUx L1 and logic lost + GIC + L2 lost: DEVICE OFF
  */
-int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state,
-bool rcuidle)
+__cpuidle int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state,
+  bool rcuidle)
 {
struct omap4_cpu_pm_info *pm_info = _cpu(omap4_pm_info, cpu);
unsigned int save_state = 0, cpu_logic_state = PWRDM_POWER_RET;
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -175,7 +175,7 @@ static int omap34xx_do_sram_idle(unsigne
return 0;
 }
 
-void omap_sram_idle(bool rcuidle)
+__cpuidle void omap_sram_idle(bool rcuidle)
 {
/* Variable to tell what needs to be saved and restored
 * in omap_sram_idle*/
--- a/arch/arm64/kernel/cpuidle.c
+++ b/arch/arm64/kernel/cpuidle.c
@@ -62,7 +62,7 @@ int acpi_processor_ffh_lpi_probe(unsigne
return psci_acpi_cpu_init_idle(cpu);
 }
 
-int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
+__cpuidle int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
 {
u32 state = lpi->address;
 
--- a/drivers/cpuidle/cpuidle-arm.c
+++ b/drivers/cpuidle/cpuidle-arm.c
@@ -31,8 +31,8 @@
  * Called from the CPUidle framework to program the device to the
  * specified target state selected by the governor.
  */
-static int arm_enter_idle_state(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int idx)
+static __cpuidle int arm_enter_idle_state(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int idx)
 {
/*
 * Pass idle state index to arm_cpuidle_suspend which in turn
--- a/drivers/cpuidle/cpuidle-big_little.c
+++ b/drivers/cpuidle/cpuidle-big_little.c
@@ -122,8 +122,8 @@ static int notrace bl_powerdown_finisher
  * Called from the CPUidle framework to program the device to the
  * specified target state selected by the governor.
  */
-static int bl_enter_powerdown(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int idx)
+static __cpuidle int bl_enter_powerdown(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv, int idx)
 {
cpu_pm_enter();
ct_cpuidle_enter();
--- a/drivers/cpuidle/cpuidle-mvebu-v7.c
+++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
@@ -25,9 +25,9 @@
 
 static int (*mvebu_v7_cpu_suspend)(int);
 
-static int mvebu_v7_enter_idle(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv,
-   int index)
+static __cpuidle int mvebu_v7_enter_idle(struct cpuidle_device *dev,
+struct cpuidle_driver *drv,
+int index)
 {
int ret;
bool deepidle = false;
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -49,14 +49,9 @@ static inline u32 psci_get_domain_state(
return __this_cpu_read(domain_state);
 }
 
-static inline int psci_enter_state(int idx, u32 state)
-{
-   return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
-}
-
-static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int idx,
- bool s2idle)
+static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device 

Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage

2022-09-20 Thread Christoph Hellwig
On Tue, Sep 20, 2022 at 08:17:10PM +1000, Daniil Lunev wrote:
> to WRITE ZERO command in NVMe, but to WRITE UNAVAILABLE in

There is no such thing as WRITE UNAVAILABLE in NVMe.

> NVME 2.0 spec, and to UNMAP ANCHORED in SCSI spec.

The SCSI anchored LBA state is quite complicated, and in addition
to UNMAP you can also create it using WRITE SAME, which is at least
partially useful, as it allows for sensible initialization pattern.
For the purpose of Linux that woud be 0.

That being siad you still haven't actually explained what problem
you're even trying to solve.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 08/44] cpuidle,imx6: Push RCU-idle into driver

2022-09-20 Thread Peter Zijlstra
On Mon, Sep 19, 2022 at 04:21:23PM +0200, Frederic Weisbecker wrote:
> On Mon, Sep 19, 2022 at 11:59:47AM +0200, Peter Zijlstra wrote:
> > Doing RCU-idle outside the driver, only to then temporarily enable it
> > again, at least twice, before going idle is daft.
> 
> Hmm, what ends up calling RCU_IDLE() here? Also what about
> cpu_do_idle()?

I've ammended patches 5-12 with a comment like:

Notably both cpu_pm_enter() and cpu_cluster_pm_enter() implicity
re-enable RCU.

(each noting the specific sites for the relevant patch).
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 03/44] cpuidle/poll: Ensure IRQ state is invariant

2022-09-20 Thread Peter Zijlstra
On Mon, Sep 19, 2022 at 03:19:27PM +0200, Frederic Weisbecker wrote:
> On Mon, Sep 19, 2022 at 11:59:42AM +0200, Peter Zijlstra wrote:
> > cpuidle_state::enter() methods should be IRQ invariant
> 
> Got a bit confused with the invariant thing since the first chunck I
> see in this patch is a conversion to an non-traceable local_irq_enable().
> 
> Maybe just add a short mention about that and why?

Changelog now reads:

---
Subject: cpuidle/poll: Ensure IRQ state is invariant
From: Peter Zijlstra 
Date: Tue May 31 15:43:32 CEST 2022

cpuidle_state::enter() methods should be IRQ invariant.

Additionally make sure to use raw_local_irq_*() methods since this
cpuidle callback will be called with RCU already disabled.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Rafael J. Wysocki 
---
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/3] MAINTAINERS: Update entries for some VMware drivers

2022-09-20 Thread Stefano Garzarella

On Mon, Sep 19, 2022 at 10:41:47AM -0700, Jakub Kicinski wrote:

On Tue,  6 Sep 2022 10:27:19 -0700 vd...@vmware.com wrote:

From: Vishnu Dasa 

This series updates a few existing maintainer entries for VMware
supported drivers and adds a new entry for vsock vmci transport
driver.


Just to be sure - who are you expecting to take these in?



FYI Greg already queued this series in his char-misc-next branch:
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/log/?h=char-misc-next

Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION

2022-09-20 Thread Christoph Hellwig
On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote:
> From: Sarthak Kukreti 
> 
> FALLOC_FL_PROVISION is a new fallocate() allocation mode that
> sends a hint to (supported) thinly provisioned block devices to
> allocate space for the given range of sectors via REQ_OP_PROVISION.

So, how does that "provisioning" actually work in todays world where
storage is usually doing out of place writes in one or more layers,
including the flash storage everyone is using.  Does it give you one
write?  And unlimited number?  Some undecided number inbetween?  How
is it affected by write zeroes to that range or a discard?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage

2022-09-20 Thread Christoph Hellwig
On Fri, Sep 16, 2022 at 11:48:34AM -0700, Sarthak Kukreti wrote:
> Yes. On ChromiumOS, we regularly deal with storage devices that don't
> support WRITE_ZEROES or that need to have it disabled, via a quirk,
> due to a bug in the vendor's implementation.

So bloody punich the vendors for it.  Unlike most of the Linux community
your actually have purchasing power and you'd help everyone by making
use of that instead adding hacks to upstream.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v3 6/6] iommu: Propagate ret for a potential soft failure EINVAL

2022-09-20 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Thursday, September 15, 2022 3:59 PM
> 
> Following the new rules in include/linux/iommu.h kdocs, EINVAL now can be
> used to indicate that domain and device are incompatible by a caller that
> treats it as a soft failure and tries attaching to another domain.
> 
> Either mtk_iommu or virtio driver has a place that returns a hard failure
> instead of the return value from the function call, where an incompatible
> errno EINVAL could potentially occur.

in both cases there is no EINVAL returned from the calling stack

IMHO error propagation is the right way even w/o talking about EINVAL
otherwise we may miss ENOMEM etc.

> 
> Propagate the real return value to not miss a potential soft failure.
> 
> Signed-off-by: Nicolin Chen 

Apart from that comment,

Reviewed-by: Kevin Tian 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v3 5/6] iommu: Use EINVAL for incompatible device/domain in ->attach_dev

2022-09-20 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Thursday, September 15, 2022 3:54 PM
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 1f2cd43cf9bc..51ef42b1bd4e 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4158,19 +4158,15 @@ static int prepare_domain_attach_device(struct
> iommu_domain *domain,
>   return -ENODEV;
> 
>   if (dmar_domain->force_snooping && !ecap_sc_support(iommu-
> >ecap))
> - return -EOPNOTSUPP;
> + return -EINVAL;
> 
>   /* check if this iommu agaw is sufficient for max mapped address */
>   addr_width = agaw_to_width(iommu->agaw);
>   if (addr_width > cap_mgaw(iommu->cap))
>   addr_width = cap_mgaw(iommu->cap);
> 
> - if (dmar_domain->max_addr > (1LL << addr_width)) {
> - dev_err(dev, "%s: iommu width (%d) is not "
> - "sufficient for the mapped address (%llx)\n",
> - __func__, addr_width, dmar_domain->max_addr);
> - return -EFAULT;
> - }
> + if (dmar_domain->max_addr > (1LL << addr_width))
> + return -EINVAL;
>   dmar_domain->gaw = addr_width;
> 
>   /*

Above lacks of a conversion in intel-iommu:

intel_iommu_attach_device()
if (domain->type == IOMMU_DOMAIN_UNMANAGED &&
device_is_rmrr_locked(dev)) {
dev_warn(dev, "Device is ineligible for IOMMU domain attach due 
to platform RMRR requirement.  Contact your platform vendor.\n");
return -EPERM;
}

since it's based on the domain type, picking a different domain
may work in theory though it won't apply to vfio which always
creates unmanaged type.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v3 3/6] iommu: Add return value rules to attach_dev op and APIs

2022-09-20 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Thursday, September 15, 2022 3:54 PM
> 
> +/**
> + * iommu_attach_device - Attach a device to an IOMMU domain
> + * @domain: IOMMU domain to attach
> + * @dev: Device that will be attached
> + *
> + * Returns 0 on success and error code on failure
> + *
> + * Note that EINVAL may be returned as a soft failure if the domain and
> device
> + * are incompatible: if the domain has already been used or configured in
> some

I didn't get the meaning of the 'if' part.

> + * way, attaching the same device to a different domain may succeed.
> Otherwise,
> + * it may still represent some fundamental problem.

I'm not sure what the sentence after 'otherwise' actually adds to the
caller. There is no way to differentiate incompatibility vs. fundamental
problem, hence pointless for the caller to know this fact.

IMHO just state that the caller can treat -EINVAL as soft failure indicating
incompatibility issue between domain and device.

Later for @attach_dev you can add that driver may return (but not
recommend) -EINVAL for some fundamental problems.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization