Re: [PATCH v2] jailhouse: Hold reference returned from of_find_xxx API

2022-09-16 Thread Srivatsa S. Bhat
[ Adding author and reviewers of commit 63338a38db95 again ]

On 9/16/22 2:00 AM, Liang He wrote:
> In jailhouse_paravirt(), we should hold the reference returned from
> of_find_compatible_node() which has increased the refcount and then
> call of_node_put() with it when done.
> 
> Fixes: 63338a38db95 ("jailhouse: Provide detection for non-x86 systems")
> Signed-off-by: Liang He 
> Co-developed-by: Kelin Wang 
> Signed-off-by: Kelin Wang 

Reviewed-by: Srivatsa S. Bhat (VMware) 

> ---
> 
>  v2: use proper return type not the 'np' pointer
> 
>  include/linux/hypervisor.h | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
> index 9efbc54e35e5..f11eec57ea63 100644
> --- a/include/linux/hypervisor.h
> +++ b/include/linux/hypervisor.h
> @@ -27,7 +27,11 @@ static inline void hypervisor_pin_vcpu(int cpu)
>  
>  static inline bool jailhouse_paravirt(void)
>  {
> - return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
> + struct device_node *np = of_find_compatible_node(NULL, NULL, 
> "jailhouse,cell");
> +
> + of_node_put(np);
> +
> + return np ? true : false;
>  }
>  
>  #endif /* !CONFIG_X86 */
>

Regards,
Srivatsa
VMware Photon OS
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage

2022-09-16 Thread Bart Van Assche

On 9/16/22 11:48, Sarthak Kukreti wrote:

Yes. On ChromiumOS, we regularly deal with storage devices that don't
support WRITE_ZEROES or that need to have it disabled, via a quirk,
due to a bug in the vendor's implementation. Using WRITE_ZEROES for
allocation makes the allocation path quite slow for such devices (not
to mention the effect on storage lifetime), so having a separate
provisioning construct is very appealing. Even for devices that do
support an efficient WRITE_ZEROES implementation but don't support
logical provisioning per-se, I suppose that the allocation path might
be a bit faster (the device driver's request queue would report
'max_provision_sectors'=0 and the request would be short circuited
there) although I haven't benchmarked the difference.


Some background information about why ChromiumOS uses thin provisioning 
instead of a single filesystem across the entire storage device would be 
welcome. Although UFS devices support thin provisioning I am not aware 
of any use cases in Android that would benefit from UFS thin 
provisioning support.


Thanks,

Bart.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] dt-bindings: virtio: Convert virtio,pci-iommu to DT schema

2022-09-16 Thread Jean-Philippe Brucker
Convert the binding that describes the virtio-pci based IOMMU to DT
schema. Change the compatible string to "pci,", which is
defined by the PCI Bus Binding, but keep "virtio,pci-iommu" as an option
for backward compatibility.

Signed-off-by: Jean-Philippe Brucker 
---
 .../devicetree/bindings/virtio/iommu.txt  | 66 --
 .../devicetree/bindings/virtio/iommu.yaml | 86 +++
 2 files changed, 86 insertions(+), 66 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
 create mode 100644 Documentation/devicetree/bindings/virtio/iommu.yaml

diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt 
b/Documentation/devicetree/bindings/virtio/iommu.txt
deleted file mode 100644
index 2407fea0651c..
--- a/Documentation/devicetree/bindings/virtio/iommu.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-* virtio IOMMU PCI device
-
-When virtio-iommu uses the PCI transport, its programming interface is
-discovered dynamically by the PCI probing infrastructure. However the
-device tree statically describes the relation between IOMMU and DMA
-masters. Therefore, the PCI root complex that hosts the virtio-iommu
-contains a child node representing the IOMMU device explicitly.
-
-Required properties:
-
-- compatible:  Should be "virtio,pci-iommu"
-- reg: PCI address of the IOMMU. As defined in the PCI Bus
-   Binding reference [1], the reg property is a five-cell
-   address encoded as (phys.hi phys.mid phys.lo size.hi
-   size.lo). phys.hi should contain the device's BDF as
-   0b  dfff . The other cells
-   should be zero.
-- #iommu-cells:Each platform DMA master managed by the IOMMU is 
assigned
-   an endpoint ID, described by the "iommus" property [2].
-   For virtio-iommu, #iommu-cells must be 1.
-
-Notes:
-
-- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
-  virtio-iommu node doesn't have an "iommus" property, and is omitted from
-  the iommu-map property of the root complex.
-
-Example:
-
-pcie@1000 {
-   compatible = "pci-host-ecam-generic";
-   ...
-
-   /* The IOMMU programming interface uses slot 00:01.0 */
-   iommu0: iommu@0008 {
-   compatible = "virtio,pci-iommu";
-   reg = <0x0800 0 0 0 0>;
-   #iommu-cells = <1>;
-   };
-
-   /*
-* The IOMMU manages all functions in this PCI domain except
-* itself. Omit BDF 00:01.0.
-*/
-   iommu-map = <0x0  0x0 0x8>
-   <0x9  0x9 0xfff7>;
-};
-
-pcie@2000 {
-   compatible = "pci-host-ecam-generic";
-   ...
-   /*
-* The IOMMU also manages all functions from this domain,
-* with endpoint IDs 0x1 - 0x1
-*/
-   iommu-map = <0x0  0x1 0x1>;
-};
-
-ethernet@fe001000 {
-   ...
-   /* The IOMMU manages this platform device with endpoint ID 0x2 */
-   iommus = < 0x2>;
-};
-
-[1] Documentation/devicetree/bindings/pci/pci.txt
-[2] Documentation/devicetree/bindings/iommu/iommu.txt
diff --git a/Documentation/devicetree/bindings/virtio/iommu.yaml 
b/Documentation/devicetree/bindings/virtio/iommu.yaml
new file mode 100644
index ..d5bbb8ab9603
--- /dev/null
+++ b/Documentation/devicetree/bindings/virtio/iommu.yaml
@@ -0,0 +1,86 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/virtio/iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: virtio-iommu device using the virtio-pci transport
+
+maintainers:
+  - Jean-Philippe Brucker 
+
+description: |
+  When virtio-iommu uses the PCI transport, its programming interface is
+  discovered dynamically by the PCI probing infrastructure. However the
+  device tree statically describes the relation between IOMMU and DMA
+  masters. Therefore, the PCI root complex that hosts the virtio-iommu
+  contains a child node representing the IOMMU device explicitly.
+
+  DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
+  virtio-iommu node doesn't have an "iommus" property, and is omitted from
+  the iommu-map property of the root complex.
+
+properties:
+  # If compatible is present, it should contain the vendor and device ID
+  # according to the PCI Bus Binding specification. Since PCI provides
+  # built-in identification methods, compatible is not actually required.
+  compatible:
+oneOf:
+  - items:
+  - const: virtio,pci-iommu
+  - const: pci1af4,1057
+  - items:
+  - const: pci1af4,1057
+
+  reg:
+description: |
+  PCI address of the IOMMU. As defined in the PCI Bus Binding
+  reference, the reg property is a five-cell address encoded as (phys.hi
+  phys.mid phys.lo size.hi size.lo). phys.hi should contain the device's
+  BDF as 0b  dfff . 

Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION

2022-09-16 Thread Brian Foster
On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote:
> From: Sarthak Kukreti 
> 
> FALLOC_FL_PROVISION is a new fallocate() allocation mode that
> sends a hint to (supported) thinly provisioned block devices to
> allocate space for the given range of sectors via REQ_OP_PROVISION.
> 
> Signed-off-by: Sarthak Kukreti 
> ---
>  block/fops.c| 7 ++-
>  include/linux/falloc.h  | 3 ++-
>  include/uapi/linux/falloc.h | 8 
>  3 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/block/fops.c b/block/fops.c
> index b90742595317..a436a7596508 100644
> --- a/block/fops.c
> +++ b/block/fops.c
...
> @@ -661,6 +662,10 @@ static long blkdev_fallocate(struct file *file, int 
> mode, loff_t start,
>   error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
>len >> SECTOR_SHIFT, GFP_KERNEL);
>   break;
> + case FALLOC_FL_PROVISION:
> + error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> +len >> SECTOR_SHIFT, GFP_KERNEL);
> + break;
>   default:
>   error = -EOPNOTSUPP;
>   }

Hi Sarthak,

Neat mechanism.. I played with something very similar in the past (that
was much more crudely hacked up to target dm-thin) to allow filesystems
to request a thinly provisioned device to allocate blocks and try to do
a better job of avoiding inactivation when overprovisioned.

One thing I'm a little curious about here.. what's the need for a new
fallocate mode? On a cursory glance, the provision mode looks fairly
analogous to normal (mode == 0) allocation mode with the exception of
sending the request down to the bdev. blkdev_fallocate() already maps
some of the logical falloc modes (i.e. punch hole, zero range) to
sending write sames or discards, etc., and it doesn't currently look
like it supports allocation mode, so could it not map such requests to
the underlying REQ_OP_PROVISION op?

I guess the difference would be at the filesystem level where we'd
probably need to rely on a mount option or some such to control whether
traditional fallocate issues provision ops (like you've implemented for
ext4) vs. the specific falloc command, but that seems fairly consistent
with historical punch hole/discard behavior too. Hm? You might want to
cc linux-fsdevel in future posts in any event to get some more feedback
on how other filesystems might want to interact with such a thing.

BTW another thing that might be useful wrt to dm-thin is to support
FALLOC_FL_UNSHARE. I.e., it looks like the previous dm-thin patch only
checks that blocks are allocated, but not whether those blocks are
shared (re: lookup_result.shared). It might be useful to do the COW in
such cases if the caller passes down a REQ_UNSHARE or some such flag.

Brian

> diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> index f3f0b97b1675..a0e506255b20 100644
> --- a/include/linux/falloc.h
> +++ b/include/linux/falloc.h
> @@ -30,7 +30,8 @@ struct space_resv {
>FALLOC_FL_COLLAPSE_RANGE | \
>FALLOC_FL_ZERO_RANGE | \
>FALLOC_FL_INSERT_RANGE |   \
> -  FALLOC_FL_UNSHARE_RANGE)
> +  FALLOC_FL_UNSHARE_RANGE |  
> \
> +  FALLOC_FL_PROVISION)
>  
>  /* on ia32 l_start is on a 32-bit boundary */
>  #if defined(CONFIG_X86_64)
> diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
> index 51398fa57f6c..2d323d113eed 100644
> --- a/include/uapi/linux/falloc.h
> +++ b/include/uapi/linux/falloc.h
> @@ -77,4 +77,12 @@
>   */
>  #define FALLOC_FL_UNSHARE_RANGE  0x40
>  
> +/*
> + * FALLOC_FL_PROVISION acts as a hint for thinly provisioned devices to 
> allocate
> + * blocks for the range/EOF.
> + *
> + * FALLOC_FL_PROVISION can only be used with allocate-mode fallocate.
> + */
> +#define FALLOC_FL_PROVISION  0x80
> +
>  #endif /* _UAPI_FALLOC_H_ */
> -- 
> 2.31.0
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] jailhouse: Hold reference returned from of_find_xxx API

2022-09-16 Thread Juergen Gross via Virtualization

On 16.09.22 11:00, Liang He wrote:

In jailhouse_paravirt(), we should hold the reference returned from
of_find_compatible_node() which has increased the refcount and then
call of_node_put() with it when done.

Fixes: 63338a38db95 ("jailhouse: Provide detection for non-x86 systems")
Signed-off-by: Liang He 
Co-developed-by: Kelin Wang 
Signed-off-by: Kelin Wang 


Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: Re: Re: [PATCH] jailhouse: Hold reference returned from of_find_xxx API

2022-09-16 Thread Andy Shevchenko
On Fri, Sep 16, 2022 at 10:09 AM Liang He  wrote:
> At 2022-09-16 13:38:39, "Andy Shevchenko"  wrote:
> >On Fri, Sep 16, 2022 at 5:02 AM Liang He  wrote:
> >> At 2022-09-16 07:29:06, "Srivatsa S. Bhat"  wrote:
> >> >On 9/14/22 7:23 PM, Liang He wrote:

...

> >> >>  static inline bool jailhouse_paravirt(void)
> >> >>  {
> >> >> -return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
> >> >> +struct device_node *np = of_find_compatible_node(NULL, NULL, 
> >> >> "jailhouse,cell");
> >> >> +
> >> >> +of_node_put(np);
> >> >> +
> >> >> +return np;
> >> >>  }
> >> >
> >> >Thank you for the fix, but returning a pointer from a function with a
> >> >bool return type looks odd. Can we also fix that up please?
> >>
> >> Thanks for your review, how about following patch:
> >>
> >> -   return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
> >> +   struct device_node *np = of_find_compatible_node(NULL, NULL, 
> >> "jailhouse,cell");
> >> +
> >> +   of_node_put(np);
> >> +
> >> +   return (np==NULL);
>
> >This will be opposite to the above. Perhaps you wanted
>
> Sorry, I wanted to use 'np!=NULL'
>
> >  return  !!np;
> >
> >Also possible (but why?)
> >
> >  return np ? true : false;
>
> So, can I chose 'return np?true: false;' as the final patch?

Of course you can, it's up to the maintainer(s) what to accept.

-- 
With Best Regards,
Andy Shevchenko
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage

2022-09-16 Thread Stefan Hajnoczi
On Thu, Sep 15, 2022 at 09:48:18AM -0700, Sarthak Kukreti wrote:
> From: Sarthak Kukreti 
> 
> Hi,
> 
> This patch series is an RFC of a mechanism to pass through provision requests 
> on stacked thinly provisioned storage devices/filesystems.
> 
> The linux kernel provides several mechanisms to set up thinly provisioned 
> block storage abstractions (eg. dm-thin, loop devices over sparse files), 
> either directly as block devices or backing storage for filesystems. 
> Currently, short of writing data to either the device or filesystem, there is 
> no way for users to pre-allocate space for use in such storage setups. 
> Consider the following use-cases:
> 
> 1) Suspend-to-disk and resume from a dm-thin device: In order to ensure that 
> the underlying thinpool metadata is not modified during the suspend 
> mechanism, the dm-thin device needs to be fully provisioned.
> 2) If a filesystem uses a loop device over a sparse file, fallocate() on the 
> filesystem will allocate blocks for files but the underlying sparse file will 
> remain intact.
> 3) Another example is virtual machine using a sparse file/dm-thin as a 
> storage device; by default, allocations within the VM boundaries will not 
> affect the host.
> 4) Several storage standards support mechanisms for thin provisioning on real 
> hardware devices. For example:
>   a. The NVMe spec 1.0b section 2.1.1 loosely talks about thin provisioning: 
> "When the THINP bit in the NSFEAT field of the Identify Namespace data 
> structure is set to ‘1’, the controller ... shall track the number of 
> allocated blocks in the Namespace Utilization field"
>   b. The SCSi Block Commands reference - 4 section references "Thin 
> provisioned logical units",
>   c. UFS 3.0 spec section 13.3.3 references "Thin provisioning".

When REQ_OP_PROVISION is sent on an already-allocated range of blocks,
are those blocks zeroed? NVMe Write Zeroes with Deallocate=0 works this
way, for example. That behavior is counterintuitive since the operation
name suggests it just affects the logical block's provisioning state,
not the contents of the blocks.

> In all of the above situations, currently the only way for pre-allocating 
> space is to issue writes (or use WRITE_ZEROES/WRITE_SAME). However, that does 
> not scale well with larger pre-allocation sizes. 

What exactly is the issue with WRITE_ZEROES scalability? Are you
referring to cases where the device doesn't support an efficient
WRITE_ZEROES command and actually writes blocks filled with zeroes
instead of updating internal allocation metadata cheaply?

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization