Re: [PATCH v2] jailhouse: Hold reference returned from of_find_xxx API
[ Adding author and reviewers of commit 63338a38db95 again ] On 9/16/22 2:00 AM, Liang He wrote: > In jailhouse_paravirt(), we should hold the reference returned from > of_find_compatible_node() which has increased the refcount and then > call of_node_put() with it when done. > > Fixes: 63338a38db95 ("jailhouse: Provide detection for non-x86 systems") > Signed-off-by: Liang He > Co-developed-by: Kelin Wang > Signed-off-by: Kelin Wang Reviewed-by: Srivatsa S. Bhat (VMware) > --- > > v2: use proper return type not the 'np' pointer > > include/linux/hypervisor.h | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h > index 9efbc54e35e5..f11eec57ea63 100644 > --- a/include/linux/hypervisor.h > +++ b/include/linux/hypervisor.h > @@ -27,7 +27,11 @@ static inline void hypervisor_pin_vcpu(int cpu) > > static inline bool jailhouse_paravirt(void) > { > - return of_find_compatible_node(NULL, NULL, "jailhouse,cell"); > + struct device_node *np = of_find_compatible_node(NULL, NULL, > "jailhouse,cell"); > + > + of_node_put(np); > + > + return np ? true : false; > } > > #endif /* !CONFIG_X86 */ > Regards, Srivatsa VMware Photon OS ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
On 9/16/22 11:48, Sarthak Kukreti wrote: Yes. On ChromiumOS, we regularly deal with storage devices that don't support WRITE_ZEROES or that need to have it disabled, via a quirk, due to a bug in the vendor's implementation. Using WRITE_ZEROES for allocation makes the allocation path quite slow for such devices (not to mention the effect on storage lifetime), so having a separate provisioning construct is very appealing. Even for devices that do support an efficient WRITE_ZEROES implementation but don't support logical provisioning per-se, I suppose that the allocation path might be a bit faster (the device driver's request queue would report 'max_provision_sectors'=0 and the request would be short circuited there) although I haven't benchmarked the difference. Some background information about why ChromiumOS uses thin provisioning instead of a single filesystem across the entire storage device would be welcome. Although UFS devices support thin provisioning I am not aware of any use cases in Android that would benefit from UFS thin provisioning support. Thanks, Bart. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] dt-bindings: virtio: Convert virtio,pci-iommu to DT schema
Convert the binding that describes the virtio-pci based IOMMU to DT schema. Change the compatible string to "pci,", which is defined by the PCI Bus Binding, but keep "virtio,pci-iommu" as an option for backward compatibility. Signed-off-by: Jean-Philippe Brucker --- .../devicetree/bindings/virtio/iommu.txt | 66 -- .../devicetree/bindings/virtio/iommu.yaml | 86 +++ 2 files changed, 86 insertions(+), 66 deletions(-) delete mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt create mode 100644 Documentation/devicetree/bindings/virtio/iommu.yaml diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt b/Documentation/devicetree/bindings/virtio/iommu.txt deleted file mode 100644 index 2407fea0651c.. --- a/Documentation/devicetree/bindings/virtio/iommu.txt +++ /dev/null @@ -1,66 +0,0 @@ -* virtio IOMMU PCI device - -When virtio-iommu uses the PCI transport, its programming interface is -discovered dynamically by the PCI probing infrastructure. However the -device tree statically describes the relation between IOMMU and DMA -masters. Therefore, the PCI root complex that hosts the virtio-iommu -contains a child node representing the IOMMU device explicitly. - -Required properties: - -- compatible: Should be "virtio,pci-iommu" -- reg: PCI address of the IOMMU. As defined in the PCI Bus - Binding reference [1], the reg property is a five-cell - address encoded as (phys.hi phys.mid phys.lo size.hi - size.lo). phys.hi should contain the device's BDF as - 0b dfff . The other cells - should be zero. -- #iommu-cells:Each platform DMA master managed by the IOMMU is assigned - an endpoint ID, described by the "iommus" property [2]. - For virtio-iommu, #iommu-cells must be 1. - -Notes: - -- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the - virtio-iommu node doesn't have an "iommus" property, and is omitted from - the iommu-map property of the root complex. - -Example: - -pcie@1000 { - compatible = "pci-host-ecam-generic"; - ... - - /* The IOMMU programming interface uses slot 00:01.0 */ - iommu0: iommu@0008 { - compatible = "virtio,pci-iommu"; - reg = <0x0800 0 0 0 0>; - #iommu-cells = <1>; - }; - - /* -* The IOMMU manages all functions in this PCI domain except -* itself. Omit BDF 00:01.0. -*/ - iommu-map = <0x0 0x0 0x8> - <0x9 0x9 0xfff7>; -}; - -pcie@2000 { - compatible = "pci-host-ecam-generic"; - ... - /* -* The IOMMU also manages all functions from this domain, -* with endpoint IDs 0x1 - 0x1 -*/ - iommu-map = <0x0 0x1 0x1>; -}; - -ethernet@fe001000 { - ... - /* The IOMMU manages this platform device with endpoint ID 0x2 */ - iommus = < 0x2>; -}; - -[1] Documentation/devicetree/bindings/pci/pci.txt -[2] Documentation/devicetree/bindings/iommu/iommu.txt diff --git a/Documentation/devicetree/bindings/virtio/iommu.yaml b/Documentation/devicetree/bindings/virtio/iommu.yaml new file mode 100644 index ..d5bbb8ab9603 --- /dev/null +++ b/Documentation/devicetree/bindings/virtio/iommu.yaml @@ -0,0 +1,86 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/virtio/iommu.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: virtio-iommu device using the virtio-pci transport + +maintainers: + - Jean-Philippe Brucker + +description: | + When virtio-iommu uses the PCI transport, its programming interface is + discovered dynamically by the PCI probing infrastructure. However the + device tree statically describes the relation between IOMMU and DMA + masters. Therefore, the PCI root complex that hosts the virtio-iommu + contains a child node representing the IOMMU device explicitly. + + DMA from the IOMMU device isn't managed by another IOMMU. Therefore the + virtio-iommu node doesn't have an "iommus" property, and is omitted from + the iommu-map property of the root complex. + +properties: + # If compatible is present, it should contain the vendor and device ID + # according to the PCI Bus Binding specification. Since PCI provides + # built-in identification methods, compatible is not actually required. + compatible: +oneOf: + - items: + - const: virtio,pci-iommu + - const: pci1af4,1057 + - items: + - const: pci1af4,1057 + + reg: +description: | + PCI address of the IOMMU. As defined in the PCI Bus Binding + reference, the reg property is a five-cell address encoded as (phys.hi + phys.mid phys.lo size.hi size.lo). phys.hi should contain the device's + BDF as 0b dfff .
Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION
On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote: > From: Sarthak Kukreti > > FALLOC_FL_PROVISION is a new fallocate() allocation mode that > sends a hint to (supported) thinly provisioned block devices to > allocate space for the given range of sectors via REQ_OP_PROVISION. > > Signed-off-by: Sarthak Kukreti > --- > block/fops.c| 7 ++- > include/linux/falloc.h | 3 ++- > include/uapi/linux/falloc.h | 8 > 3 files changed, 16 insertions(+), 2 deletions(-) > > diff --git a/block/fops.c b/block/fops.c > index b90742595317..a436a7596508 100644 > --- a/block/fops.c > +++ b/block/fops.c ... > @@ -661,6 +662,10 @@ static long blkdev_fallocate(struct file *file, int > mode, loff_t start, > error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT, >len >> SECTOR_SHIFT, GFP_KERNEL); > break; > + case FALLOC_FL_PROVISION: > + error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT, > +len >> SECTOR_SHIFT, GFP_KERNEL); > + break; > default: > error = -EOPNOTSUPP; > } Hi Sarthak, Neat mechanism.. I played with something very similar in the past (that was much more crudely hacked up to target dm-thin) to allow filesystems to request a thinly provisioned device to allocate blocks and try to do a better job of avoiding inactivation when overprovisioned. One thing I'm a little curious about here.. what's the need for a new fallocate mode? On a cursory glance, the provision mode looks fairly analogous to normal (mode == 0) allocation mode with the exception of sending the request down to the bdev. blkdev_fallocate() already maps some of the logical falloc modes (i.e. punch hole, zero range) to sending write sames or discards, etc., and it doesn't currently look like it supports allocation mode, so could it not map such requests to the underlying REQ_OP_PROVISION op? I guess the difference would be at the filesystem level where we'd probably need to rely on a mount option or some such to control whether traditional fallocate issues provision ops (like you've implemented for ext4) vs. the specific falloc command, but that seems fairly consistent with historical punch hole/discard behavior too. Hm? You might want to cc linux-fsdevel in future posts in any event to get some more feedback on how other filesystems might want to interact with such a thing. BTW another thing that might be useful wrt to dm-thin is to support FALLOC_FL_UNSHARE. I.e., it looks like the previous dm-thin patch only checks that blocks are allocated, but not whether those blocks are shared (re: lookup_result.shared). It might be useful to do the COW in such cases if the caller passes down a REQ_UNSHARE or some such flag. Brian > diff --git a/include/linux/falloc.h b/include/linux/falloc.h > index f3f0b97b1675..a0e506255b20 100644 > --- a/include/linux/falloc.h > +++ b/include/linux/falloc.h > @@ -30,7 +30,8 @@ struct space_resv { >FALLOC_FL_COLLAPSE_RANGE | \ >FALLOC_FL_ZERO_RANGE | \ >FALLOC_FL_INSERT_RANGE | \ > - FALLOC_FL_UNSHARE_RANGE) > + FALLOC_FL_UNSHARE_RANGE | > \ > + FALLOC_FL_PROVISION) > > /* on ia32 l_start is on a 32-bit boundary */ > #if defined(CONFIG_X86_64) > diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h > index 51398fa57f6c..2d323d113eed 100644 > --- a/include/uapi/linux/falloc.h > +++ b/include/uapi/linux/falloc.h > @@ -77,4 +77,12 @@ > */ > #define FALLOC_FL_UNSHARE_RANGE 0x40 > > +/* > + * FALLOC_FL_PROVISION acts as a hint for thinly provisioned devices to > allocate > + * blocks for the range/EOF. > + * > + * FALLOC_FL_PROVISION can only be used with allocate-mode fallocate. > + */ > +#define FALLOC_FL_PROVISION 0x80 > + > #endif /* _UAPI_FALLOC_H_ */ > -- > 2.31.0 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] jailhouse: Hold reference returned from of_find_xxx API
On 16.09.22 11:00, Liang He wrote: In jailhouse_paravirt(), we should hold the reference returned from of_find_compatible_node() which has increased the refcount and then call of_node_put() with it when done. Fixes: 63338a38db95 ("jailhouse: Provide detection for non-x86 systems") Signed-off-by: Liang He Co-developed-by: Kelin Wang Signed-off-by: Kelin Wang Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: Re: Re: [PATCH] jailhouse: Hold reference returned from of_find_xxx API
On Fri, Sep 16, 2022 at 10:09 AM Liang He wrote: > At 2022-09-16 13:38:39, "Andy Shevchenko" wrote: > >On Fri, Sep 16, 2022 at 5:02 AM Liang He wrote: > >> At 2022-09-16 07:29:06, "Srivatsa S. Bhat" wrote: > >> >On 9/14/22 7:23 PM, Liang He wrote: ... > >> >> static inline bool jailhouse_paravirt(void) > >> >> { > >> >> -return of_find_compatible_node(NULL, NULL, "jailhouse,cell"); > >> >> +struct device_node *np = of_find_compatible_node(NULL, NULL, > >> >> "jailhouse,cell"); > >> >> + > >> >> +of_node_put(np); > >> >> + > >> >> +return np; > >> >> } > >> > > >> >Thank you for the fix, but returning a pointer from a function with a > >> >bool return type looks odd. Can we also fix that up please? > >> > >> Thanks for your review, how about following patch: > >> > >> - return of_find_compatible_node(NULL, NULL, "jailhouse,cell"); > >> + struct device_node *np = of_find_compatible_node(NULL, NULL, > >> "jailhouse,cell"); > >> + > >> + of_node_put(np); > >> + > >> + return (np==NULL); > > >This will be opposite to the above. Perhaps you wanted > > Sorry, I wanted to use 'np!=NULL' > > > return !!np; > > > >Also possible (but why?) > > > > return np ? true : false; > > So, can I chose 'return np?true: false;' as the final patch? Of course you can, it's up to the maintainer(s) what to accept. -- With Best Regards, Andy Shevchenko ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
On Thu, Sep 15, 2022 at 09:48:18AM -0700, Sarthak Kukreti wrote: > From: Sarthak Kukreti > > Hi, > > This patch series is an RFC of a mechanism to pass through provision requests > on stacked thinly provisioned storage devices/filesystems. > > The linux kernel provides several mechanisms to set up thinly provisioned > block storage abstractions (eg. dm-thin, loop devices over sparse files), > either directly as block devices or backing storage for filesystems. > Currently, short of writing data to either the device or filesystem, there is > no way for users to pre-allocate space for use in such storage setups. > Consider the following use-cases: > > 1) Suspend-to-disk and resume from a dm-thin device: In order to ensure that > the underlying thinpool metadata is not modified during the suspend > mechanism, the dm-thin device needs to be fully provisioned. > 2) If a filesystem uses a loop device over a sparse file, fallocate() on the > filesystem will allocate blocks for files but the underlying sparse file will > remain intact. > 3) Another example is virtual machine using a sparse file/dm-thin as a > storage device; by default, allocations within the VM boundaries will not > affect the host. > 4) Several storage standards support mechanisms for thin provisioning on real > hardware devices. For example: > a. The NVMe spec 1.0b section 2.1.1 loosely talks about thin provisioning: > "When the THINP bit in the NSFEAT field of the Identify Namespace data > structure is set to ‘1’, the controller ... shall track the number of > allocated blocks in the Namespace Utilization field" > b. The SCSi Block Commands reference - 4 section references "Thin > provisioned logical units", > c. UFS 3.0 spec section 13.3.3 references "Thin provisioning". When REQ_OP_PROVISION is sent on an already-allocated range of blocks, are those blocks zeroed? NVMe Write Zeroes with Deallocate=0 works this way, for example. That behavior is counterintuitive since the operation name suggests it just affects the logical block's provisioning state, not the contents of the blocks. > In all of the above situations, currently the only way for pre-allocating > space is to issue writes (or use WRITE_ZEROES/WRITE_SAME). However, that does > not scale well with larger pre-allocation sizes. What exactly is the issue with WRITE_ZEROES scalability? Are you referring to cases where the device doesn't support an efficient WRITE_ZEROES command and actually writes blocks filled with zeroes instead of updating internal allocation metadata cheaply? Stefan signature.asc Description: PGP signature ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization